Python API Reference#

This is the reference documentation for barecat’s Python API.

Barecat Class#

class barecat.Barecat(path, readonly=True, overwrite=False, shard_size_limit=None, threadsafe=False, auto_codec=False)[source]#

Main class for reading and writing barecat archives.

Parameters:
  • path (str) – Path to the archive (without suffix).

  • readonly (bool) – Open in read-only mode. Default: True.

  • overwrite (bool) – Delete existing archive if it exists. Default: False.

  • shard_size_limit (int) – Maximum size per shard file in bytes. Default: unlimited.

  • threadsafe (bool) – Use thread-local storage for connections. Required for multi-process DataLoader. Default: False.

  • auto_codec (bool) – Deprecated. Use DecodedView instead. Will be removed in 1.0.

  • exist_ok (bool)

  • append_only (bool)

  • allow_writing_symlinked_shard (bool)

  • wal (bool)

  • readonly_is_immutable (bool)

Context Manager

import barecat

with barecat.Barecat('archive.barecat') as bc:
    data = bc['file.txt']

Dictionary-like Access

__getitem__(path)[source]#

Get file contents by path.

Parameters:

path (str) – Path to the file.

Returns:

File contents as bytes.

Raises:

KeyError – If file does not exist.

Return type:

Union[bytes, Any]

__setitem__(path, data)[source]#

Set file contents by path.

Parameters:
  • path (str) – Path to the file.

  • data (bytes) – File contents.

  • content (Union[bytes, Any])

__delitem__(path)[source]#

Delete a file by path.

Parameters:

path (str) – Path to the file.

Raises:

KeyError – If file does not exist.

__contains__(path)[source]#

Check if a file with the given path exists in the archive.

Note: This only checks for files, not directories. Use exists() to check for both files and directories.

Parameters:

path (str) – Path to the file.

Returns:

True if file exists, False otherwise.

Return type:

bool
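The four mapping methods above compose in the usual way. The following is a minimal sketch, assuming `archive` is a Barecat opened with readonly=False. The helper names `read_or_default` and `replace_file` are hypothetical, and the delete-before-write step assumes that assigning to an existing path may be rejected, which this reference does not state either way.

```python
def read_or_default(archive, path, default=b""):
    """Return the bytes stored at `path`, or `default` if there is no such file."""
    if path in archive:        # __contains__ checks files only, not directories
        return archive[path]   # __getitem__
    return default


def replace_file(archive, path, data):
    """Store `data` at `path`, deleting any existing file at that path first."""
    if path in archive:
        del archive[path]      # __delitem__
    archive[path] = data       # __setitem__
```

Because only the mapping protocol is used, the same helpers also work on a plain dict, which can be convenient in unit tests.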

Filesystem-like Access

open(path, mode='r')[source]#

Open a file for reading or writing.

Parameters:
  • path (str) – Path to the file.

  • mode (str) – ‘r’ for text, ‘rb’ for binary, ‘r+b’ for read-write binary.

  • item (Union[BarecatFileInfo, str])

  • encoding (Optional[str])

  • errors (Optional[str])

  • newline (Optional[str])

Returns:

File-like object.

Return type:

Union[BarecatFileObject, TextIOWrapper]

with bc.open('file.txt') as f:
    data = f.read(100)
    f.seek(0)
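
For large files, reading in fixed-size chunks avoids holding the whole file in memory. The sketch below hashes one archived file that way. The helper name `sha256_of` is hypothetical, and the loop assumes the returned file object follows the usual Python convention of returning empty bytes at end of file.

```python
import hashlib


def sha256_of(archive, path, chunk_size=1 << 20):
    """Hash one archived file without loading it into memory all at once."""
    digest = hashlib.sha256()
    with archive.open(path, "rb") as f:   # 'rb' requests a binary file-like object
        while True:
            chunk = f.read(chunk_size)
            if not chunk:                 # assumed: empty bytes signals end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```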

listdir(path='')[source]#

List directory contents.

Parameters:

path (str) – Directory path.

Returns:

List of entry names (not full paths).

Return type:

list[str]

walk(top='')[source]#

Walk directory tree, like os.walk().

Parameters:
  • top (str) – Starting directory.

  • path (str)

Yields:

(dirpath, dirnames, filenames) tuples.

Return type:

Iterator[tuple[str, list[str], list[str]]]
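
Since walk() yields directory paths and bare file names separately, full paths have to be rebuilt by the caller. A minimal sketch follows; the helper name `list_all_files` is hypothetical, and the join logic assumes the archive root is reported as '' and that path components are separated by '/'.

```python
def list_all_files(archive, top=""):
    """Collect the full path of every file at or below `top`."""
    paths = []
    for dirpath, dirnames, filenames in archive.walk(top):
        for name in filenames:
            # Assumption: the root is reported as '' and components join with '/'.
            paths.append(f"{dirpath}/{name}" if dirpath else name)
    return paths
```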

glob(pattern, recursive=False)[source]#

Find files matching a glob pattern.

Parameters:
  • pattern (str) – Glob pattern (e.g., ‘*.jpg’, ‘*/*.txt’).

  • recursive (bool) – Enable ** for recursive matching.

  • include_hidden (bool)

Returns:

List of matching paths.

Return type:

list[str]
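
A common use of recursive matching is collecting files by extension at any depth. The sketch below is one way to do that. The helper name `find_by_extension` is hypothetical, and the pattern assumes that `**` behaves as in Python's glob module, matching zero or more directory levels when recursive=True.

```python
def find_by_extension(archive, extensions):
    """Return a sorted list of all archived paths ending in one of `extensions`."""
    matches = set()
    for ext in extensions:
        # '**' is only treated as a multi-level wildcard when recursive=True.
        matches.update(archive.glob(f"**/*{ext}", recursive=True))
    return sorted(matches)
```

For example, `find_by_extension(bc, ['.jpg', '.png'])` would gather image paths from the whole archive under those assumptions.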

isfile(path)[source]#

Check if path is a file.

Parameters:

path (str) – Path to check.

Return type:

bool

isdir(path)[source]#

Check if path is a directory.

Parameters:

path (str) – Path to check.

Return type:

bool
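
listdir() returns bare entry names, so they need to be joined with the parent path before being passed to isfile() or isdir(). A minimal sketch, with the hypothetical helper name `count_children` and the assumption that '/' is the path separator inside the archive:

```python
def count_children(archive, path=""):
    """Return (num_files, num_dirs) for the immediate children of `path`."""
    num_files = num_dirs = 0
    for name in archive.listdir(path):
        # listdir returns names, not full paths, so rebuild the full path first.
        full = f"{path}/{name}" if path else name
        if archive.isdir(full):
            num_dirs += 1
        elif archive.isfile(full):
            num_files += 1
    return num_files, num_dirs
```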

Adding Files

add(item, *, data=None, fileobj=None, dir_exist_ok=False, file_exist_ok=False)[source]#

Add a file or directory to the archive.

Parameters:
  • item (Union[BarecatEntryInfo, str]) – BarecatFileInfo, BarecatDirInfo, or path string.

  • data (bytes) – File contents (if not using fileobj).

  • fileobj – File-like object to read from.

  • dir_exist_ok (bool) – Don’t error if directory exists.

  • file_exist_ok (bool) – Skip if file exists (for merges).

  • bufsize (int)

add_by_path(filesystem_path, store_path=None, dir_exist_ok=False)[source]#

Add a file or directory from the filesystem. If the path points to a directory, the directory entry itself is added (not its contents recursively).

Parameters:
  • filesystem_path (str) – Path on the filesystem.

  • store_path (str) – Path in the archive (default: same as filesystem_path).

  • dir_exist_ok (bool) – Don’t error if directory already exists in the archive.

  • filesys_path (str)
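
When the contents are already in memory, add() can be called with a path string and data=. The sketch below adds a whole mapping of paths to bytes. The helper name `add_many` is hypothetical, `archive` is assumed to be a Barecat opened with readonly=False, and whether parent directories are created automatically is not stated in this reference.

```python
def add_many(archive, files, skip_existing=False):
    """Add every (path, bytes) pair in `files`; return how many were submitted."""
    for path, data in files.items():
        # data= is keyword-only in the documented signature of add().
        # file_exist_ok=True skips paths that already exist (useful for merges).
        archive.add(path, data=data, file_exist_ok=skip_existing)
    return len(files)
```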

Deletion

remove(path)[source]#

Remove a file from the archive.

Parameters:

path (str) – Path to the file.

Properties

index[source]#

The Index object managing the SQLite database.

Return type:

Index

shard_size_limit[source]#

Maximum shard size in bytes.

Return type:

int

BarecatFileInfo Class#

class barecat.BarecatFileInfo(path=None, mode=None, uid=None, gid=None, mtime_ns=None, shard=None, offset=None, size=None, crc32c=None)[source]#

Describes a file in the archive.

Parameters:
  • path (str) – File path within the archive.

  • mode (int) – Unix file mode (permissions).

  • uid (int) – Owner user ID.

  • gid (int) – Owner group ID.

  • mtime_ns (int) – Modification time in nanoseconds since epoch.

  • shard (int) – Shard number where file data is stored.

  • offset (int) – Byte offset within the shard.

  • size (int) – File size in bytes.

  • crc32c (int) – CRC32C checksum of contents.

path[source]#

File path (normalized on assignment).

size[source]#

File size in bytes.

mtime_dt[source]#

Modification time as datetime object.

Return type:

Optional[datetime]

asdict()[source]#

Return as dictionary.

Return type:

dict

BarecatDirInfo Class#

class barecat.BarecatDirInfo(path=None, mode=None, uid=None, gid=None, mtime_ns=None, num_subdirs=None, num_files=None, size_tree=None, num_files_tree=None)[source]#

Describes a directory in the archive.

Parameters:
  • path (str) – Directory path within the archive.

  • mode (int) – Unix directory mode.

  • uid (int) – Owner user ID.

  • gid (int) – Owner group ID.

  • mtime_ns (int) – Modification time in nanoseconds.

  • num_subdirs (int) – Number of immediate subdirectories.

  • num_files (int) – Number of immediate files.

  • size_tree (int) – Total size of all files recursively.

  • num_files_tree (int) – Total number of files recursively.

path[source]#

Directory path (normalized on assignment).

num_entries[source]#

Total entries (num_subdirs + num_files).

Return type:

int

Index Class#

class barecat.Index[source]#

Manages the SQLite database. Usually accessed via bc.index.

iter_all_fileinfos(order=Order.ANY)[source]#

Iterate over all files in the archive.

Parameters:
  • order (Order) – Ordering (ANY, PATH, ADDRESS, RANDOM).

  • bufsize (Optional[int])

Yields:

BarecatFileInfo objects.

Return type:

Iterator[barecat.core.types.BarecatFileInfo]
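
Because each BarecatFileInfo carries `path` and `size`, simple archive statistics can be computed from the index alone. A minimal sketch, assuming `index` is the object returned by `bc.index`; the helper name `size_by_extension` is hypothetical.

```python
import os
from collections import Counter


def size_by_extension(index):
    """Sum file sizes per lowercase extension using index metadata."""
    totals = Counter()
    for info in index.iter_all_fileinfos():
        ext = os.path.splitext(info.path)[1].lower()  # '' when there is no extension
        totals[ext] += info.size
    return dict(totals)
```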

iter_all_dirinfos(order=Order.ANY)[source]#

Iterate over all directories.

Parameters:
  • order (Order) – Ordering (ANY, PATH, ADDRESS, RANDOM).

  • bufsize (Optional[int])

Yields:

BarecatDirInfo objects.

Return type:

Iterator[barecat.core.types.BarecatDirInfo]

iter_all_filepaths(order=Order.ANY)[source]#

Iterate over all file paths (files only, not directories).

Parameters:
  • order (Order) – Ordering (ANY, PATH, ADDRESS, RANDOM).

  • bufsize (Optional[int])

Yields:

Path strings.

Return type:

Iterator[str]

iter_all_paths(order=Order.ANY)[source]#

Iterate over all entry paths (both files and directories).

Parameters:
  • order (Order) – Ordering (ANY, PATH, ADDRESS, RANDOM).

  • bufsize (Optional[int])

Yields:

Path strings.

Return type:

Iterator[str]

lookup_file(path)[source]#

Look up file info by path.

Parameters:
  • path (str) – File path.

  • normalized (bool)

Returns:

BarecatFileInfo

Raises:

FileNotFoundBarecatError – If not found.

Return type:

barecat.core.types.BarecatFileInfo

lookup_dir(path)[source]#

Look up directory info by path.

Parameters:
  • path (str) – Directory path.

  • dirpath (str)

Returns:

BarecatDirInfo

Raises:

NotADirectoryBarecatError – If not found.

Return type:

barecat.core.types.BarecatDirInfo

Order Enum#

class barecat.Order[source]#

Ordering options for iteration.

ANY[source]#

Default order (as returned by SQLite).

PATH[source]#

Alphabetical by path.

ADDRESS[source]#

By shard and offset (physical order).

RANDOM[source]#

Random order.

DESC[source]#

Descending (combine with PATH or ADDRESS).

from barecat import Order

# Iterate in physical order (optimal for sequential reads)
for f in bc.index.iter_all_fileinfos(order=Order.ADDRESS):
    ...

# Iterate in reverse alphabetical order
for f in bc.index.iter_all_fileinfos(order=Order.PATH | Order.DESC):
    ...

Exceptions#

exception barecat.exceptions.BarecatError[source]#

Base exception for barecat errors.

Parameters:

message (str)

exception barecat.exceptions.FileNotFoundBarecatError[source]#

File not found in archive.

Parameters:

path (str)

exception barecat.exceptions.FileExistsBarecatError[source]#

File already exists in archive.

Parameters:

path (str)

exception barecat.exceptions.IsADirectoryBarecatError[source]#

Operation expected file but got directory.

Parameters:

path (str)

exception barecat.exceptions.NotADirectoryBarecatError[source]#

Operation expected directory but got file.

Parameters:

message (str)

exception barecat.exceptions.DirectoryNotEmptyBarecatError[source]#

Cannot delete non-empty directory.

Parameters:

path (str)

DecodedView Class#

class barecat.DecodedView(store)[source]#

Dict-like view that automatically encodes/decodes based on file extension.

Wraps a raw bytes store (like Barecat) and automatically encodes on write and decodes on read based on the file extension. Raises an error if no codec is registered for the extension.

Parameters:

store (MutableMapping[str, bytes]) – A MutableMapping[str, bytes] to wrap (e.g., a Barecat instance).

Basic Usage

import barecat
from barecat import DecodedView

with barecat.Barecat('data.barecat', readonly=False) as bc:
    dec = DecodedView(bc)

    # JSON: dict/list ↔ bytes
    dec['config.json'] = {'key': 'value', 'count': 42}
    config = dec['config.json']  # Returns dict

    # Images: numpy array ↔ encoded bytes (via imageio)
    import numpy as np
    dec['image.png'] = np.zeros((100, 100, 3), dtype=np.uint8)
    image = dec['image.png']  # Returns numpy array (H, W, C)

    # Numpy arrays
    dec['data.npy'] = np.array([1, 2, 3])
    arr = dec['data.npy']

    # Pickle: any Python object
    dec['model.pkl'] = {'weights': [...], 'config': {...}}

    # For raw bytes, use the store directly:
    bc['file.bin'] = b'raw binary data'

Stacked Compression

Compression codecs (.gz, .xz, .bz2) can be stacked with other codecs:

# JSON compressed with gzip
dec['config.json.gz'] = {'large': 'data'}
config = dec['config.json.gz']  # Decompresses, then parses JSON

# Pickle compressed with lzma
dec['model.pkl.xz'] = large_object
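
To make the stacking order concrete, here is a standalone sketch of extension-based dispatch using only the standard library. It is an illustration of the idea, not barecat's implementation: the names `FINAL`, `WRAPPERS`, `split_extensions`, `encode` and `decode` are all hypothetical, and only `.json` and `.gz` are handled.

```python
import gzip
import json
import os

# Final codecs turn an object into bytes; wrapper codecs turn bytes into bytes.
FINAL = {".json": (lambda obj: json.dumps(obj).encode("utf-8"),
                   lambda raw: json.loads(raw.decode("utf-8")))}
WRAPPERS = {".gz": (gzip.compress, gzip.decompress)}


def split_extensions(path):
    """Peel wrapper extensions off the end: 'a.json.gz' -> ('.json', ['.gz'])."""
    wrappers = []
    stem, ext = os.path.splitext(path)
    while ext in WRAPPERS:
        wrappers.append(ext)              # collected outermost first
        stem, ext = os.path.splitext(stem)
    if ext not in FINAL:
        raise ValueError(f"no codec registered for {ext!r}")
    return ext, wrappers


def encode(path, obj):
    """Serialize with the final codec, then apply wrappers from the inside out."""
    ext, wrappers = split_extensions(path)
    data = FINAL[ext][0](obj)
    for w in reversed(wrappers):
        data = WRAPPERS[w][0](data)
    return data


def decode(path, data):
    """Undo wrappers from the outside in, then parse with the final codec."""
    ext, wrappers = split_extensions(path)
    for w in wrappers:
        data = WRAPPERS[w][1](data)
    return FINAL[ext][1](data)
```

In this sketch, writing 'config.json.gz' serializes to JSON first and compresses second, and reading reverses those steps, which matches the order described for DecodedView above.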

Supported Extensions

Extension       Type                     Stackable
.json           dict/list                No
.pkl, .pickle   any (pickle)             No
.npy            numpy array              No
.npz            dict of numpy arrays     No
.msgpack        any (msgpack-numpy)      No
.jpg, .jpeg     numpy array (imageio)    No
.png            numpy array (imageio)    No
.gif, .bmp      numpy array (imageio)    No
.tiff, .tif     numpy array (imageio)    No
.webp, .exr     numpy array (imageio)    No
.gz, .gzip      gzip compression         Yes
.xz, .lzma      lzma compression         Yes
.bz2            bzip2 compression        Yes

Custom Codecs

register_codec(exts, encoder, decoder, nonfinal=False)[source]#

Register a custom codec for given extensions.

Parameters:
  • exts (list[str]) – List of extensions (e.g., ['.xyz']).

  • encoder (callable) – Function (data) -> bytes.

  • decoder (callable) – Function (bytes) -> data.

  • nonfinal (bool) – If True, codec can stack (like compression).

import yaml

dec.register_codec(
    ['.yaml', '.yml'],
    encoder=lambda d: yaml.dump(d).encode('utf-8'),
    decoder=lambda b: yaml.safe_load(b.decode('utf-8')),
)

dec['config.yaml'] = {'setting': 'value'}
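
Codecs can also be built from the standard library alone. The sketch below registers a final text codec and a stackable compression codec through the documented register_codec() signature. The helper name `register_extra_codecs` and the `.zz` extension are hypothetical choices for illustration, and `view` is assumed to be a DecodedView.

```python
import zlib


def register_extra_codecs(view):
    """Register a UTF-8 text codec and a stackable zlib codec on `view`."""
    # Final codec: str <-> bytes for .txt files.
    view.register_codec(
        [".txt"],
        encoder=lambda text: text.encode("utf-8"),
        decoder=lambda raw: raw.decode("utf-8"),
    )
    # Stackable codec: bytes <-> bytes, so nonfinal=True allows it to wrap others.
    view.register_codec(
        [".zz"],  # hypothetical extension chosen for this example
        encoder=zlib.compress,
        decoder=zlib.decompress,
        nonfinal=True,
    )
```

With both registered, a path such as 'notes.txt.zz' would presumably be encoded as text and then compressed, by analogy with the built-in stacked codecs.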

clear_codecs()[source]#

Remove all registered codecs.

DecodedView is not specific to Barecat: it wraps any MutableMapping[str, bytes], such as a plain dict.

Deprecated Functions#

barecat.open(path, mode='r', auto_codec=False, threadsafe_reader=True)[source]#

Deprecated: Use Barecat(path, readonly=True) or Barecat(path, readonly=False) directly.

Open a Barecat archive.

Parameters:
  • path (str) – Path to the archive (without suffix).

  • mode (str) – ‘r’ (read), ‘r+’ (read-write), ‘w+’ (overwrite), ‘a+’ (append), ‘x+’ (exclusive create).

Returns:

Barecat instance.

barecat.extract(barecat_path, target_directory)[source]#

Deprecated: Use the barecat extract CLI or the Barecat class directly instead.

Extract all files from a barecat archive to a directory.

Parameters:
  • barecat_path (str) – Path to the archive.

  • target_directory (str) – Directory to extract to.

barecat.read_index(path)[source]#

Deprecated: Use the barecat.Index class directly instead.

Read the index of a barecat archive as a dictionary.

Parameters:

path (str) – Path to the archive.

Returns:

Dict mapping paths to (shard, offset, size) tuples.

Return type:

dict

barecat.write_index(dictionary, target_path)[source]#

Deprecated: Use the barecat.Index class directly instead.

Write a dictionary as a barecat index.

Parameters:
  • dictionary (dict) – Dict mapping paths to (shard, offset, size) tuples.

  • target_path (str) – Path for the output index.

Deprecated: auto_codec Parameter#

Deprecated since version 0.3.0: The auto_codec parameter is deprecated and will be removed in version 1.0. Use DecodedView instead.

Migration:

import barecat
from barecat import DecodedView

# Old (deprecated):
with barecat.Barecat('data.barecat', auto_codec=True) as bc:
    data = bc['file.json']

# New:
with barecat.Barecat('data.barecat') as bc:
    dec = DecodedView(bc)
    data = dec['file.json']