Python API Reference
This is the reference documentation for barecat’s Python API.
Barecat Class
- class barecat.Barecat(path, readonly=True, overwrite=False, shard_size_limit=None, threadsafe=False, auto_codec=False)
Main class for reading and writing barecat archives.
- Parameters:
path (str) – Path to the archive (without suffix).
readonly (bool) – Open in read-only mode. Default: True.
overwrite (bool) – Delete existing archive if it exists. Default: False.
shard_size_limit (int) – Maximum size per shard file in bytes. Default: unlimited.
threadsafe (bool) – Use thread-local storage for connections. Required for multi-process DataLoader. Default: False.
auto_codec (bool) – Deprecated. Use DecodedView instead. Will be removed in 1.0.
exist_ok (bool)
append_only (bool)
allow_writing_symlinked_shard (bool)
wal (bool)
readonly_is_immutable (bool)
Context Manager
import barecat

with barecat.Barecat('archive.barecat') as bc:
    data = bc['file.txt']
Dictionary-like Access
- __contains__(path)
Check if a file with the given path exists in the archive.
Note: This only checks for files, not directories. Use exists() to check for both files and directories.
Filesystem-like Access
- open(path, mode='r')
Open a file for reading or writing.
- Parameters:
- Returns:
File-like object.
- Return type:
Union[BarecatFileObject, TextIOWrapper]
with bc.open('file.txt') as f:
    data = f.read(100)  # read from the start of the file
    f.seek(0)  # rewind to the beginning
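A toy example makes it easier to see how a file object can expose one archive member as a seekable stream. The sketch below assumes, as the BarecatFileInfo fields suggest, that each file occupies one contiguous byte range (offset, size) inside a shard. RangeReader is a hypothetical name and is not barecat's BarecatFileObject. The test also wraps it in io.BufferedReader and io.TextIOWrapper, which is one plausible reading of the Union[BarecatFileObject, TextIOWrapper] return type.

```python
import io


class RangeReader(io.RawIOBase):
    """Hypothetical read-only view of bytes [offset, offset + size) of a binary file."""

    def __init__(self, fileobj, offset: int, size: int):
        super().__init__()
        self._f = fileobj
        self._start = offset
        self._size = size
        self._pos = 0  # position relative to the start of the member

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, pos: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new = pos
        elif whence == io.SEEK_CUR:
            new = self._pos + pos
        elif whence == io.SEEK_END:
            new = self._size + pos
        else:
            raise ValueError(f'invalid whence: {whence}')
        if new < 0:
            raise ValueError('negative seek position')
        self._pos = new
        return new

    def readinto(self, buf) -> int:
        # Never read past the member's end, even if the shard continues.
        n = max(0, min(len(buf), self._size - self._pos))
        if n == 0:
            return 0
        self._f.seek(self._start + self._pos)
        data = self._f.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

Seeking only moves a counter. The underlying shard is repositioned on every read, so several such readers can share one shard handle within a single thread.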
Adding Files
- add(item, *, data=None, fileobj=None, dir_exist_ok=False, file_exist_ok=False)
Add a file or directory to the archive.
- Parameters:
item (Union[BarecatEntryInfo, str]) – BarecatFileInfo, BarecatDirInfo, or path string.
data (bytes) – File contents (if not using fileobj).
fileobj – File-like object to read from.
dir_exist_ok (bool) – Don’t error if directory exists.
file_exist_ok (bool) – Skip if file exists (for merges).
bufsize (int)
- add_by_path(filesystem_path, store_path=None, dir_exist_ok=False)
Add a file or directory from the filesystem. If the path points to a directory, the directory entry itself is added (not its contents recursively).
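add_by_path() adds only the directory entry itself, so adding a whole tree means walking it yourself. The sketch below uses os.walk to yield (filesystem_path, store_path) pairs, with each directory ahead of its contents. iter_tree is a hypothetical helper. Passing its output to add_by_path(..., store_path=..., dir_exist_ok=True), as in the trailing comment, is an untested assumption about how the documented parameters combine.

```python
import os
from typing import Iterator, Tuple


def iter_tree(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (filesystem_path, store_path) for everything under root.

    Directories come before their contents (os.walk is top-down), and
    store paths are relative to root and always use '/' separators.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir != os.curdir:  # skip the root directory itself
            yield dirpath, rel_dir.replace(os.sep, '/')
        for name in sorted(filenames):
            fs_path = os.path.join(dirpath, name)
            yield fs_path, os.path.relpath(fs_path, root).replace(os.sep, '/')


# Hypothetical usage with an archive opened with readonly=False:
# for fs_path, store_path in iter_tree('my_dataset'):
#     bc.add_by_path(fs_path, store_path=store_path, dir_exist_ok=True)
```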
Deletion
- remove(path)
Remove a file from the archive.
- Parameters:
path (str) – Path to the file.
item (Union[BarecatFileInfo, str])
Properties
BarecatFileInfo Class
- class barecat.BarecatFileInfo(path=None, mode=None, uid=None, gid=None, mtime_ns=None, shard=None, offset=None, size=None, crc32c=None)
Describes a file in the archive.
- Parameters:
path (str) – File path within the archive.
mode (int) – Unix file mode (permissions).
uid (int) – Owner user ID.
gid (int) – Owner group ID.
mtime_ns (int) – Modification time in nanoseconds since epoch.
shard (int) – Shard number where file data is stored.
offset (int) – Byte offset within the shard.
size (int) – File size in bytes.
crc32c (int) – CRC32C checksum of contents.
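Together, the shard, offset, size and crc32c fields are enough to locate a file's bytes and verify them. The sketch below assumes that crc32c is the standard CRC-32C (Castagnoli) of the file contents, with the usual initial and final XOR. It uses an in-memory BytesIO as a stand-in for a shard file. read_member is a hypothetical helper. The bitwise checksum is written for clarity only; a compiled CRC-32C implementation is far faster.

```python
import io
from typing import BinaryIO


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def read_member(shard: BinaryIO, offset: int, size: int, expected_crc32c: int) -> bytes:
    """Read size bytes at offset from a shard and check them against a CRC-32C."""
    shard.seek(offset)
    data = shard.read(size)
    if len(data) != size:
        raise EOFError(f'expected {size} bytes at offset {offset}, got {len(data)}')
    if crc32c(data) != expected_crc32c:
        raise ValueError('CRC-32C mismatch: data is corrupted')
    return data


shard = io.BytesIO(b'....hello....')  # stand-in for a shard file
print(read_member(shard, offset=4, size=5, expected_crc32c=crc32c(b'hello')))
```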
BarecatDirInfo Class
- class barecat.BarecatDirInfo(path=None, mode=None, uid=None, gid=None, mtime_ns=None, num_subdirs=None, num_files=None, size_tree=None, num_files_tree=None)
Describes a directory in the archive.
- Parameters:
path (str) – Directory path within the archive.
mode (int) – Unix directory mode.
uid (int) – Owner user ID.
gid (int) – Owner group ID.
mtime_ns (int) – Modification time in nanoseconds.
num_subdirs (int) – Number of immediate subdirectories.
num_files (int) – Number of immediate files.
size_tree (int) – Total size of all files recursively.
num_files_tree (int) – Total number of files recursively.
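A small example shows the difference between the immediate counts (num_files, num_subdirs) and the recursive totals (num_files_tree, size_tree). The sketch below computes all four from a flat {path: size} mapping, treating '' as the root directory, which is an assumption of this sketch. It only illustrates what the fields mean and is not how barecat computes them. Directories that contain no files do not appear in its output.

```python
import posixpath
from collections import defaultdict
from typing import Dict


def dir_stats(files: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Compute per-directory counts from a {file_path: size_in_bytes} mapping."""
    stats = defaultdict(
        lambda: {'num_files': 0, 'num_subdirs': 0, 'size_tree': 0, 'num_files_tree': 0}
    )
    subdirs = defaultdict(set)  # parent dir -> set of immediate child dirs
    for path, size in files.items():
        parent = posixpath.dirname(path)
        stats[parent]['num_files'] += 1  # counts only for the immediate parent
        d = parent
        while True:  # every ancestor, up to the root '', includes this file
            stats[d]['size_tree'] += size
            stats[d]['num_files_tree'] += 1
            if d == '':
                break
            up = posixpath.dirname(d)
            subdirs[up].add(d)
            d = up
    for d, children in subdirs.items():
        stats[d]['num_subdirs'] = len(children)
    return dict(stats)
```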
Index Class
- class barecat.Index
Manages the SQLite database. Usually accessed via bc.index.
- Parameters:
- iter_all_dirinfos(order=Order.ANY)
Iterate over all directories.
- Yields:
BarecatDirInfo objects.
- Parameters:
order (barecat.core.types.Order)
bufsize (Optional[int])
- Return type:
Iterator[barecat.core.types.BarecatDirInfo]
- iter_all_filepaths(order=Order.ANY)
Iterate over all file paths (files only, not directories).
- lookup_file(path)
Look up file info by path.
- Parameters:
- Returns:
BarecatFileInfo
- Raises:
FileNotFoundBarecatError – If not found.
- Return type:
barecat.core.types.BarecatFileInfo
- lookup_dir(path)
Look up directory info by path.
- Parameters:
- Returns:
BarecatDirInfo
- Raises:
NotADirectoryBarecatError – If not found.
- Return type:
barecat.core.types.BarecatDirInfo
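A table keyed by path is enough to show why lookup_file() is cheap and why it raises when the path is missing. The schema below (a files table with path, shard, byte_offset and size columns) is invented for illustration and is not barecat's actual schema. The standard FileNotFoundError stands in for FileNotFoundBarecatError, and both helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def open_toy_index(entries: Iterable[Tuple[str, int, int, int]]) -> sqlite3.Connection:
    """Build an in-memory index from (path, shard, byte_offset, size) rows."""
    conn = sqlite3.connect(':memory:')
    # PRIMARY KEY on path gives SQLite a unique index, so lookups are B-tree seeks.
    conn.execute(
        'CREATE TABLE files ('
        'path TEXT PRIMARY KEY, shard INTEGER, byte_offset INTEGER, size INTEGER)'
    )
    conn.executemany('INSERT INTO files VALUES (?, ?, ?, ?)', entries)
    return conn


def lookup_file(conn: sqlite3.Connection, path: str) -> Tuple[int, int, int]:
    """Return (shard, byte_offset, size) for path, or raise if it is not a file."""
    row = conn.execute(
        'SELECT shard, byte_offset, size FROM files WHERE path = ?', (path,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)  # barecat raises FileNotFoundBarecatError
    return row
```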
Order Enum
- class barecat.Order
Ordering options for iteration.
from barecat import Order

# Iterate in physical order (optimal for sequential reads)
for f in bc.index.iter_all_fileinfos(order=Order.ADDRESS):
    ...

# Iterate in reverse alphabetical order
for f in bc.index.iter_all_fileinfos(order=Order.PATH | Order.DESC):
    ...
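The flag combination in the example above reads as "pick a sort key, then optionally reverse it". The sketch below models that with enum.Flag over plain records. The member values and the Info record are assumptions of this sketch, not barecat's definitions, and sorting in Python only mimics what the index presumably does inside SQLite. Sorting by (shard, offset) is what makes ADDRESS order sequential on disk.

```python
import enum
from operator import attrgetter
from typing import Iterable, List, NamedTuple


class Order(enum.Flag):
    """Stand-in for barecat.Order; the member values here are assumptions."""
    ANY = 0
    ADDRESS = 1
    PATH = 2
    DESC = 4


class Info(NamedTuple):
    path: str
    shard: int
    offset: int


def ordered(infos: Iterable[Info], order: Order = Order.ANY) -> List[Info]:
    """Sort records the way the Order flags describe."""
    if Order.ADDRESS in order:
        key = attrgetter('shard', 'offset')  # physical layout: shard, then byte offset
    elif Order.PATH in order:
        key = attrgetter('path')  # lexicographic by path
    else:
        return list(infos)  # ANY: no ordering guarantee, so skip sorting
    return sorted(infos, key=key, reverse=Order.DESC in order)
```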
Exceptions
- exception barecat.exceptions.BarecatError
Base exception for barecat errors.
- Parameters:
message (str)
- exception barecat.exceptions.FileNotFoundBarecatError
File not found in archive.
- Parameters:
path (str)
- exception barecat.exceptions.FileExistsBarecatError
File already exists in archive.
- Parameters:
path (str)
- exception barecat.exceptions.IsADirectoryBarecatError
Operation expected file but got directory.
- Parameters:
path (str)
DecodedView Class
- class barecat.DecodedView(store)
Dict-like view that automatically encodes/decodes based on file extension.
Wraps a raw bytes store (like Barecat) and automatically encodes on write and decodes on read based on the file extension. Raises an error if no codec is registered for the extension.
- Parameters:
store (MutableMapping[str, bytes]) – A MutableMapping[str, bytes] to wrap (e.g., a Barecat instance).
Basic Usage
import barecat
import numpy as np
from barecat import DecodedView

with barecat.Barecat('data.barecat', readonly=False) as bc:
    dec = DecodedView(bc)

    # JSON: dict/list ↔ bytes
    dec['config.json'] = {'key': 'value', 'count': 42}
    config = dec['config.json']  # Returns dict

    # Images: numpy array ↔ encoded bytes (via imageio)
    dec['image.png'] = np.zeros((100, 100, 3), dtype=np.uint8)
    image = dec['image.png']  # Returns numpy array (H, W, C)

    # Numpy arrays
    dec['data.npy'] = np.array([1, 2, 3])
    arr = dec['data.npy']

    # Pickle: any Python object
    dec['model.pkl'] = {'weights': [0.1, 0.2], 'config': {'lr': 0.001}}

    # For raw bytes, use the store directly:
    bc['file.bin'] = b'raw binary data'
Stacked Compression
Compression codecs (.gz, .xz, .bz2) can be stacked with other codecs:
# JSON compressed with gzip
dec['config.json.gz'] = {'large': 'data'}
config = dec['config.json.gz']  # Decompresses, then parses JSON

# Pickle compressed with lzma
large_object = {'values': list(range(100_000))}
dec['model.pkl.xz'] = large_object
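Stacking can be understood as peeling extensions from the right: first any compression layers, then exactly one final codec. The toy view below illustrates that rule over a plain dict using only standard-library codecs. MiniDecodedView, plan, its codec tables, and its error behaviour are assumptions made for illustration. They are not DecodedView's implementation, which also handles the NumPy, image and msgpack formats.

```python
import bz2
import gzip
import json
import lzma
import pickle
from typing import Any, Callable, Dict, List, MutableMapping, Tuple

Codec = Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]

# Final codecs turn a Python object into bytes (illustrative subset).
FINAL: Dict[str, Codec] = {
    '.json': (lambda obj: json.dumps(obj).encode('utf-8'),
              lambda data: json.loads(data.decode('utf-8'))),
    '.pkl': (pickle.dumps, pickle.loads),
}
# Non-final codecs map bytes to bytes, so they can be stacked.
NONFINAL: Dict[str, Codec] = {
    '.gz': (gzip.compress, gzip.decompress),
    '.xz': (lzma.compress, lzma.decompress),
    '.bz2': (bz2.compress, bz2.decompress),
}


def plan(path: str) -> Tuple[str, List[str]]:
    """Split 'x.json.xz.gz' into ('.json', ['.gz', '.xz']), outermost layer first."""
    name, layers = path.lower(), []
    while True:
        stem, dot, ext = name.rpartition('.')
        ext = '.' + ext
        if dot and ext in NONFINAL:
            layers.append(ext)
            name = stem
        elif dot and ext in FINAL:
            return ext, layers
        else:
            raise ValueError(f'no codec registered for {path!r}')


class MiniDecodedView:
    """Toy illustration of extension-based encoding over a bytes mapping."""

    def __init__(self, store: MutableMapping[str, bytes]):
        self.store = store

    def __setitem__(self, path: str, value: Any) -> None:
        final, layers = plan(path)
        data = FINAL[final][0](value)
        for ext in reversed(layers):  # apply the innermost compression first
            data = NONFINAL[ext][0](data)
        self.store[path] = data

    def __getitem__(self, path: str) -> Any:
        final, layers = plan(path)
        data = self.store[path]
        for ext in layers:  # undo the outermost compression first
            data = NONFINAL[ext][1](data)
        return FINAL[final][1](data)
```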
Supported Extensions
Extension      Type                   Stackable
.json          dict/list              No
.pkl, .pickle  any (pickle)           No
.npy           numpy array            No
.npz           dict of numpy arrays   No
.msgpack       any (msgpack-numpy)    No
.jpg, .jpeg    numpy array (imageio)  No
.png           numpy array (imageio)  No
.gif, .bmp     numpy array (imageio)  No
.tiff, .tif    numpy array (imageio)  No
.webp, .exr    numpy array (imageio)  No
.gz, .gzip     gzip compression       Yes
.xz, .lzma     lzma compression       Yes
.bz2           bzip2 compression      Yes
Custom Codecs
- register_codec(exts, encoder, decoder, nonfinal=False)
Register a custom codec for given extensions.
- Parameters:
import yaml

dec.register_codec(
    ['.yaml', '.yml'],
    encoder=lambda d: yaml.dump(d).encode('utf-8'),
    decoder=lambda b: yaml.safe_load(b.decode('utf-8')),
)
dec['config.yaml'] = {'setting': 'value'}
DecodedView wraps any MutableMapping[str, bytes].
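Any encoder/decoder pair passed to register_codec() should round-trip, meaning that decoding what the encoder produced gives back an equal value. As a second example that needs only the standard library, the hypothetical encode_csv/decode_csv pair below handles a list of rows of strings. CSV carries no types, so numbers written as strings come back as strings. The registration call in the trailing comment follows the form of the YAML example.

```python
import csv
import io
from typing import List


def encode_csv(rows: List[List[str]]) -> bytes:
    """Serialize rows of strings to UTF-8 CSV bytes."""
    buf = io.StringIO(newline='')  # newline='' as the csv module docs recommend
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode('utf-8')


def decode_csv(data: bytes) -> List[List[str]]:
    """Parse UTF-8 CSV bytes back into rows of strings."""
    return list(csv.reader(io.StringIO(data.decode('utf-8'), newline='')))


# Hypothetical registration, following the YAML example:
# dec.register_codec(['.csv'], encoder=encode_csv, decoder=decode_csv)
```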
Deprecated Functions
- barecat.open(path, mode='r', auto_codec=False, threadsafe_reader=True)
Deprecated: use Barecat(path, readonly=True) or Barecat(path, readonly=False) directly.
Open a Barecat archive.
- barecat.extract(barecat_path, target_directory)
Deprecated: use the barecat extract CLI or the Barecat class directly instead.
Extract all files from a barecat archive to a directory.
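With the Barecat class, extraction can be as simple as iterating over the archive as a mapping. The sketch below assumes, as the DecodedView section implies, that a Barecat instance behaves as a MutableMapping[str, bytes] keyed by file path. extract_all is a hypothetical helper that works on any such mapping (the test uses a plain dict). It refuses paths that would escape the target directory and does not restore file modes, modification times or empty directories.

```python
import os
from typing import Mapping


def extract_all(store: Mapping[str, bytes], target_directory: str) -> None:
    """Write every (path, contents) item of store under target_directory."""
    root = os.path.realpath(target_directory)
    for path, data in store.items():
        dest = os.path.realpath(os.path.join(root, path))
        # Refuse archive paths such as '../x' that would land outside root.
        if os.path.commonpath([root, dest]) != root:
            raise ValueError(f'unsafe path in archive: {path!r}')
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, 'wb') as f:
            f.write(data)


# Hypothetical usage, assuming the mapping behaviour described above:
# with barecat.Barecat('data.barecat') as bc:
#     extract_all(bc, 'output_dir')
```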
Deprecated: auto_codec Parameter
Deprecated since version 0.3.0: The auto_codec parameter is deprecated and will be removed in version 1.0.
Use DecodedView instead.
Migration:
import barecat
from barecat import DecodedView

# Old (deprecated):
with barecat.Barecat('data.barecat', auto_codec=True) as bc:
    data = bc['file.json']

# New:
with barecat.Barecat('data.barecat') as bc:
    dec = DecodedView(bc)
    data = dec['file.json']