File Format Specification#
This document describes the barecat file format.
Overview#
A barecat archive consists of:
Shard files - Binary files containing concatenated file data
Index file - SQLite database with metadata
File Naming#
Given a base path myarchive.barecat:
myarchive.barecat # SQLite index database
myarchive.barecat-shard-00000 # First data shard
myarchive.barecat-shard-00001 # Second data shard (if needed)
myarchive.barecat-shard-00002 # ...
Shard numbers are zero-padded to 5 digits.
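The naming rule can be expressed as a small helper. This is only a sketch: `shard_path` is a hypothetical name chosen for illustration and is not part of barecat's API.

```python
def shard_path(base_path: str, shard: int) -> str:
    """Return the shard file name for an index path and a shard number."""
    if shard < 0:
        raise ValueError("shard number must be non-negative")
    # Shard numbers are zero-padded to 5 digits, e.g. 3 -> "00003".
    return f"{base_path}-shard-{shard:05d}"


print(shard_path("myarchive.barecat", 0))
print(shard_path("myarchive.barecat", 12))
```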
SQLite Index#
The index is a standard SQLite 3 database with the following schema:
Tables#
files - File metadata
CREATE TABLE files (
path TEXT NOT NULL,
parent TEXT GENERATED ALWAYS AS (
rtrim(rtrim(path, replace(path, '/', '')), '/')
) VIRTUAL NOT NULL REFERENCES dirs(path),
shard INTEGER NOT NULL,
offset INTEGER NOT NULL,
size INTEGER DEFAULT 0,
crc32c INTEGER DEFAULT NULL,
mode INTEGER DEFAULT NULL,
uid INTEGER DEFAULT NULL,
gid INTEGER DEFAULT NULL,
mtime_ns INTEGER DEFAULT NULL
);
path: Full path within the archive (e.g., “dir/subdir/file.txt”)
parent: Computed parent directory path
shard: Shard number (0, 1, 2, …)
offset: Byte offset within the shard
size: File size in bytes
crc32c: CRC32C checksum of file contents
mode: Unix file mode (permissions)
uid, gid: Owner user/group ID
mtime_ns: Modification time in nanoseconds since Unix epoch
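The expression behind the generated parent column is compact but not obvious. The inner rtrim strips every trailing character that also occurs in the path with its slashes removed, which stops at the last slash; the outer rtrim then removes that slash. The sketch below evaluates the same expression through Python's sqlite3 module so it can be tried on sample paths. The helper name `parent_of` is hypothetical.

```python
import sqlite3

# The same expression the schema uses for the generated `parent` column,
# with the path supplied as a named parameter.
PARENT_EXPR = "rtrim(rtrim(:p, replace(:p, '/', '')), '/')"


def parent_of(path: str) -> str:
    """Evaluate the schema's parent expression for a single path."""
    conn = sqlite3.connect(":memory:")
    try:
        (parent,) = conn.execute(f"SELECT {PARENT_EXPR}", {"p": path}).fetchone()
        return parent
    finally:
        conn.close()


print(repr(parent_of("dir/subdir/file.txt")))  # 'dir/subdir'
print(repr(parent_of("file.txt")))             # '' (the root directory)
```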
dirs - Directory metadata and statistics
CREATE TABLE dirs (
path TEXT NOT NULL,
parent TEXT GENERATED ALWAYS AS (
CASE WHEN path = '' THEN NULL
ELSE rtrim(rtrim(path, replace(path, '/', '')), '/') END
) VIRTUAL REFERENCES dirs(path),
num_subdirs INTEGER DEFAULT 0,
num_files INTEGER DEFAULT 0,
num_files_tree INTEGER DEFAULT 0,
size_tree INTEGER DEFAULT 0,
mode INTEGER DEFAULT NULL,
uid INTEGER DEFAULT NULL,
gid INTEGER DEFAULT NULL,
mtime_ns INTEGER DEFAULT NULL
);
num_subdirs: Immediate subdirectory count
num_files: Immediate file count
num_files_tree: Recursive file count
size_tree: Recursive total size
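To make the difference between the immediate and recursive counters concrete, the following sketch recomputes all four statistics from a plain mapping of file paths to sizes. It is an illustration of the definitions above, not barecat's own code: the function names are hypothetical, and directories are derived only from file paths, so empty directories (which an archive can store as their own rows) do not appear.

```python
from typing import Dict


def parent_of(path: str) -> str:
    """Parent directory of an archive path; '' denotes the root."""
    return path.rpartition("/")[0]


def directory_stats(file_sizes: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Compute per-directory counters from a {file path: size} mapping."""
    stats: Dict[str, Dict[str, int]] = {}

    def ensure_dir(directory: str) -> None:
        # Create the directory and any missing ancestors, counting each
        # newly created directory once in its parent's num_subdirs.
        if directory in stats:
            return
        stats[directory] = {
            "num_subdirs": 0, "num_files": 0,
            "num_files_tree": 0, "size_tree": 0,
        }
        if directory != "":
            parent = parent_of(directory)
            ensure_dir(parent)
            stats[parent]["num_subdirs"] += 1

    ensure_dir("")
    for path, size in file_sizes.items():
        directory = parent_of(path)
        ensure_dir(directory)
        # num_files counts direct children only.
        stats[directory]["num_files"] += 1
        # The *_tree counters are updated on every ancestor up to the root.
        while True:
            stats[directory]["num_files_tree"] += 1
            stats[directory]["size_tree"] += size
            if directory == "":
                break
            directory = parent_of(directory)
    return stats


example = directory_stats({"a/x.txt": 10, "a/b/y.txt": 5, "z.txt": 1})
for name in sorted(example):
    print(repr(name), example[name])
```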
config - Archive configuration
CREATE TABLE config (
key TEXT PRIMARY KEY,
value_text TEXT DEFAULT NULL,
value_int INTEGER DEFAULT NULL
) WITHOUT ROWID;
Standard config entries:
use_triggers: 1 if triggers are active
shard_size_limit: Maximum shard size in bytes
schema_version_major: Schema major version (currently 0)
schema_version_minor: Schema minor version (currently 3)
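A reader will typically check the schema version before interpreting the rest of the index. The sketch below reads the config table into a dictionary and extracts the version pair. It assumes that numeric settings are stored in value_int, which the column names suggest but this document does not state explicitly. The helper names are hypothetical, and the demo rows are illustrative values inserted into an in-memory database.

```python
import sqlite3


def read_config(conn: sqlite3.Connection) -> dict:
    """Return config entries as {key: value}, preferring value_int when set."""
    rows = conn.execute("SELECT key, value_text, value_int FROM config")
    return {
        key: (value_int if value_int is not None else value_text)
        for key, value_text, value_int in rows
    }


def schema_version(conn: sqlite3.Connection) -> tuple:
    """Return (major, minor) as recorded in the config table."""
    config = read_config(conn)
    return config["schema_version_major"], config["schema_version_minor"]


# Demo on an in-memory database; the inserted values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE config ("
    " key TEXT PRIMARY KEY,"
    " value_text TEXT DEFAULT NULL,"
    " value_int INTEGER DEFAULT NULL"
    ") WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO config (key, value_int) VALUES (?, ?)",
    [("use_triggers", 1), ("schema_version_major", 0), ("schema_version_minor", 3)],
)
print(schema_version(conn))
```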
Indexes#
CREATE UNIQUE INDEX idx_files_path ON files(path);
CREATE UNIQUE INDEX idx_dirs_path ON dirs(path);
CREATE INDEX idx_files_parent ON files(parent);
CREATE INDEX idx_dirs_parent ON dirs(parent);
CREATE INDEX idx_files_shard_offset ON files(shard, offset);
Triggers#
The database uses triggers to maintain directory statistics. When a file is added, the parent directory’s counters are automatically updated, propagating up the tree.
Triggers can be disabled for bulk operations via the use_triggers config
flag.
Checksum#
CRC32C (Castagnoli) is used for file checksums:
Polynomial: 0x1EDC6F41
Hardware accelerated on modern CPUs
Compatible with Google’s CRC32C implementation
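For readers without a CRC32C library at hand, a bit-by-bit reference implementation is short. The sketch below assumes the stored value is the standard CRC-32C (reflected bit order, initial value and final XOR of 0xFFFFFFFF); the constant 0x82F63B78 is the bit-reversed form of the polynomial 0x1EDC6F41. It is far slower than a hardware-accelerated library and is meant for cross-checking only.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli); pass a previous result as `crc` to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Reflected algorithm: shift right, XOR the reversed polynomial
            # whenever the bit shifted out was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Because the function accepts a running value, large files can be checksummed in chunks: `crc32c(b"6789", crc32c(b"12345"))` gives the same result as a single call on the whole input.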
Data Integrity#
Barecat provides:
File checksums - CRC32C for each file
SQLite integrity - ACID transactions, journaling
Verification - barecat verify checks all checksums
It does NOT provide:
Archive-level signatures
Encryption
Compression (files stored as-is)
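The checksum verification listed above can also be approximated without barecat. The sketch below walks the files table, reads each file's bytes from its shard, and reports paths whose recomputed checksum differs from the stored one. It is not how barecat verify is implemented: `find_corrupt_files` is a hypothetical name, the CRC32C function is supplied by the caller, the stored value is assumed to be an unsigned 32-bit integer, and rows with a NULL checksum are skipped.

```python
import sqlite3
from typing import Callable, Iterator


def find_corrupt_files(
    base_path: str, crc32c: Callable[[bytes], int]
) -> Iterator[str]:
    """Yield archive paths whose shard data does not match the stored CRC32C."""
    conn = sqlite3.connect(base_path)
    try:
        # Visit files in physical order so each shard is read front to back.
        query = (
            'SELECT path, shard, "offset", size, crc32c FROM files '
            'WHERE crc32c IS NOT NULL ORDER BY shard, "offset"'
        )
        for path, shard, offset, size, expected in conn.execute(query):
            # Reopening the shard per file keeps the sketch simple;
            # a real tool would keep shard handles open.
            with open(f"{base_path}-shard-{shard:05d}", "rb") as f:
                f.seek(offset)
                data = f.read(size or 0)
            # A short read means the shard is truncated.
            if len(data) != (size or 0) or crc32c(data) != expected:
                yield path
    finally:
        conn.close()
```

A caller would pass the index path and any CRC32C implementation, for example `list(find_corrupt_files("myarchive.barecat", my_crc32c))`, where `my_crc32c` is a placeholder for a function mapping bytes to an integer checksum.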
Compatibility#
The format is designed for simplicity and long-term compatibility:
SQLite - Universally supported, stable format
Shards - Plain binary, no proprietary encoding
Schema versioning - Allows forward-compatible changes
Reading a barecat archive requires:
SQLite library
Ability to read binary files
Understanding of this specification
No special decompression or decryption is needed.
Writing a barecat archive is also straightforward with standard SQLite using the schema.sql file and regular file I/O.
Example: Reading Without Library#
Using standard tools:
# List all files
sqlite3 myarchive.barecat "SELECT path, shard, offset, size FROM files"
# Extract a specific file
sqlite3 myarchive.barecat \
"SELECT shard, offset, size FROM files WHERE path='dir/file.txt'"
# Returns: 0|1234|5678
# Read the data
dd if=myarchive.barecat-shard-00000 bs=1 skip=1234 count=5678
Using Python without barecat:
import sqlite3

conn = sqlite3.connect('myarchive.barecat')
cursor = conn.execute(
    "SELECT shard, offset, size FROM files WHERE path=?",
    ('dir/file.txt',)
)
shard, offset, size = cursor.fetchone()

# Shard numbers are zero-padded to 5 digits in the shard file name
with open(f'myarchive.barecat-shard-{shard:05d}', 'rb') as f:
    f.seek(offset)
    data = f.read(size)
Version History#
Schema 0.3 (unreleased)
Fixed trigger bug: num_files is no longer incorrectly propagated on directory move/delete (num_files counts direct children only, not recursive)
Schema 0.2 (v0.2.5, January 2025)
First released schema version.
config table with schema versioning (schema_version_major, schema_version_minor)
crc32c column in files for checksums
mode, uid, gid, mtime_ns columns for Unix metadata
dirs table with num_subdirs, num_files, num_files_tree, size_tree
SQLite triggers for automatic stats propagation
config table uses WITHOUT ROWID
Internal development note: During development, an intermediate version briefly used
WITHOUT ROWID for all tables (files, dirs, config). This was reverted before release
because rowid tables are more space-efficient for this use case. The script
upgrade_database2.py exists to convert databases from this intermediate format,
but since it was never released, this script is unlikely to be needed.
Pre-versioned format (original, June 2023)
Never formally released. Incompatible with current barecat.
Simple schema without config table or versioning
files: path, parent, shard, offset, size
directories: path, parent, total_size, total_file_count
No checksums or Unix metadata
Upgrading#
Run barecat upgrade <archive> to upgrade an archive to the current
schema version. The upgrade process detects the source version automatically.
Pre-versioned → 0.3
Heavy migration that:
Renames old index to .old
Creates new index with current schema
Copies directory and file metadata
Calculates CRC32C checksums for all files (uses --workers)
0.2 → 0.3
Lightweight in-place fix:
Drops buggy del_subdir and move_subdir triggers
Recreates triggers with fixed logic
Rebuilds directory tree statistics to fix any corruption
Updates schema version
This is fast even for large archives since it doesn’t touch file data.