How to Merge Archives

This guide covers combining multiple archives into one.

Basic Merge

Merge multiple barecat archives:

barecat merge -o combined.barecat archive1.barecat archive2.barecat archive3.barecat

Merge with a shard size limit (-s), which caps the size of each output shard:

barecat merge -o combined.barecat -s 50G *.barecat
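
The following minimal Python sketch shows what a shard size limit does. It rests on an assumption this guide does not state: files are kept whole, packed in input order, and a new shard starts whenever the next file would push the current shard past the limit. The function plan_shards is hypothetical and is not part of barecat, whose actual shard layout may differ.

```python
def plan_shards(file_sizes: list[int], shard_size_limit: int) -> list[list[int]]:
    """Greedily assign files (by index) to shards without exceeding the limit.

    Assumption (not barecat's documented behavior): a file is never split
    across shards, and a file larger than the limit gets a shard of its own.
    """
    shards: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for index, size in enumerate(file_sizes):
        # Close the current shard if adding this file would exceed the limit.
        if current and current_size + size > shard_size_limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        shards.append(current)
    return shards


# Toy example with a 100-byte limit instead of 50G.
print(plan_shards([40, 50, 30, 120, 10], shard_size_limit=100))
# prints [[0, 1], [2], [3], [4]]
```

With -s 50G the limit would be 50 * 1024**3 bytes instead of 100. This assumes G means GiB, as the value in the Python API example suggests.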

Merging Mixed Archive Types

Barecat can merge barecat, tar, and zip archives in a single command:

barecat merge -o combined.barecat \
    existing.barecat \
    new_data.tar.gz \
    more_data.zip

This is useful for:

  • Adding new data, delivered as tar or zip files, to an existing barecat archive

  • Consolidating data from multiple sources

Handling Duplicates

By default, a path that appears in more than one input archive causes an error. The following options change this behavior:

Ignore Duplicates

Keep the first occurrence of each path and skip later ones:

barecat merge -o combined.barecat --ignore-duplicates archive1.barecat archive2.barecat
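
The following sketch illustrates the "first occurrence wins" rule, with each archive modeled as an in-memory path-to-content mapping. It assumes "first" means the order in which inputs are listed, so argument order decides which copy survives. merge_first_wins is a hypothetical illustration, not barecat's implementation.

```python
from collections.abc import Iterable, Mapping


def merge_first_wins(sources: Iterable[Mapping[str, bytes]]) -> dict[str, bytes]:
    """Merge path->content mappings; the first source containing a path wins."""
    merged: dict[str, bytes] = {}
    for source in sources:
        for path, content in source.items():
            # setdefault stores the value only if the path is not present yet,
            # so later duplicates are silently skipped.
            merged.setdefault(path, content)
    return merged


archive1 = {"a.txt": b"v1", "b.txt": b"b"}
archive2 = {"a.txt": b"v2", "c.txt": b"c"}
print(merge_first_wins([archive1, archive2]))
# prints {'a.txt': b'v1', 'b.txt': b'b', 'c.txt': b'c'}
```

Swapping the two inputs would keep b"v2" for a.txt instead.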

Append Mode

Append to an existing archive with -a. This implies --ignore-duplicates, so paths already in the archive are kept and incoming duplicates are skipped:

# First merge creates the archive
barecat merge -o combined.barecat archive1.barecat

# Later, append more data
barecat merge -o combined.barecat -a archive2.barecat archive3.barecat

Force Overwrite

Use -f to overwrite an existing output archive:

barecat merge -o combined.barecat -f archive1.barecat archive2.barecat

Practical Examples

Consolidating Daily Uploads

# Initial archive
barecat create dataset.barecat /data/day1/

# Each day, append new data
barecat merge -o dataset.barecat -a /data/day2/
barecat merge -o dataset.barecat -a new_batch.tar.gz
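
If the daily append runs from a scheduled job, a small Python wrapper can issue the same commands. This is a minimal sketch with several assumptions:

  • append_commands and append_inputs are hypothetical names.

  • It uses only the -o and -a flags shown above.

  • The barecat executable is on PATH.

By default it performs a dry run and only returns the commands.

```python
import subprocess


def append_commands(archive: str, inputs: list[str]) -> list[list[str]]:
    """Build one `barecat merge -o ARCHIVE -a INPUT` command per input, in order."""
    return [["barecat", "merge", "-o", archive, "-a", item] for item in inputs]


def append_inputs(archive: str, inputs: list[str], dry_run: bool = True) -> list[list[str]]:
    """Append inputs to an existing archive; with dry_run=True nothing is executed."""
    commands = append_commands(archive, inputs)
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError and stops at the first failed merge.
            subprocess.run(command, check=True)
    return commands


# Preview the commands for the two uploads in the example above.
for command in append_inputs("dataset.barecat", ["/data/day2/", "new_batch.tar.gz"]):
    print(" ".join(command))
```

Running one command per input means a failure stops the job before later inputs are appended. The alternative is to pass all new inputs to a single -a invocation, as in the Append Mode example.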

Combining Training Splits

barecat merge -o full_dataset.barecat \
    train.barecat \
    val.barecat \
    test.barecat \
    -s 50G

Python API

from barecat import merge, merge_symlink

# Regular merge
merge(
    source_paths=['archive1.barecat', 'archive2.barecat', 'data.tar.gz'],
    target_path='combined.barecat',
    shard_size_limit=50 * 1024**3,  # 50 GiB, given in bytes
    ignore_duplicates=True,
)

# Symlink merge
merge_symlink(
    source_paths=['archive1.barecat', 'archive2.barecat'],
    target_path='combined.barecat',
)
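
The CLI accepts a size such as 50G for -s, while shard_size_limit in the Python API is a byte count. The hypothetical helper parse_size below converts one to the other. It is not a barecat function, and it assumes binary units (G = 1024**3), matching the 50 * 1024**3 used above. barecat's own parsing of -s may accept other forms.

```python
def parse_size(text: str) -> int:
    """Convert a size such as '50G' or '512M' to bytes, using binary (1024) units."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    text = text.strip().upper()
    if text and text[-1] in units:
        # Allow fractional values such as '1.5G'.
        return int(float(text[:-1]) * units[text[-1]])
    # No suffix: the value is already in bytes.
    return int(text)


print(parse_size("50G") == 50 * 1024**3)  # prints True
```

For example, shard_size_limit=parse_size("50G") yields the same value as 50 * 1024**3.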

Troubleshooting

“File already exists” Error

This error means the same file path exists in more than one input archive. To keep the first occurrence and skip the rest, use --ignore-duplicates:

barecat merge -o out.barecat --ignore-duplicates *.barecat
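
Before choosing --ignore-duplicates, you may want to see which paths collide. The sketch below does this for tar and zip inputs using only the Python standard library. member_paths and find_duplicates are hypothetical helpers. They do not read .barecat inputs, and they compare member names verbatim, so a tar entry stored as ./a.txt would not match a.txt even if barecat normalizes the two.

```python
import tarfile
import zipfile
from collections import defaultdict


def member_paths(archive_path: str) -> list[str]:
    """List regular-file paths stored in a zip or tar (any compression) archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return [info.filename for info in zf.infolist() if not info.is_dir()]
    # tarfile.open's default mode transparently handles gzip, bz2 and xz.
    with tarfile.open(archive_path) as tf:
        return [member.name for member in tf.getmembers() if member.isfile()]


def find_duplicates(archive_paths: list[str]) -> dict[str, list[str]]:
    """Map each path that occurs more than once across the inputs to its sources."""
    seen: dict[str, list[str]] = defaultdict(list)
    for archive_path in archive_paths:
        for path in member_paths(archive_path):
            seen[path].append(archive_path)
    return {path: sources for path, sources in seen.items() if len(sources) > 1}
```

For example, find_duplicates(["new_data.tar.gz", "more_data.zip"]) lists every shared path together with the inputs that contain it. With --ignore-duplicates, the copy from the earliest input is the one that would be kept.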

See Also