Working with large files across distributed teams is harder than it sounds. You start with rclone, or a VPN pointing at a NAS, or some ad-hoc sync script, and it works until it doesn’t. Too much data transferred, too much latency, too many “who has the file open?” problems.
JuiceFS is a different kind of answer. Not a sync tool, not a gateway, but a genuine POSIX-compliant distributed filesystem built on top of object storage. And once you understand its architecture, a lot of things click into place.
What JuiceFS actually is
JuiceFS is an open source, cloud-native distributed filesystem licensed under Apache 2.0. The core idea is simple but powerful: separate data from metadata.
- Data is stored as chunks in any S3-compatible object storage (OCI Object Storage, AWS S3, Wasabi, Backblaze B2, MinIO, and 40+ others).
- Metadata (file names, sizes, permissions, directory tree, timestamps) lives in a dedicated database engine.
This separation is the architectural decision that makes everything else possible: performance, consistency, collaborative access, and disaster recovery.
The internal data model
When you write a file to JuiceFS, it goes through a three-level split:
- Chunk: up to 64 MiB per chunk.
- Slice: variable-length segments within each chunk, depending on write patterns.
- Block: fixed 4 MiB units, stored as individual objects in object storage.
This means a 40 GB file becomes thousands of independent objects in your bucket. When you open that file and jump to minute 20, JuiceFS calculates exactly which blocks correspond to that byte offset and fetches only those. No full download, no buffering from the beginning.
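As a back-of-the-envelope sketch (the numbers are illustrative, not JuiceFS internals), you can compute which fixed 4 MiB blocks a seek actually touches:

```bash
# Illustrative only: which 4 MiB blocks cover a given read?
BLOCK_SIZE=$((4 * 1024 * 1024))               # 4 MiB, the JuiceFS block size
OFFSET=$((1500 * 1024 * 1024))                # seek ~1.5 GiB into a 40 GB file
LENGTH=$((8 * 1024 * 1024))                   # read 8 MiB from that point

FIRST=$((OFFSET / BLOCK_SIZE))                # first block index needed
LAST=$(((OFFSET + LENGTH - 1) / BLOCK_SIZE))  # last block index needed
echo "fetch blocks $FIRST through $LAST, not the whole file"
```

Only those two objects need to come down from the bucket; the other ~10,000 blocks of the file are never requested.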
This is where it fundamentally differs from rclone mount. Rclone’s behavior on random access depends on --vfs-cache-mode: in full mode it downloads the entire file, while in reads mode it can still trigger significant re-downloads on seek. rclone is a sync and transfer tool with a mount layer bolted on; JuiceFS is a filesystem first.
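For comparison, the two rclone cache modes mentioned above look like this (remote name and mount point are placeholders):

```bash
# 'remote:bucket' and /mnt/data are placeholders
rclone mount remote:bucket /mnt/data --vfs-cache-mode full   # caches whole files locally
rclone mount remote:bucket /mnt/data --vfs-cache-mode reads  # caches on read; seeks can re-fetch
```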
Why the metadata engine matters (and which one to choose)
The metadata engine is not a secondary concern; it’s the brain of the filesystem. Every ls, open, stat, or rename is a metadata operation. The database you choose determines your latency, consistency guarantees, and failure modes.
| Engine | Use case | Notes |
|---|---|---|
| SQLite | Local testing only | No concurrency, no HA. |
| Redis | High-performance, single-region | Sub-millisecond ops; needs persistence configured (RDB + AOF). |
| MySQL / PostgreSQL | Standard enterprise deployments | Good balance of reliability and familiarity. |
| TiKV | Large-scale distributed | Horizontal scaling, strong consistency, no single point of failure. |
| CockroachDB | Multi-region / geo-distributed | Survives region failures; ideal for multi-region scenarios. |
For a single-site deployment with performance as priority, Redis with proper AOF persistence is the standard choice. For multi-region or high-availability requirements, TiKV or CockroachDB remove the single point of failure entirely. CockroachDB, in particular, is compelling for cross-continental setups because it natively replicates metadata across regions with strong consistency, meaning the Milan and New York offices see the same filesystem state without any reconciliation logic.
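Switching metadata engines is just a different META-URL at format time; a sketch with placeholder hosts and credentials:

```bash
# Redis with AOF persistence (single-region, performance-first)
juicefs format --storage s3 --bucket https://BUCKET.s3.example.com \
  redis://:mypassword@192.168.1.6:6379/1 myjfs

# PostgreSQL (standard enterprise deployment)
juicefs format --storage s3 --bucket https://BUCKET.s3.example.com \
  postgres://user:pass@pg.internal:5432/juicefs myjfs

# TiKV (distributed, no single point of failure) -- list the PD endpoints
juicefs format --storage s3 --bucket https://BUCKET.s3.example.com \
  tikv://pd0:2379,pd1:2379,pd2:2379/myjfs myjfs
```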
Collaborative access and file locking
This is a point that often gets overlooked when people compare JuiceFS to simpler tools.
JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl), which means applications that rely on standard file locking semantics (video editors, databases, ERP systems) work correctly even when multiple clients are mounted simultaneously.
Strong consistency is guaranteed: any confirmed modification is immediately visible on all servers mounted with the same filesystem. This isn’t eventual consistency. When a user in Milan saves a file, the user in New York sees the updated version immediately on next access. This makes JuiceFS suitable for genuine collaborative workflows, not just read-heavy shared storage.
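You can exercise that locking behavior with standard tools. A minimal sketch using flock(1) on a local file; on JuiceFS the lock file would live under the shared mount, so a second client attempting the same lock would see it as busy:

```bash
LOCKFILE=/tmp/demo.lock      # would be a path inside the JuiceFS mount
exec 9>"$LOCKFILE"           # open fd 9 on the lock file
if flock -n 9; then          # try a non-blocking BSD lock
  STATUS="acquired"
else
  STATUS="busy"              # another client holds it
fi
echo "lock: $STATUS"
flock -u 9                   # release
```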
Access protocols: more than just FUSE
JuiceFS isn’t limited to a single access method. The same underlying filesystem can be exposed through multiple protocols simultaneously:
- FUSE mount: The standard approach. Works on Linux and macOS, exposes the filesystem as a local path.
- Samba (SMB): Mount JuiceFS via FUSE, then export via Samba with Active Directory integration. Use the --enable-xattr option during mount to enable extended attribute support for Windows clients.
- NFS: Mount points can be directly used for NFS sharing. NFS tends to have better throughput than Samba in high-concurrency scenarios.
- S3 Gateway: JuiceFS can expose itself as an S3-compatible endpoint, useful for applications that speak S3 natively.
- WebDAV: HTTP-based access, useful for cross-platform clients.
- Kubernetes CSI Driver: Native integration with ReadWriteMany persistent volumes.
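The S3 gateway from the list above is a single command; metadata URL and listen address here are placeholders:

```bash
# Credentials the gateway will expect from its S3 clients
export MINIO_ROOT_USER=admin
export MINIO_ROOT_PASSWORD=change-me-please

# Expose the filesystem as an S3-compatible endpoint on port 9000
juicefs gateway redis://192.168.1.6:6379 0.0.0.0:9000
```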
A realistic enterprise architecture might look like this: one JuiceFS filesystem backed by CockroachDB for metadata and S3-compatible object storage for data, exposed via NFS to Linux workstations, via Samba to Windows users, and via CSI to Kubernetes pods, all accessing the same data, consistently.
Enterprise use cases
Multi-site distributed teams and big files sync problem
The canonical scenario involves teams in Milan and New York collaborating on the same large files in real time.
The architecture relies on object storage (OCI Object Storage or any S3-compatible provider) as the single source of truth, with a distributed metadata database such as CockroachDB spanning multiple regions. JuiceFS is mounted directly on application servers and exposed to end users via standard protocols like Samba (for Windows) or NFS (for Linux/Mac).
In this model, files do not need to be synchronized or replicated across sites. There are no background sync jobs, no version conflicts, and no ambiguity about which copy is the most up to date. Instead, all teams access the same global namespace backed by object storage.
Crucially, when a file is accessed, only the required portions of the data are fetched on demand rather than downloading the entire file. This significantly reduces latency and bandwidth usage, especially when working with large datasets, and enables efficient, seamless collaboration across geographically distributed teams.
AI/ML pipelines and Kubernetes
JuiceFS supports dynamic provisioning of Persistent Volumes with ReadWriteMany access mode. For AI training, multiple GPU nodes can read the same dataset simultaneously without copying it to local storage first.
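A sketch of what that looks like in practice, assuming the JuiceFS CSI driver is installed and a StorageClass named juicefs-sc exists (both names are assumptions for this example):

```bash
# Apply a ReadWriteMany PVC backed by the (assumed) juicefs-sc StorageClass
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-dataset
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: juicefs-sc
EOF
```

Every GPU pod that mounts this claim reads the same dataset directly from the shared filesystem.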
Big data and Hadoop
JuiceFS ships a Hadoop Java SDK compatible with Hadoop 2.x and 3.x, making it a drop-in replacement for HDFS. This enables the classic storage/compute decoupling.
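Once the SDK jar is on the classpath and core-site.xml maps fs.jfs.impl to io.juicefs.JuiceFileSystem, the volume is addressable with the jfs:// scheme (the volume name below is a placeholder):

```bash
# Standard Hadoop tooling against a JuiceFS volume named 'myjfs'
hadoop fs -ls jfs://myjfs/
hadoop fs -put dataset.csv jfs://myjfs/warehouse/
```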
Personal and prosumer use
You don’t need a rack to benefit from this. A realistic personal setup:
- Wasabi or Backblaze B2 as object storage.
- SQLite or a small Redis instance as metadata engine.
- Kodi or Jellyfin running on the same machine as JuiceFS (a Raspberry Pi works). Your entire video library becomes accessible from any device with fast seeking, without NAS hardware to maintain.
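The whole setup is a handful of commands; the bucket name, keys, and paths below are placeholders (Backblaze B2 is used via its S3-compatible endpoint):

```bash
# Format: B2 for data, a local SQLite file for metadata
juicefs format \
  --storage s3 \
  --bucket https://BUCKET_NAME.s3.eu-central-003.backblazeb2.com \
  --access-key "$B2_KEY_ID" \
  --secret-key "$B2_APP_KEY" \
  sqlite3:///home/pi/jfs/media.db \
  media

# Mount it where Jellyfin/Kodi expects the library
sudo juicefs mount -d sqlite3:///home/pi/jfs/media.db /mnt/media
```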
Object Storage Costs (real numbers)
Storage cost matters a lot. Here is a simple comparison (remember to also check egress and API call costs, which vary widely between providers):
| Provider | Storage cost (approx) |
|---|---|
| AWS S3 | ~23 USD per TB/month |
| OCI Object Storage | ~20 USD per TB/month |
| Wasabi / Backblaze B2 | ~6 USD per TB/month |
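To make the spread concrete, a quick back-of-the-envelope for 10 TB, using the approximate per-TB rates from the table above (egress and API calls excluded):

```bash
TB=10
# Approximate USD per TB/month, from the table above
AWS=23; OCI=20; BUDGET=6
echo "10 TB/month: AWS \$$((TB * AWS)), OCI \$$((TB * OCI)), Wasabi/B2 \$$((TB * BUDGET))"
```

At media-library or dataset scale, the budget providers are a 3-4x saving on storage alone.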
Data protection: snapshots, backup, and recovery
This is where JuiceFS’s architecture pays dividends, and also where you need to think carefully.
Filesystem-level snapshots
The juicefs snapshot command creates a metadata-only copy of a directory. It’s essentially instant because it only copies metadata pointers. When the original or the snapshot is modified, JuiceFS uses copy-on-write at the block level.
Metadata backup
The metadata DB is the critical component. If you lose it without a backup, you lose the filesystem structure.
- Automatic backup: Starting from v1.0.0, the client automatically backs up metadata to the object storage every hour.
- Manual dump/load: The juicefs dump command exports metadata to JSON, allowing migration between different engines (e.g., from Redis to MySQL).
```bash
# Export metadata to JSON (Redis example)
juicefs dump redis://192.168.1.6:6379 meta-backup.json

# Restore into a different engine (MySQL example)
juicefs load mysql://user:pass@(host:3306)/juicefs meta-backup.json
```
Note: juicefs dump does not provide snapshot consistency unless you suspend writes before dumping.
Integrity check and garbage collection
- juicefs fsck: checks filesystem consistency.
- juicefs gc: identifies and deletes orphaned objects in object storage (blocks no longer referenced by metadata).
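Both take the metadata URL as their argument; a sketch with a placeholder Redis address:

```bash
# Check metadata/object consistency
juicefs fsck redis://192.168.1.6:6379

# Scan for orphaned blocks first, then actually delete them
juicefs gc redis://192.168.1.6:6379
juicefs gc --delete redis://192.168.1.6:6379
```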
Deployment models
| Model | Metadata | Storage | Best for |
|---|---|---|---|
| Local Test | SQLite | Wasabi / Backblaze B2 | Dev & Eval |
| Small Team | Redis (+AOF) | S3-compatible | One region, <50 users |
| Enterprise | MySQL / PostgreSQL | S3-compatible | Existing DB infra |
| Global/HA | CockroachDB / TiKV | Multi-region S3 | Global teams, HA |
Comparing with enterprise alternatives
JuiceFS overlaps with solutions like Hammerspace, CTERA, and Panzura in enabling distributed file access over object storage, but it differs significantly in scope and approach. While enterprise platforms provide fully integrated solutions with global namespaces, advanced data management, built-in governance, and enterprise-grade support with SLAs, JuiceFS focuses on delivering a lightweight, cloud-native filesystem with maximum flexibility and cost efficiency. It requires more operational expertise, as teams are responsible for deployment, scaling, and maintenance, but in return it offers greater control and avoids vendor lock-in. As a result, JuiceFS is particularly well suited for organizations with strong Kubernetes and cloud operations capabilities, whereas enterprise solutions are often preferred in environments that prioritize turnkey deployment, support, and comprehensive data services.
Quick start on macOS
If you want to test how JuiceFS works on macOS:

```bash
# Install macFUSE first (required for FUSE mounts on macOS), then JuiceFS
brew install juicefs

# Format a filesystem: SQLite metadata + OCI Object Storage (S3-compatible endpoint)
juicefs format \
--storage s3 \
--bucket https://BUCKET_NAME.TENANCY_NAMESPACE.compat.objectstorage.eu-frankfurt-1.oraclecloud.com \
--access-key "xxxxxxxxxxxxxxxxxxxx" \
--secret-key "xxxxxxxxxxxxxxxxxxxx" \
sqlite3:///Users/pippo/Documents/juiceFS/myjfs.db \
oci-storage

# Mount it with a local cache (20480 MiB) and bandwidth limits (Mbps)
sudo juicefs mount \
--cache-dir /Users/pippo/Documents/juiceFS/cache \
--cache-size 20480 \
--download-limit 800 \
--upload-limit 800 \
-d \
-o volname=OCI_Drive,allow_other,local,noappledouble \
sqlite3:///Users/pippo/Documents/juiceFS/myjfs.db \
/Users/pippo/OCI_Bucket

# If you like it, you can set up automatic mounting at startup as well
```
Final thoughts
JuiceFS is not a drop-in replacement for rclone: they solve different problems. rclone is excellent for sync and migration; JuiceFS is a filesystem that handles consistency, locking, and efficient access transparently.
If your organization works with large shared datasets or wants to stop paying for NAS hardware while keeping real filesystem semantics, JuiceFS deserves serious evaluation.
Have you deployed JuiceFS in production? I’m curious about your metadata engine choice and whether you’ve run it behind NFS or Samba. Drop a comment or reach out directly.