Merge pull request #1390 from trapexit/tiered
Add tiered cache details to docs
trapexit authored Jan 8, 2025
2 parents 03dc17f + 7e96428 commit 5e584f2
Showing 2 changed files with 89 additions and 0 deletions.
88 changes: 88 additions & 0 deletions mkdocs/docs/usage_patterns.md
@@ -0,0 +1,88 @@
# Usage Patterns

## tiered cache

Some storage technologies support what is called "tiered" caching:
placing smaller, faster storage as a transparent cache in front of
larger, slower storage, for instance NVMe, SSD, or Optane in front of
traditional HDDs.

mergerfs does not natively support any sort of tiered caching. Most
users have no use for such a feature and its inclusion would
complicate the code as it exists today. However, there are a few
situations where a cache filesystem could help with a typical mergerfs
setup.

1. Fast network, slow filesystems, many readers: You have a 10+Gbps
network with many readers and your regular filesystems can't keep
up.
2. Fast network, slow filesystems, smallish bursty writes: You have
a 10+Gbps network and wish to quickly transfer amounts of data
smaller than your cache filesystem, and the time between bursts is
long enough to migrate the data.

With #1 it's arguable whether you should be using mergerfs at all. A
RAID level that can aggregate performance or higher performance
storage would probably be the better solution. If you're going to use
mergerfs there are other tactics that may help: spreading the data
across filesystems (see the mergerfs.dup tool) and setting
`func.open=rand`, using `symlinkify`, or using dm-cache or a similar
technology to add a tiered cache to the underlying device itself.

With #2 one could use dm-cache as well but there is another solution
which requires only mergerfs and a cronjob.

1. Create 2 mergerfs pools: one which includes just the slow branches
(the 'base' pool) and one which has both the fast branches (SSD,
NVMe, etc.) and the slow branches (the 'cache' pool).
2. The 'cache' pool should have the cache branches listed first in
the branch list.
3. The best `create` policies to use for the 'cache' pool would
probably be `ff`, `epff`, `lfs`, `msplfs`, or `eplfs`. The latter
three assume that the cache filesystem(s) are far smaller than the
backing filesystems. If using path preserving policies remember
that you'll need to manually create the core directories of those
paths you wish to be cached. Be sure the permissions are in sync.
Use `mergerfs.fsck` to check / correct them. You could also set the
slow filesystems' mode to `NC`, though that would mean you'd get
"out of space" errors if the cache filesystems fill.
4. Enable `moveonenospc` and set `minfreespace` appropriately. To
make sure there is enough room on the "slow" pool you might want
to set `minfreespace` to at least the size of the largest cache
filesystem, if not larger. This way, in the worst case, the whole
of the cache filesystem(s) can be moved to the other drives.
5. Set your programs to use the 'cache' pool.
6. Save one of the below scripts or create your own. The script's
responsibility is to move files from the cache filesystems (not
pool) to the 'base' pool.
7. Use `cron` (as root) to schedule the command at whatever frequency
is appropriate for your workflow.
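
As a rough illustration of steps 1 through 4, the sketch below composes
`/etc/fstab` lines for the two pools. The branch paths, mount points,
and the `500G` value are hypothetical, and applying `minfreespace` to
both pools but `moveonenospc` only to the 'cache' pool is an
assumption; adjust the options to your own setup.

```python
# Hypothetical branch paths; substitute your own filesystems.
FAST_BRANCHES = ["/mnt/nvme0"]
SLOW_BRANCHES = ["/mnt/hdd0", "/mnt/hdd1"]


def fstab_line(branches, mountpoint, options):
    """Build one /etc/fstab entry for a mergerfs pool."""
    return "{} {} mergerfs {} 0 0".format(
        ":".join(branches), mountpoint, ",".join(options)
    )


def tiered_pools(fast, slow, minfreespace):
    """Return (base, cache) fstab lines for a tiered cache layout."""
    # 'base' pool: slow branches only.
    base = fstab_line(slow, "/mnt/base", ["minfreespace=" + minfreespace])
    # 'cache' pool: fast branches listed first so `ff` prefers them.
    cache = fstab_line(
        fast + slow,
        "/mnt/cache",
        ["category.create=ff", "moveonenospc=true", "minfreespace=" + minfreespace],
    )
    return base, cache


if __name__ == "__main__":
    for line in tiered_pools(FAST_BRANCHES, SLOW_BRANCHES, "500G"):
        print(line)
```

Listing the fast branches first matters because `ff` picks the first
branch in the list that satisfies `minfreespace`.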


### time based expiring

Move files from the cache to the base pool based only on the last time
the file was accessed. Replace `-atime` with `-amin` if you want
minutes rather than days. You may want to use the `fadvise` /
`--drop-cache` version of rsync or run rsync with the tool
[nocache](https://github.com/Feh/nocache).

**NOTE:** The arguments to these scripts include the cache
**filesystem** itself, not the pool containing the cache filesystem.
You could have data loss if the source is the cache pool.

[mergerfs.time-based-mover](https://github.com/trapexit/mergerfs/blob/latest-release/tools/mergerfs.time-based-mover?raw=1)
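
The idea behind time based expiry can be sketched in Python as below.
This is not the linked script, only a minimal illustration: the
function names are hypothetical, a plain `shutil.move` stands in for
rsync, and the age check is a simple comparison against the access
time rather than an exact match for `find -atime` rounding.

```python
import os
import shutil
import time


def expired_files(cache_fs, days, now=None):
    """Yield files under cache_fs last accessed more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(cache_fs):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_atime < cutoff:
                yield path


def move_expired(cache_fs, base_pool, days, now=None):
    """Move expired files into base_pool, keeping their relative paths.

    cache_fs must be the cache filesystem itself, not the cache pool.
    """
    moved = []
    # Materialize the list first so moving files does not disturb the walk.
    for src in list(expired_files(cache_fs, days, now)):
        rel = os.path.relpath(src, cache_fs)
        dst = os.path.join(base_pool, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        moved.append(rel)
    return moved
```

A call such as `move_expired("/mnt/nvme0", "/mnt/base", 7)` (paths are
hypothetical) would move anything untouched for a week. Note that
access times are only meaningful if the cache filesystem is not
mounted with `noatime`.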


### percentage full expiring

Move the oldest file from the cache to the backing pool. Continue
until usage is below the percentage threshold.

**NOTE:** The arguments to these scripts include the cache
**filesystem** itself, not the pool containing the cache filesystem.
You could have data loss if the source is the cache pool.

[mergerfs.percent-full-mover](https://github.com/trapexit/mergerfs/blob/latest-release/tools/mergerfs.percent-full-mover?raw=1)
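
The loop described above might look like the following Python sketch.
It is an illustration only, not the linked script: the function names
are hypothetical, "oldest" is assumed to mean the oldest access time,
and `shutil.move` stands in for rsync. The `used` parameter exists so
the usage check can be swapped out when experimenting.

```python
import os
import shutil


def percent_used(path):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def move_until_below(cache_fs, base_pool, threshold, used=percent_used):
    """Move oldest files to base_pool until usage is at or below threshold.

    cache_fs must be the cache filesystem itself, not the cache pool.
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(cache_fs):
        for name in filenames:
            path = os.path.join(dirpath, name)
            files.append((os.lstat(path).st_atime, path))
    files.sort()  # oldest access time first

    moved = []
    for _atime, src in files:
        # Re-check usage before every move so we stop as soon as possible.
        if used(cache_fs) <= threshold:
            break
        rel = os.path.relpath(src, cache_fs)
        dst = os.path.join(base_pool, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
        moved.append(rel)
    return moved
```

For example, `move_until_below("/mnt/nvme0", "/mnt/base", 70)` (paths
are hypothetical) would drain the cache filesystem until it is no more
than 70% full.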
1 change: 1 addition & 0 deletions mkdocs/mkdocs.yml
@@ -89,6 +89,7 @@ nav:
- performance.md
- benchmarking.md
- tooling.md
- usage_patterns.md
- FAQ:
- faq/reliability_and_scalability.md
- faq/usage_and_functionality.md