Discussion:
Summary: Dedup and L2ARC memory requirements
Edward Ned Harvey
2011-05-05 02:56:40 UTC
Permalink
This is a summary of a much longer discussion "Dedup and L2ARC memory
requirements (again)"
Sorry, even this summary is long. But the results vary enormously based on
individual usage, so any "rule of thumb" metric that has been bouncing
around on the internet is simply not sufficient. You need to go into this
level of detail to get an estimate that's worth the napkin or bathroom
tissue it's scribbled on.

This is how to (reasonably) accurately estimate the hypothetical ram
requirements to hold the complete data deduplication tables (DDT) and L2ARC
references in ram. Please note both the DDT and L2ARC references can be
evicted from memory according to system policy, whenever the system decides
some other data is more valuable to keep. So following this guide does not
guarantee that the whole DDT will remain in ARC or L2ARC. But it's a good
start.

I am using a Solaris 11 Express x86 test system for my example numbers
below.

----------- To calculate size of DDT -----------

Each entry in the DDT is a fixed size, which varies by platform. You can
find it with the command:
echo ::sizeof ddt_entry_t | mdb -k
This will return a hex value, which you will probably want to convert to decimal.
On my test system, it is 0x178, which is 376 bytes.
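If you don't feel like converting the hex by hand, printf will do it (0x178 is
just the value from my system; substitute whatever mdb reports):
printf "%d\n" 0x178        (prints 376)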

There is one DDT entry per non-dedup'd (unique) block in the zpool. Be
aware that you cannot reliably estimate #blocks by counting #files. You can
find the number of total blocks including dedup'd blocks in your pool with
this command:
zdb -bb poolname | grep 'bp count'
Note: This command will run a long time and is IO intensive. On my systems
where a scrub runs for 8-9 hours, this zdb command ran for about 90 minutes.
On my test system, the result is 44145049 (44.1M) total blocks.

To estimate the number of non-dedup'd (unique) blocks (assuming average size
of dedup'd blocks = average size of blocks in the whole pool), use:
zpool list
Find the dedup ratio. In my test system, it is 2.24x. Divide the total
blocks by the dedup ratio to find the number of non-dedup'd (unique) blocks.

In my test system:
44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx
non-dedup'd (unique) blocks

Then multiply by the size of a DDT entry.
19707611 * 376 = 7410061796 bytes = 7G total DDT size
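If you prefer a script to a napkin, here is a minimal sketch of the same
arithmetic using bc. The three inputs are just my example numbers; plug in your
own mdb, zdb, and zpool output:

#!/bin/sh
# Estimate DDT size: (total blocks / dedup ratio) * bytes per DDT entry
DDT_ENTRY_SIZE=376        # echo ::sizeof ddt_entry_t | mdb -k  (0x178 here)
BP_COUNT=44145049         # zdb -bb poolname | grep 'bp count'
DEDUP_RATIO=2.24          # zpool list (dedup ratio)
UNIQUE_BLOCKS=`echo "$BP_COUNT / $DEDUP_RATIO" | bc`
DDT_BYTES=`echo "$UNIQUE_BLOCKS * $DDT_ENTRY_SIZE" | bc`
echo "approx unique blocks: $UNIQUE_BLOCKS"
echo "approx DDT size:      $DDT_BYTES bytes"
echo "approx DDT size:      `echo "scale=1; $DDT_BYTES / 1024 / 1024 / 1024" | bc` GiB"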

----------- To calculate size of ARC/L2ARC references -----------

Each reference to an L2ARC entry requires an entry in ARC (ram). This is
another fixed size, which varies by platform. You can find it with the
command:
echo ::sizeof arc_buf_hdr_t | mdb -k
On my test system, it is 0xb0 which is 176 bytes

We need to know the average block size in the pool, to estimate the number
of blocks that will fit into L2ARC. Find the amount of space ALLOC in the
pool:
zpool list
Divide by the number of non-dedup'd (unique) blocks in the pool, to find the
average block size. In my test system:
790G / 19707611 = 42K average block size

Remember: If your L2ARC were only caching average size blocks, then the
payload ratio of L2ARC vs ARC would be excellent. In my test system, every
42K L2ARC would require 176bytes ARC (a ratio of 244x). This would result
in a negligible ARC memory consumption. But since your DDT can be pushed
out of ARC into L2ARC, you get a really bad ratio of L2ARC vs ARC memory
consumption. In my test system every 376bytes DDT entry in L2ARC consumes
176bytes ARC (a ratio of 2.1x). Yes, it is possible to have the complete DDT
present in both ARC and L2ARC at once, thus consuming tons of ram.

Remember disk mfgrs use base-10. So my 32G SSD is only 30G base-2.
(32,000,000,000 / 1024/1024/1024)

So I have 30G L2ARC, and the first 7G may be consumed by DDT. This leaves
23G remaining to be used for average-sized blocks.
The ARC consumed to reference the DDT in L2ARC is 176/376 * DDT size. In my
test system this is 176/376 * 7G = 3.3G

Take the remaining size of your L2ARC, divide by average block size, to get
the number of average size blocks the L2ARC can hold. In my test system:
23G / 42K = 574220 average-size blocks in L2ARC
Multiply by the ARC size of a L2ARC reference. On my test system:
574220 * 176 = 101062753 bytes = 96MB ARC consumed to reference the
average-size blocks in L2ARC

So the total ARC consumption to hold L2ARC references in my test system is
3.3G + 96M ~= 3.4G
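The same calculation scripted, if that's easier to reuse. The values are my
example numbers, and they are rounded slightly differently than the hand
arithmetic above, so expect small differences in the output:

#!/bin/sh
# Estimate ARC consumed by L2ARC references: DDT entries in L2ARC plus data blocks
ARC_HDR_SIZE=176          # echo ::sizeof arc_buf_hdr_t | mdb -k  (0xb0 here)
DDT_ENTRY_SIZE=376
DDT_BYTES=7410061736      # from the DDT calculation above (~7G)
L2ARC_BYTES=32000000000   # advertised size of my 32G SSD, base-10
AVG_BLOCK=43008           # ~42K average block size
DDT_REFS=`echo "$DDT_BYTES * $ARC_HDR_SIZE / $DDT_ENTRY_SIZE" | bc`
DATA_BLOCKS=`echo "($L2ARC_BYTES - $DDT_BYTES) / $AVG_BLOCK" | bc`
DATA_REFS=`echo "$DATA_BLOCKS * $ARC_HDR_SIZE" | bc`
echo "ARC for DDT entries in L2ARC:   $DDT_REFS bytes"
echo "ARC for data blocks in L2ARC:   $DATA_REFS bytes"
echo "total ARC for L2ARC references: `echo "$DDT_REFS + $DATA_REFS" | bc` bytes"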

----------- To calculate total ram needed -----------

And finally - the max size the ARC is allowed to grow to is a constant that
varies by platform. On my system, it is 80% of system ram. You can find
this value using the command:
kstat -p zfs::arcstats:c_max
Divide by your total system memory to find the ratio.
Assuming the ratio is 4/5, it means you need to buy 5/4 the amount of
calculated ram to satisfy all your requirements.
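On a live system you can compute the actual ratio instead of assuming 4/5. A
rough sketch (kstat -p prints "module:instance:name:statistic value", and
prtconf reports physical memory in megabytes):

#!/bin/sh
C_MAX=`kstat -p zfs::arcstats:c_max | awk '{print $2}'`
PHYS_MB=`prtconf | awk '/Memory size/ {print $3}'`
echo "c_max / physmem = `echo "scale=2; $C_MAX / ($PHYS_MB * 1024 * 1024)" | bc`"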

So the end result is:
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and 3.4G for L2ARC. That is 10.4G.
Multiply by 5/4 and it means I need 13G
My system needs to be built with at least 8G + 13G = 21G.
Of this, 20% (4.2G) is more than enough to run the OS and processes, while
80% (16.8G) is available for ARC. Of the 16.8G ARC, the DDT and L2ARC
references will consume 10.4G, which leaves 6.4G for "normal" ARC caching.
These numbers are all fuzzy. Anything from 16G to 24G might be reasonable.
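Or, squeezing the whole estimate into one line of arithmetic (8G base, 7G DDT,
3.4G L2ARC references, and the 5/4 overprovision factor are my numbers;
substitute your own):
echo "scale=1; 8 + (7 + 3.4) * 5 / 4" | bc        (prints 21.0, i.e. 21G)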

That's it. I'm done.

P.S. I'll just throw this out there: It is my personal opinion that you
probably won't have the whole DDT in ARC and L2ARC at the same time.
Because the L2ARC is populated from the soon-to-expire list of the ARC, it
seems unlikely that all the DDT entries will get into ARC, and then onto the
soon-to-expire list and then pulled back into ARC and stay there. The above
calculation is a sort of worst case. I think the following is likely to be
a more realistic actual case:

Personally, I would model the ARC memory consumption of the L2ARC entries
using the average block size of the data pool, and just neglect the DDT
entries in the L2ARC. Well ... inflate some. Say 10% of the DDT is in the
L2ARC and the ARC at the same time. I'm making up this number from thin
air.

My revised end result is:
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and (96M + 10% of 3.3G = 430M) for
L2ARC. Multiply by 5/4 and it means I need 7.5G * 1.25 = 9.4G
My system needs to be built with at least 8G + 9.4G = 17.4G.
Of this, 20% (3.5G) is more than enough to run the OS and processes, while
80% (13.9G) is available for ARC. Of the 13.9G ARC, the DDT and L2ARC
references will consume 7.5G, which leaves 6.4G for "normal" ARC caching.
I personally think that's likely to be more accurate in the observable
world.
My revised end result is still basically the same: These numbers are all
fuzzy. Anything from 16G to 24G might be reasonable.
Erik Trimble
2011-05-05 03:45:15 UTC
Permalink
Good summary, Ned. A couple of minor corrections.
Post by Edward Ned Harvey
This is a summary of a much longer discussion "Dedup and L2ARC memory
requirements (again)"
Sorry even this summary is long. But the results vary enormously based on
individual usage, so any "rule of thumb" metric that has been bouncing
around on the internet is simply not sufficient. You need to go into this
level of detail to get an estimate that's worth the napkin or bathroom
tissue it's scribbled on.
This is how to (reasonably) accurately estimate the hypothetical ram
requirements to hold the complete data deduplication tables (DDT) and L2ARC
references in ram. Please note both the DDT and L2ARC references can be
evicted from memory according to system policy, whenever the system decides
some other data is more valuable to keep. So following this guide does not
guarantee that the whole DDT will remain in ARC or L2ARC. But it's a good
start.
I am using a solaris 11 express x86 test system for my example numbers
below.
----------- To calculate size of DDT -----------
Each entry in the DDT is a fixed size, which varies by platform. You can
echo ::sizeof ddt_entry_t | mdb -k
This will return a hex value, that you probably want to convert to decimal.
On my test system, it is 0x178 which is 376 bytes
There is one DDT entry per non-dedup'd (unique) block in the zpool. Be
aware that you cannot reliably estimate #blocks by counting #files. You can
find the number of total blocks including dedup'd blocks in your pool with
zdb -bb poolname | grep 'bp count'
Note: This command will run a long time and is IO intensive. On my systems
where a scrub runs for 8-9 hours, this zdb command ran for about 90 minutes.
On my test system, the result is 44145049 (44.1M) total blocks.
To estimate the number of non-dedup'd (unique) blocks (assuming average size
zpool list
Find the dedup ratio. In my test system, it is 2.24x. Divide the total
blocks by the dedup ratio to find the number of non-dedup'd (unique) blocks.
44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx
non-dedup'd (unique) blocks
Then multiply by the size of a DDT entry.
19707611 * 376 = 7410061796 bytes = 7G total DDT size
----------- To calculate size of ARC/L2ARC references -----------
Each reference to a L2ARC entry requires an entry in ARC (ram). This is
another fixed size, which varies by platform. You can find it with the
echo ::sizeof arc_buf_hdr_t | mdb -k
On my test system, it is 0xb0 which is 176 bytes
We need to know the average block size in the pool, to estimate the number
of blocks that will fit into L2ARC. Find the amount of space ALLOC in the
zpool list
Divide by the number of non-dedup'd (unique) blocks in the pool, to find the
790G / 19707611 = 42K average block size
Remember: If your L2ARC were only caching average size blocks, then the
payload ratio of L2ARC vs ARC would be excellent. In my test system, every
42K L2ARC would require 176bytes ARC (a ratio of 244x). This would result
in a negligible ARC memory consumption. But since your DDT can be pushed
out of ARC into L2ARC, you get a really bad ratio of L2ARC vs ARC memory
consumption. In my test system every 376bytes DDT entry in L2ARC consumes
176bytes ARC (a ratio of 2.1x). Yes, it is approximately possible to have
the complete DDT present in ARC and L2ARC, thus consuming tons of ram.
Remember disk mfgrs use base-10. So my 32G SSD is only 30G base-2.
(32,000,000,000 / 1024/1024/1024)
So I have 30G L2ARC, and the first 7G may be consumed by DDT. This leaves
23G remaining to be used for average-sized blocks.
The ARC consumed to reference the DDT in L2ARC is 176/376 * DDT size. In my
test system this is 176/376 * 7G = 3.3G
Take the remaining size of your L2ARC, divide by average block size, to get
23G / 42K = 574220 average-size blocks in L2ARC
574220 * 176 = 101062753 bytes = 96MB ARC consumed to reference the
average-size blocks in L2ARC
So the total ARC consumption to hold L2ARC references in my test system is
3.3G + 96M ~= 3.4G
----------- To calculate total ram needed -----------
And finally - The max size the ARC is allowed to grow, is a constant that
varies by platform. On my system, it is 80% of system ram. You can find
kstat -p zfs::arcstats:c_max
Divide by your total system memory to find the ratio.
Assuming the ratio is 4/5, it means you need to buy 5/4 the amount of
calculated ram to satisfy all your requirements.
Using the standard c_max value of 80%, remember that this is 80% of the
TOTAL system RAM, including that RAM normally dedicated to other
purposes. So long as the total amount of RAM you expect to dedicate to
ARC usage (for all ZFS uses, not just dedup) is less than 4 times that
of all other RAM consumption, you don't need to "overprovision".
Post by Edward Ned Harvey
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and 3.4G for L2ARC. That is 10.4G.
Multiply by 5/4 and it means I need 13G
My system needs to be built with at least 8G + 13G = 21G.
Of this, 20% (4.2G) is more than enough to run the OS and processes, while
80% (16.8G) is available for ARC. Of the 16.8G ARC, the DDT and L2ARC
references will consume 10.4G, which leaves 6.4G for "normal" ARC caching.
These numbers are all fuzzy. Anything from 16G to 24G might be reasonable.
That's it. I'm done.
P.S. I'll just throw this out there: It is my personal opinion that you
probably won't have the whole DDT in ARC and L2ARC at the same time.
Because the L2ARC is populated from the soon-to-expire list of the ARC, it
seems unlikely that all the DDT entries will get into ARC, and then onto the
soon-to-expire list and then pulled back into ARC and stay there. The above
calculation is a sort of worst case. I think the following is likely to be
There is a *very* low probability that a DDT entry will exist in both
the ARC and L2ARC at the same time. That is, such a condition will occur
ONLY in the very short period of time when the DDT entry is being
migrated from the ARC to the L2ARC. Each DDT entry is tracked
separately, so each can be migrated from ARC to L2ARC as needed. Any
entry that is migrated back from L2ARC into ARC is considered "stale"
data in the L2ARC, and thus, is no longer tracked in the ARC's reference
table for L2ARC.

As such, you can safely assume the DDT-related memory requirements for
the ARC are (maximally) just slightly bigger than size of the DDT
itself. Even then, that is a worst-case scenario; a typical use case
would have the actual ARC consumption somewhere closer to the case where
the entire DDT is in the L2ARC. Using your numbers, that would mean the
worst-case ARC usage would be a bit over 7G, and the more likely case
would be somewhere in the 3.3-3.5G range.
Post by Edward Ned Harvey
Personally, I would model the ARC memory consumption of the L2ARC entries
using the average block size of the data pool, and just neglect the DDT
entries in the L2ARC. Well ... inflate some. Say 10% of the DDT is in the
L2ARC and the ARC at the same time. I'm making up this number from thin
air.
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and (96M + 10% of 3.3G = 430M) for
L2ARC. Multiply by 5/4 and it means I need 7.5G * 1.25 = 9.4G
My system needs to be built with at least 8G + 9.4G = 17.4G.
Of this, 20% (3.5G) is more than enough to run the OS and processes, while
80% (13.9G) is available for ARC. Of the 13.9G ARC, the DDT and L2ARC
references will consume 7.5G, which leaves 6.4G for "normal" ARC caching.
I personally think that's likely to be more accurate in the observable
world.
My revised end result is still basically the same: These numbers are all
fuzzy. Anything from 16G to 24G might be reasonable.
For total system RAM, you need the GREATER of these two values:

(1) the sum of your OS & application requirements, plus your standard
ZFS-related ARC requirements, plus the DDT size

(2) 1.25 times the sum of your standard ARC needs and the DDT size


Redoing your calculations based on my adjustments:

(a) worst case scenario is that you need 7GB for dedup-related ARC
requirements
(b) you presume to need 8GB for standard ARC caching not related to dedup
(c) your system needs 1GB for basic operation

According to those numbers:

Case #1: 1 + 8 + 7 = 16GB

Case #2: 1.25 * (8 + 7) =~ 19GB

Thus, you should have 19GB of RAM in your system, with 16GB being a
likely reasonable amount under most conditions (e.g. typical dedup ARC
size is going to be ~3.5G, not the 7G maximum used above).
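Or, as a one-screen sketch of that rule with your example numbers plugged in
(bc has no max function, so both cases are printed and you provision the
larger):

#!/bin/sh
OS=1; ARC=8; DDT=7      # GB: base OS needs, standard ARC needs, worst-case dedup ARC
CASE1=`echo "$OS + $ARC + $DDT" | bc`
CASE2=`echo "scale=2; 1.25 * ($ARC + $DDT)" | bc`
echo "case 1: ${CASE1}GB    case 2: ${CASE2}GB    -> provision the larger"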
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Edward Ned Harvey
2011-05-05 13:45:11 UTC
Permalink
Post by Erik Trimble
Using the standard c_max value of 80%, remember that this is 80% of the
TOTAL system RAM, including that RAM normally dedicated to other
purposes. So long as the total amount of RAM you expect to dedicate to
ARC usage (for all ZFS uses, not just dedup) is less than 4 times that
of all other RAM consumption, you don't need to "overprovision".
Correct, usually you don't need to overprovision for the sake of ensuring
enough ram available for OS and processes. But you do need to overprovision
25% if you want to increase the size of your usable ARC without reducing the
amount of ARC you currently have in the system being used to cache other
files etc.
Post by Erik Trimble
Any
entry that is migrated back from L2ARC into ARC is considered "stale"
data in the L2ARC, and thus, is no longer tracked in the ARC's reference
table for L2ARC.
Good news. I didn't know that. I thought the L2ARC was still valid, even
if something was pulled back into ARC.

So there are two useful models:
(a) The upper bound: The whole DDT is in ARC, and the whole L2ARC is filled
with average-size blocks.
or
(b) The lower bound: The whole DDT is in L2ARC, and all the rest of the
L2ARC is filled with average-size blocks. ARC requirements are based only
on L2ARC references.

The actual usage will be something between (a) and (b)... And the actual is
probably closer to (b)

In my test system:
(a) (upper bound)
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
I need 7G for DDT and
I have 748982 average-size blocks in L2ARC, which means 131820832 bytes =
125M or 0.1G for L2ARC
I really just need to plan for 7.1G ARC usage
Multiply by 5/4 and it means I need 8.875G system ram
My system needs to be built with at least 8G + 8.875G = 16.875G.

(b) (lower bound)
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
I need 0G for DDT (because it's in L2ARC) and
I need 3.4G ARC to hold all the L2ARC references, including the DDT in L2ARC
So I really just need to plan for 3.4G ARC for my L2ARC references.
Multiply by 5/4 and it means I need 4.25G system ram
My system needs to be built with at least 8G + 4.25G = 12.25G.

Thank you for your input, Erik. Previously I would have only been
comfortable with 24G in this system, because I was calculating a need for
significantly higher than 16G. But now, what we're calling the upper bound
is just *slightly* higher than 16G, while the lower bound and most likely
actual figure is significantly lower than 16G. So in this system, I would
be comfortable running with 16G. But I would be even more comfortable
running with 24G. ;-)
Karl Wagner
2011-05-05 17:20:58 UTC
Permalink
so there's an ARC entry referencing each individual DDT entry in the L2ARC?! I had made the assumption that DDT entries would be grouped into at least minimum block sized groups (8k?), which would have led to a much more reasonable ARC requirement.

seems like a bad design to me, which leads to dedup only being usable by those prepared to spend a LOT of dosh... which may as well go into more storage (I know there are other benefits too, but that's my opinion)
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Edward Ned Harvey
2011-05-06 03:33:33 UTC
Permalink
Post by Karl Wagner
so there's an ARC entry referencing each individual DDT entry in the L2ARC?!
I had made the assumption that DDT entries would be grouped into at least
minimum block sized groups (8k?), which would have lead to a much more
reasonable ARC requirement.
seems like a bad design to me, which leads to dedup only being usable by
those prepared to spend a LOT of dosh... which may as well go into more
storage (I know there are other benefits too, but that's my opinion)
The whole point of the DDT is that it needs to be structured, and really fast to search. So no, you're not going to consolidate it into an unstructured memory block as you suggested. You pay the memory consumption price for the sake of performance. Yes, it consumes a lot of ram, but don't call it a "bad design." It's just a different design than what you expected, because what you expected would hurt performance while consuming less ram.

And we're not talking crazy dollars here. So your emphasis on a LOT of dosh seems exaggerated. I just spec'd out a system where upgrading from 12 to 24G of ram to enable dedup effectively doubled the storage capacity of the system, and that upgrade cost the same as one of the disks. (This is a 12-disk system.) So it was actually a 6x cost reducer, at least. It all depends on how much mileage you get out of the dedup. Your mileage may vary.
Richard Elling
2011-05-06 03:44:06 UTC
Permalink
Post by Edward Ned Harvey
This is a summary of a much longer discussion "Dedup and L2ARC memory
requirements (again)"
Sorry even this summary is long. But the results vary enormously based on
individual usage, so any "rule of thumb" metric that has been bouncing
around on the internet is simply not sufficient. You need to go into this
level of detail to get an estimate that's worth the napkin or bathroom
tissue it's scribbled on.
This is how to (reasonably) accurately estimate the hypothetical ram
requirements to hold the complete data deduplication tables (DDT) and L2ARC
references in ram. Please note both the DDT and L2ARC references can be
evicted from memory according to system policy, whenever the system decides
some other data is more valuable to keep. So following this guide does not
guarantee that the whole DDT will remain in ARC or L2ARC. But it's a good
start.
As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception, destroying a dataset or snapshot requires
the DDT entries for the destroyed blocks to be updated. This is why people can
go for months or years and not see a problem, until they try to destroy a dataset.
Post by Edward Ned Harvey
I am using a solaris 11 express x86 test system for my example numbers
below.
----------- To calculate size of DDT -----------
Each entry in the DDT is a fixed size, which varies by platform. You can
echo ::sizeof ddt_entry_t | mdb -k
This will return a hex value, that you probably want to convert to decimal.
On my test system, it is 0x178 which is 376 bytes
There is one DDT entry per non-dedup'd (unique) block in the zpool.
The workloads which are nicely dedupable tend to not have unique blocks.
So this is another way of saying, "if your workload isn't dedupable, don't bother
with deduplication." For years now we have been trying to convey this message.
One way to help convey the message is...
Post by Edward Ned Harvey
Be
aware that you cannot reliably estimate #blocks by counting #files. You can
find the number of total blocks including dedup'd blocks in your pool with
zdb -bb poolname | grep 'bp count'
Ugh. A better method is to simulate dedup on existing data:
zdb -S poolname
or measure dedup efficacy
zdb -DD poolname
which offer similar tabular analysis
Post by Edward Ned Harvey
Note: This command will run a long time and is IO intensive. On my systems
where a scrub runs for 8-9 hours, this zdb command ran for about 90 minutes.
On my test system, the result is 44145049 (44.1M) total blocks.
To estimate the number of non-dedup'd (unique) blocks (assuming average size
zpool list
Find the dedup ratio. In my test system, it is 2.24x. Divide the total
blocks by the dedup ratio to find the number of non-dedup'd (unique) blocks.
Or just count the unique and non-unique blocks with:
zdb -D poolname
Post by Edward Ned Harvey
44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx
non-dedup'd (unique) blocks
Then multiply by the size of a DDT entry.
19707611 * 376 = 7410061796 bytes = 7G total DDT size
A minor gripe about zdb -D output is that it doesn't do the math.
Post by Edward Ned Harvey
----------- To calculate size of ARC/L2ARC references -----------
Each reference to a L2ARC entry requires an entry in ARC (ram). This is
another fixed size, which varies by platform. You can find it with the
echo ::sizeof arc_buf_hdr_t | mdb -k
On my test system, it is 0xb0 which is 176 bytes
Better yet, without need for mdb privilege, measure the current L2ARC header
size in use. Normal user accounts can:
kstat -p zfs::arcstats:hdr_size
kstat -p zfs::arcstats:l2_hdr_size

arcstat will allow you to easily track this over time.
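For example, a crude watch loop if arcstat isn't handy (plain kstat in a loop;
interrupt with ^C):

#!/bin/sh
while :; do
    kstat -p zfs::arcstats:hdr_size zfs::arcstats:l2_hdr_size
    sleep 10
done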
Post by Edward Ned Harvey
We need to know the average block size in the pool, to estimate the number
of blocks that will fit into L2ARC. Find the amount of space ALLOC in the
zpool list
Divide by the number of non-dedup'd (unique) blocks in the pool, to find the
790G / 19707611 = 42K average block size
Remember: If your L2ARC were only caching average size blocks, then the
payload ratio of L2ARC vs ARC would be excellent. In my test system, every
42K L2ARC would require 176bytes ARC (a ratio of 244x). This would result
in a negligible ARC memory consumption. But since your DDT can be pushed
out of ARC into L2ARC, you get a really bad ratio of L2ARC vs ARC memory
consumption. In my test system every 376bytes DDT entry in L2ARC consumes
176bytes ARC (a ratio of 2.1x). Yes, it is approximately possible to have
the complete DDT present in ARC and L2ARC, thus consuming tons of ram.
This is a good thing for those cases when you need to quickly reference large
numbers of DDT entries.
Post by Edward Ned Harvey
Remember disk mfgrs use base-10. So my 32G SSD is only 30G base-2.
(32,000,000,000 / 1024/1024/1024)
So I have 30G L2ARC, and the first 7G may be consumed by DDT. This leaves
23G remaining to be used for average-sized blocks.
The ARC consumed to reference the DDT in L2ARC is 176/376 * DDT size. In my
test system this is 176/376 * 7G = 3.3G
Take the remaining size of your L2ARC, divide by average block size, to get
23G / 42K = 574220 average-size blocks in L2ARC
574220 * 176 = 101062753 bytes = 96MB ARC consumed to reference the
average-size blocks in L2ARC
So the total ARC consumption to hold L2ARC references in my test system is
3.3G + 96M ~= 3.4G
----------- To calculate total ram needed -----------
And finally - The max size the ARC is allowed to grow, is a constant that
varies by platform. On my system, it is 80% of system ram.
It is surely not 80% of RAM unless you have 5GB RAM or by luck. The algorithm for
c_max is well documented as starting with the larger of:
7/8 of physmem
or
physmem - 1GB

This value adjusts as memory demands from other processes are satisfied.
Post by Edward Ned Harvey
You can find
kstat -p zfs::arcstats:c_max
Divide by your total system memory to find the ratio.
Assuming the ratio is 4/5, it means you need to buy 5/4 the amount of
calculated ram to satisfy all your requirements.
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
This is a little bit trickier to understand, which is why we have:
echo ::memstat | mdb -k
Post by Edward Ned Harvey
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and 3.4G for L2ARC. That is 10.4G.
Multiply by 5/4 and it means I need 13G
My system needs to be built with at least 8G + 13G = 21G.
Of this, 20% (4.2G) is more than enough to run the OS and processes, while
80% (16.8G) is available for ARC. Of the 16.8G ARC, the DDT and L2ARC
references will consume 10.4G, which leaves 6.4G for "normal" ARC caching.
These numbers are all fuzzy. Anything from 16G to 24G might be reasonable.
That's it. I'm done.
P.S. I'll just throw this out there: It is my personal opinion that you
probably won't have the whole DDT in ARC and L2ARC at the same time.
yep, that is a safe bet.
Post by Edward Ned Harvey
Because the L2ARC is populated from the soon-to-expire list of the ARC, it
seems unlikely that all the DDT entries will get into ARC, and then onto the
soon-to-expire list and then pulled back into ARC and stay there. The above
calculation is a sort of worst case. I think the following is likely to be
Personally, I would model the ARC memory consumption of the L2ARC entries
using the average block size of the data pool, and just neglect the DDT
entries in the L2ARC. Well ... inflate some. Say 10% of the DDT is in the
L2ARC and the ARC at the same time. I'm making up this number from thin
air.
Much better to just measure it. However, measurements are likely to not be
appropriate for capacity planning purposes :-(
Post by Edward Ned Harvey
On my test system I guess the OS and processes consume 1G. (I'm making that
up without any reason.)
On my test system I guess I need 8G in the system to get reasonable
performance without dedup or L2ARC. (Again, I'm just making that up.)
We calculated that I need 7G for DDT and (96M + 10% of 3.3G = 430M) for
L2ARC. Multiply by 5/4 and it means I need 7.5G * 1.25 = 9.4G
My system needs to be built with at least 8G + 9.4G = 17.4G.
Of this, 20% (3.5G) is more than enough to run the OS and processes, while
80% (13.9G) is available for ARC. Of the 13.9G ARC, the DDT and L2ARC
references will consume 7.5G, which leaves 6.4G for "normal" ARC caching.
I personally think that's likely to be more accurate in the observable
world.
My revised end result is still basically the same: These numbers are all
fuzzy. Anything from 16G to 24G might be reasonable.
I think these RAM numbers are reasonable first guesses. Many of the systems I've
seen deployed this year are 48 to 96 GB RAM. L2ARC devices are 250 to 600 GB.
-- richard
Frank Van Damme
2011-05-06 08:31:45 UTC
Permalink
Post by Richard Elling
As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception, destroying a dataset or snapshot requires
the DDT entries for the destroyed blocks to be updated. This is why people can
go for months or years and not see a problem, until they try to destroy a dataset.
So what you are saying is "you with your ram-starved system, don't even
try to start using snapshots on that system". Right?
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
C***@oracle.com
2011-05-06 08:37:43 UTC
Permalink
Post by Frank Van Damme
Post by Richard Elling
As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception, destroying a dataset or snapshot requires
the DDT entries for the destroyed blocks to be updated. This is why people can
go for months or years and not see a problem, until they try to destroy a dataset.
So what you are saying is "you with your ram-starved system, don't even
try to start using snapshots on that system". Right?
I think it's more like "don't use dedup when you don't have RAM".

(It is not possible to not use snapshots in Solaris; they are used for
everything)

Casper
Erik Trimble
2011-05-06 10:24:58 UTC
Permalink
Post by C***@oracle.com
Post by Frank Van Damme
Post by Richard Elling
As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception, destroying a dataset or snapshot requires
the DDT entries for the destroyed blocks to be updated. This is why people can
go for months or years and not see a problem, until they try to destroy a dataset.
So what you are saying is "you with your ram-starved system, don't even
try to start using snapshots on that system". Right?
I think it's more like "don't use dedup when you don't have RAM".
(It is not possible to not use snapshots in Solaris; they are used for
everything)
Casper
Casper and Richard are correct - RAM starvation seriously impacts
snapshot or dataset deletion when a pool has dedup enabled. The reason
behind this is that ZFS needs to scan the entire DDT to check to see if
it can actually delete each block in the to-be-deleted snapshot/dataset,
or if it just needs to update the dedup reference count. If it can't
store the entire DDT in either the ARC or L2ARC, it will be forced to do
considerable I/O to disk, as it brings in the appropriate DDT entry.
Worst case for insufficient ARC/L2ARC space can increase deletion times
by many orders of magnitude. E.g. days, weeks, or even months to do a
deletion.


If dedup isn't enabled, snapshot and data deletion is very light on RAM
requirements, and generally won't need to do much (if any) disk I/O.
Such deletion should take milliseconds to a minute or so.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Tomas Ögren
2011-05-06 11:17:15 UTC
Permalink
Post by Erik Trimble
If dedup isn't enabled, snapshot and data deletion is very light on RAM
requirements, and generally won't need to do much (if any) disk I/O.
Such deletion should take milliseconds to a minute or so.
.. or hours. We've had problems on an old raidz2 where a recursive
snapshot creation over ~800 filesystems could take quite some time, up
until the sata-scsi disk box ate the pool. Now we're using raid10 on a
scsi box, and it takes 3-15 minutes or so, during which sync writes (NFS)
are almost unusable. Using 2 fast usb sticks as l2arc, waiting for a
Vertex2EX and a Vertex3 to arrive for ZIL & L2ARC testing. IO to the
filesystems is quite low (50 writes, 500k data per sec average), but
snapshot times go way up during backups.

/Tomas
--
Tomas Ögren, ***@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Richard Elling
2011-05-07 00:46:02 UTC
Permalink
Post by C***@oracle.com
Post by Frank Van Damme
Post by Richard Elling
As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception, destroying a dataset or snapshot requires
the DDT entries for the destroyed blocks to be updated. This is why people can
go for months or years and not see a problem, until they try to destroy a dataset.
So what you are saying is "you with your ram-starved system, don't even
try to start using snapshots on that system". Right?
I think it's more like "don't use dedup when you don't have RAM".
(It is not possible to not use snapshots in Solaris; they are used for
everything)
:-)
Post by C***@oracle.com
Casper
Casper and Richard are correct - RAM starvation seriously impacts snapshot or dataset deletion when a pool has dedup enabled. The reason behind this is that ZFS needs to scan the entire DDT to check to see if it can actually delete each block in the to-be-deleted snapshot/dataset, or if it just needs to update the dedup reference count.
AIUI, the issue is not that the DDT is scanned, it is an AVL tree for a reason. The issue is that each reference update means that one, small bit of data is changed. If the reference is not already in ARC, then a small, probably random read is needed. If you have a typical consumer disk, especially a "green" disk, and have not tuned zfs_vdev_max_pending, then that itty bitty read can easily take more than 100 milliseconds(!) Consider that you can have thousands or millions of reference updates to do during a zfs destroy, and the math gets ugly. This is why fast SSDs make good dedup candidates.
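To put a number on that: a hypothetical destroy that has to touch, say, one
million DDT entries which all miss the ARC, at roughly 100 milliseconds per
random read, works out to on the order of a day. A deliberately crude
illustration, not a measurement:
echo "scale=1; 1000000 * 0.1 / 3600" | bc        (prints 27.7 -- hours of nothing but tiny random reads)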
If it can't store the entire DDT in either the ARC or L2ARC, it will be forced to do considerable I/O to disk, as it brings in the appropriate DDT entry. Worst case for insufficient ARC/L2ARC space can increase deletion times by many orders of magnitude. E.g. days, weeks, or even months to do a deletion.
I've never seen months, but I have seen days, especially for low-perf disks.
If dedup isn't enabled, snapshot and data deletion is very light on RAM requirements, and generally won't need to do much (if any) disk I/O. Such deletion should take milliseconds to a minute or so.
Yes, perhaps a bit longer for recursive destruction, but everyone here knows recursion is evil, right? :-)
-- richard
Erik Trimble
2011-05-07 01:17:25 UTC
Permalink
Post by Richard Elling
Casper and Richard are correct - RAM starvation seriously impacts snapshot or dataset deletion when a pool has dedup enabled. The reason behind this is that ZFS needs to scan the entire DDT to check to see if it can actually delete each block in the to-be-deleted snapshot/dataset, or if it just needs to update the dedup reference count.
AIUI, the issue is not that the DDT is scanned, it is an AVL tree for a reason. The issue is that each reference update means that one, small bit of data is changed. If the reference is not already in ARC, then a small, probably random read is needed. If you have a typical consumer disk, especially a "green" disk, and have not tuned zfs_vdev_max_pending, then that itty bitty read can easily take more than 100 milliseconds(!) Consider that you can have thousands or millions of reference updates to do during a zfs destroy, and the math gets ugly. This is why fast SSDs make good dedup candidates.
Just out of curiosity - I'm assuming that a delete works like this:

(1) find list of blocks associated with file to be deleted
(2) using the DDT, find out if any other files are using those blocks
(3) delete/update any metadata associated with the file (dirents,
ACLs, etc.)
(4) for each block in the file
(4a) if the DDT indicates there ARE other files using this
block, update the DDT entry to change the refcount
(4b) if the DDT indicates there AREN'T any other files, move
the physical block to the free list, and delete the DDT entry


In a bulk delete scenario (not just snapshot deletion), I'd presume #1
above almost always causes a Random I/O request to disk, as all the
relevant metadata for every (to be deleted) file is unlikely to be
stored in ARC. If you can't fit the DDT in ARC/L2ARC, #2 above would
require you to pull in the remainder of the DDT info from disk, right?
#3 and #4 can be batched up, so they don't hurt that much.

Is that a (roughly) correct deletion methodology? Or can someone give a
more accurate view of what's actually going on?
Post by Richard Elling
If it can't store the entire DDT in either the ARC or L2ARC, it will be forced to do considerable I/O to disk, as it brings in the appropriate DDT entry. Worst case for insufficient ARC/L2ARC space can increase deletion times by many orders of magnitude. E.g. days, weeks, or even months to do a deletion.
I've never seen months, but I have seen days, especially for low-perf disks.
I've seen an estimate of 5 weeks for removing a snapshot on a 1TB dedup
pool made up of 1 disk.

Not an optimal set up.

:-)
Post by Richard Elling
If dedup isn't enabled, snapshot and data deletion is very light on RAM requirements, and generally won't need to do much (if any) disk I/O. Such deletion should take milliseconds to a minute or so.
Yes, perhaps a bit longer for recursive destruction, but everyone here knows recursion is evil, right? :-)
-- richard
You, my friend, have obviously never worshipped at the Temple of the
Lambda Calculus, nor been exposed to the Holy Writ that is "Structure and
Interpretation of Computer Programs"
(http://mitpress.mit.edu/sicp/full-text/book/book.html).

I sentence you to a semester of 6.001 problem sets, written by Prof
Sussman sometime in the 1980s.

(yes, I went to MIT.)
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Toby Thain
2011-05-08 14:48:34 UTC
Permalink
Post by Erik Trimble
Post by Richard Elling
...
Yes, perhaps a bit longer for recursive destruction, but everyone here
knows recursion is evil, right? :-)
-- richard
You, my friend, have obviously never worshipped at the Temple of the
Lambda Calculus, nor been exposed to the Holy Writ that is "Structure and
Interpretation of Computer Programs"
As someone who is studying Scheme and SICP, I had no trouble seeing that
Richard was not being serious :)
Post by Erik Trimble
(http://mitpress.mit.edu/sicp/full-text/book/book.html).
I sentence you to a semester of 6.001 problem sets, written by Prof
Sussman sometime in the 1980s.
--Toby
Post by Erik Trimble
(yes, I went to MIT.)
Edward Ned Harvey
2011-05-06 14:37:14 UTC
Permalink
Post by Richard Elling
Post by Edward Ned Harvey
----------- To calculate size of DDT -----------
zdb -S poolname
Look at total blocks allocated. It is rounded, and uses a suffix like "K,
M, G" but it's in decimal (powers of 10) notation, so you have to remember
that... So I prefer the zdb -D method below, but this works too. Total
blocks allocated * mem requirement per DDT entry, and you have the mem
requirement to hold whole DDT in ram.
Post by Richard Elling
zdb -DD poolname
This just gives you the -S output, and the -D output all in one go. So I
recommend using -DD, and base your calculations on #duplicate and #unique,
as mentioned below. Consider the histogram to be informational.
Post by Richard Elling
zdb -D poolname
It gives you a number of duplicate, and a number of unique blocks. Add them
to get the total number of blocks. Multiply by the mem requirement per DDT
entry, and you have the mem requirement to hold the whole DDT in ram.
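So the whole zdb -D route boils down to one multiplication (the counts here are
made-up placeholders for illustration; use the unique and duplicate entry
counts zdb reports, and your own DDT entry size):
echo "(18000000 + 1700000) * 376" | bc        # (unique + duplicate) * bytes per DDT entry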
Edward Ned Harvey
2011-05-06 14:40:22 UTC
Permalink
Post by Edward Ned Harvey
Post by Richard Elling
zdb -DD poolname
This just gives you the -S output, and the -D output all in one go. So I
Sorry, zdb -DD only works for pools that are already dedup'd.
If you want to get a measurement for a pool that is not already dedup'd, you
have to use -S
Yaverot
2011-05-06 18:13:30 UTC
Permalink
One of the quoted participants is Richard Elling, the other is Edward Ned Harvey, but my quoting was screwed up enough that I don't know which is which. Apologies.
Post by Edward Ned Harvey
Post by Edward Ned Harvey
Post by Richard Elling
zdb -DD poolname
This just gives you the -S output, and the -D output all in one go. So I
Sorry, zdb -DD only works for pools that are already dedup'd.
If you want to get a measurement for a pool that is not already dedup'd, you have to use -S
And since zdb -S runs for 2 hours and dumps core (without results), the correct answer remains:
zdb -bb poolname | grep 'bp count'
as was given in the summary.

The theoretical output of "zdb -S" may be superior if you have a version that works, but I haven't seen anyone mention on-list which version(s) it is, or if/how it can be obtained, short of recompiling it yourself.
Edward Ned Harvey
2011-05-07 13:37:00 UTC
Permalink
New problem:

I'm following all the advice I summarized into the OP of this thread, and
testing on a test system. (A laptop). And it's just not working. I am
jumping into the dedup performance abyss far, far earlier than predicted...


My test system is a laptop with 1.5G ram, c_min =150M, c_max =1.2G
I have just a single sata 7.2krpm hard drive, no SSD.
Before I start, I have 1G free ram (according to top.)
According to everything we've been talking about, I expect roughly 1G
divided by 376 bytes = 2855696 (2.8M) blocks in my pool before I start
running out of ram to hold the DDT and performance degrades.

I create a pool. Enable dedup. Set recordsize=512
I write a program that will very quickly generate unique non-dedupable data:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int i;
    int numblocks = atoi(argv[1]);
    /* Note: Expect one command-line argument integer. */
    FILE *outfile;
    outfile = fopen("junk.file", "w");
    for (i = 0; i < numblocks; i++)
        fprintf(outfile, "%512d", i);
    fflush(outfile);
    fclose(outfile);
    return 0;
}
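For reference, I build and run it along these lines (use whatever C compiler
you have handy; datagenerator.c is just what I named the source file):
cc -o datagenerator datagenerator.c
time ./datagenerator 100000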

Disable dedup. Run with a small numblocks. For example: time
~/datagenerator 100
Enable dedup and repeat.
They both complete instantly.

Repeat with a higher numblocks... 1000, 10000, 100000...
Repeat until you find the point where performance with dedup is
significantly different from performance without dedup.

See below. Right around 400,000 blocks, dedup is suddenly an order of
magnitude slower than without dedup.

Times to create the file:
numblocks dedup=off dedup=verify DDTsize Filesize
100000 2.5sec 1.2sec 36 MB 49 MB
110000 1.4sec 1.3sec 39 MB 54 MB
120000 1.4sec 1.5sec 43 MB 59 MB
130000 1.5sec 1.8sec 47 MB 63 MB
140000 1.6sec 1.6sec 50 MB 68 MB
150000 4.8sec 7.0sec 54 MB 73 MB
160000 4.8sec 7.6sec 57 MB 78 MB
170000 2.1sec 2.1sec 61 MB 83 MB
180000 5.2sec 5.6sec 65 MB 88 MB
190000 6.0sec 10.1sec 68 MB 93 MB
200000 4.7sec 2.6sec 72 MB 98 MB
210000 6.8sec 6.7sec 75 MB 103 MB
220000 6.2sec 18.0sec 79 MB 107 MB
230000 6.5sec 16.7sec 82 MB 112 MB
240000 8.8sec 10.4sec 86 MB 117 MB
250000 8.2sec 17.0sec 90 MB 122 MB
260000 8.4sec 17.5sec 93 MB 127 MB
270000 6.8sec 19.2sec 97 MB 132 MB
280000 13.1sec 16.5sec 100 MB 137 MB
290000 9.4sec 73.1sec 104 MB 142 MB
300000 8.5sec 7.7sec 108 MB 146 MB
310000 8.5sec 7.7sec 111 MB 151 MB
320000 8.6sec 11.9sec 115 MB 156 MB
330000 9.3sec 33.5sec 118 MB 161 MB
340000 8.3sec 54.3sec 122 MB 166 MB
350000 8.3sec 50.0sec 126 MB 171 MB
360000 9.3sec 109.0sec 129 MB 176 MB
370000 9.5sec 12.5sec 133 MB 181 MB
380000 10.1sec 28.6sec 136 MB 186 MB
390000 10.2sec 14.6sec 140 MB 190 MB
400000 10.7sec 136.7sec 143 MB 195 MB
410000 11.4sec 116.6sec 147 MB 200 MB
420000 11.5sec 220.9sec 151 MB 205 MB
430000 11.7sec 151.3sec 154 MB 210 MB
440000 12.7sec 144.7sec 158 MB 215 MB
450000 12.0sec 202.1sec 161 MB 220 MB
460000 13.9sec 134.7sec 165 MB 225 MB
470000 12.2sec 127.6sec 169 MB 229 MB
480000 13.1sec 122.7sec 172 MB 234 MB
490000 13.1sec 106.3sec 176 MB 239 MB
500000 15.8sec 174.6sec 179 MB 244 MB
550000 14.2sec 216.6sec 197 MB 269 MB
600000 15.6sec 294.2sec 215 MB 293 MB
650000 16.7sec 332.8sec 233 MB 317 MB
700000 19.0sec 269.6sec 251 MB 342 MB
750000 20.1sec 472.0sec 269 MB 366 MB
800000 21.0sec 465.6sec 287 MB 391 MB
Edward Ned Harvey
2011-05-07 13:47:43 UTC
Permalink
Post by Edward Ned Harvey
See below. Right around 400,000 blocks, dedup is suddenly an order of
magnitude slower than without dedup.
400000     10.7sec     136.7sec     143 MB     195 MB
Post by Edward Ned Harvey
800000     21.0sec     465.6sec     287 MB     391 MB

The interesting thing is - In all these cases, the complete DDT and the
complete data file itself should fit entirely in ARC comfortably. So it
makes no sense for performance to be so terrible at this level.

So I need to start figuring out exactly what's going on. Unfortunately I
don't know how to do that very well. I'm looking for advice from anyone -
how to poke around and see how much memory is being consumed for what
purposes. I know how to lookup c_min and c and c_max... But that didn't do
me much good. The actual value for c barely changes at all over time...
Even when I rm the file, c does not change immediately.

All the other metrics from kstat ... have less than obvious names ... so I
don't know what to look for...
Erik Trimble
2011-05-08 00:18:25 UTC
Permalink
Post by Edward Ned Harvey
Post by Edward Ned Harvey
See below. Right around 400,000 blocks, dedup is suddenly an order of
magnitude slower than without dedup.
400000     10.7sec     136.7sec     143 MB     195 MB
Post by Edward Ned Harvey
800000     21.0sec     465.6sec     287 MB     391 MB
The interesting thing is - In all these cases, the complete DDT and the
complete data file itself should fit entirely in ARC comfortably. So it
makes no sense for performance to be so terrible at this level.
So I need to start figuring out exactly what's going on. Unfortunately I
don't know how to do that very well. I'm looking for advice from anyone -
how to poke around and see how much memory is being consumed for what
purposes. I know how to lookup c_min and c and c_max... But that didn't do
me much good. The actual value for c barely changes at all over time...
Even when I rm the file, c does not change immediately.
All the other metrics from kstat ... have less than obvious names ... so I
don't know what to look for...
Some minor issues that might affect the above:

(1) I'm assuming you run your script repeatedly in the same pool,
without deleting the pool. If that is the case, that means that a run of
X+1 should dedup completely with the run of X. E.g. a run with 120000
blocks will dedup the first 110000 blocks with the prior run of 110000.

(2) can you NOT enable "verify" ? Verify *requires* a disk read before
writing for any potential dedup-able block. If case #1 above applies,
then by turning on dedup, you *rapidly* increase the amount of disk I/O
you require on each subsequent run. E.g. the run of 100000 requires no
disk I/O due to verify, but the run of 110000 requires 100000 I/O
requests, while the run of 120000 requires 110000 requests, etc. This
will skew your results as the ARC buffering of file info changes over time.

(3) fflush is NOT the same as fsync. If you're running the script in a
loop, it's entirely possible that ZFS hasn't completely committed things
to disk yet, which means that you get I/O requests to flush out the ARC
write buffer in the middle of your runs. Honestly, I'd do the
following for benchmarking:

i=0
while [ $i -lt 80 ]; do
    j=$((100000 + $i * 10000))
    ./run_your_script $j
    sync
    sleep 10
    i=$(($i + 1))
done
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Edward Ned Harvey
2011-05-08 14:31:23 UTC
Permalink
Post by Erik Trimble
(1) I'm assuming you run your script repeatedly in the same pool,
without deleting the pool. If that is the case, that means that a run of
X+1 should dedup completely with the run of X. E.g. a run with 120000
blocks will dedup the first 110000 blocks with the prior run of 110000.
I rm the file in between each run. So if I'm not mistaken, no dedup happens
on consecutive runs based on previous runs.
Post by Erik Trimble
(2) can you NOT enable "verify" ? Verify *requires* a disk read before
writing for any potential dedup-able block.
Every block is unique. There is never anything to verify because there is
never a checksum match.

Why would I test dedup on non-dedupable data? You can see it's a test. In
any pool where you want to enable dedup, you're going to have a number of
dedupable blocks, and a number of non-dedupable blocks. The memory
requirement is based on number of allocated blocks in the pool. So I want
to establish an upper and lower bound for dedup performance. I am running
some tests on entirely duplicate data to see how fast it goes, and also
running the described test on entirely non-duplicate data... With enough
ram and without enough ram... As verification that we know how to predict
the lower bound.

So far, I'm failing to predict the lower bound, which is why I've come here
to talk about it.

I've done a bunch of tests with dedup=verify or dedup=sha256. Results the
same. But I didn't do that for this particular test. I'll run with just
sha256 if you would still like me to after what I just said.
Post by Erik Trimble
(3) fflush is NOT the same as fsync. If you're running the script in a
loop, it's entirely possible that ZFS hasn't completely committed things
to disk yet,
Oh. Well I'll change that - but - I actually sat here and watched the HDD
light, so even though I did that wrong, I can say the hard drive finished
and became idle in between each run. (I stuck sleep statements in between
each run specifically so I could watch the HDD light.)
Post by Erik Trimble
i=0
while [ $i -lt 80 ]; do
    j=$((100000 + $i * 10000))
    ./run_your_script $j
    sync
    sleep 10
    i=$(($i + 1))
done
Oh, yeah. That's what I did, minus the sync command. I'll make sure to
include that next time. And I used "time ~/datagenerator"

Incidentally, do fsync() and sync return instantly or wait? Cuz "time
sync" might produce 0 sec every time even if there were something waiting to
be flushed to disk.
Toby Thain
2011-05-08 14:51:04 UTC
Permalink
Post by Edward Ned Harvey
...
Incidentally, do fsync() and sync return instantly or wait? Cuz "time
sync" might produce 0 sec every time even if there were something waiting to
be flushed to disk.
The semantics need to be synchronous. Anything else would be a horrible bug.

--Toby
Andrew Gabriel
2011-05-08 15:22:46 UTC
Permalink
Post by Toby Thain
Post by Edward Ned Harvey
...
Incidentally, do fsync() and sync return instantly or wait? Cuz "time
sync" might produce 0 sec every time even if there were something waiting to
be flushed to disk.
The semantics need to be synchronous. Anything else would be a horrible bug.
sync(2) is not required to be synchronous.
I believe that for ZFS it is synchronous, but for most other
filesystems, it isn't (although a second sync will block until the
actions resulting from a previous sync have completed).

fsync(3C) is synchronous.
--
Andrew Gabriel
Neil Perrin
2011-05-08 16:17:39 UTC
Permalink
Post by Andrew Gabriel
Post by Toby Thain
Post by Edward Ned Harvey
...
Incidentally, do fsync() and sync return instantly or wait? Cuz "time
sync" might produce 0 sec every time even if there were something waiting to
be flushed to disk.
The semantics need to be synchronous. Anything else would be a horrible bug.
sync(2) is not required to be synchronous.
I believe that for ZFS it is synchronous...
Indeed it is synchronous for zfs. Both sync(1)/sync(2) ensure that all
cached data and metadata at the time of the sync(2) call are written out
and stable before return from the call.

Neil.
Edward Ned Harvey
2011-05-19 01:56:20 UTC
Permalink
Post by Edward Ned Harvey
I'm following all the advice I summarized into the OP of this thread, and
testing on a test system. (A laptop). And it's just not working. I am
jumping into the dedup performance abyss far, far earlier than
predicted...

Now I'm repeating all these tests on a system that more closely resembles a
server. This is a workstation with 6 core processor, 16G ram, and a single
1TB hard disk.

In the default configuration, arc_meta_limit is 3837MB. And as I increase
the number of unique blocks in the data pool, it is perfectly clear that
performance jumps off a cliff when arc_meta_used starts to reach that level,
which is approx 880,000 to 1,030,000 unique blocks. FWIW, this means,
without evil tuning, a 16G server is only sufficient to run dedup on approx
33GB to 125GB unique data without severe performance degradation. I'm
calling "severe degradation" anything that's an order of magnitude or worse.
(That's 40K average block size * 880,000 unique blocks, and 128K average
block size * 1,030,000 unique blocks.)
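For anyone reproducing this, a simple way to watch the cliff approach (a
sketch; the one-minute interval is arbitrary) is to poll arc_meta_used against
arc_meta_limit while the data generator runs:

# sample the ARC metadata counters once a minute during the test
while true; do
    date
    echo ::arc | sudo mdb -k | egrep 'meta_used|meta_limit'
    sleep 60
done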

So clearly this needs to be addressed, if dedup is going to be super-awesome
moving forward.

But I didn't quit there.

So then I tweak the arc_meta_limit. Set to 7680MB. And repeat the test.
This time, the edge of the cliff is not so clearly defined, something like
1,480,000 to 1,620,000 blocks. But the problem is - arc_meta_used never
even comes close to 7680MB. At all times, I still have at LEAST 2G unused
free mem.

I have 16G physical mem, but at all times, I always have at least 2G free.
my arcstats:c_max is 15G. But my arc size never exceeds 8.7G
my arc_meta_limit is 7680 MB, but my arc_meta_used never exceeds 3647 MB.

So what's the holdup?

All of the above is, of course, just a summary. If you want complete
overwhelming details, here they are:
http://dl.dropbox.com/u/543241/dedup%20tests/readme.txt

http://dl.dropbox.com/u/543241/dedup%20tests/datagenerate.c
http://dl.dropbox.com/u/543241/dedup%20tests/getmemstats.sh
http://dl.dropbox.com/u/543241/dedup%20tests/parse.py
http://dl.dropbox.com/u/543241/dedup%20tests/runtest.sh

http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-outp
ut-1st-pass.txt
http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-outp
ut-1st-pass-parsed.xlsx

http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-outp
ut-2nd-pass.txt
http://dl.dropbox.com/u/543241/dedup%20tests/work%20workstation/runtest-outp
ut-2nd-pass-parsed.xlsx
Edward Ned Harvey
2011-05-08 14:37:34 UTC
Permalink
Just another data point. The ddt is considered metadata, and by default the
arc will not allow more than 1/4 of it to be used for metadata. Are you still
sure it fits?
That's interesting. Is it tunable? That could certainly start to explain why my arc size (arcstats:c) never grew to anything I considered reasonable... And in fact it grew larger when I had dedup disabled, and smaller when dedup was enabled. "Weird," I thought.

Seems like a really important factor to mention in this summary.
Edward Ned Harvey
2011-05-08 14:56:39 UTC
Permalink
Post by Edward Ned Harvey
That could certainly start to explain why my
arc size arcstats:c never grew to any size I thought seemed reasonable...
Also now that I'm looking closer at arcstats, it seems arcstats:size might
be the appropriate measure, not arcstats:c
Richard Elling
2011-05-08 16:40:08 UTC
Permalink
Post by Edward Ned Harvey
Post by Edward Ned Harvey
That could certainly start to explain why my
arc size arcstats:c never grew to any size I thought seemed reasonable...
Also now that I'm looking closer at arcstats, it seems arcstats:size might
be the appropriate measure, not arcstats:c
size is the current size. c is the target size. c_min and c_max are the target min and
max, respectively.
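For the record, all four are also exposed in the arcstats kstat, so something
along these lines pulls just those counters without going through mdb:

kstat -p zfs:0:arcstats:size zfs:0:arcstats:c \
      zfs:0:arcstats:c_min zfs:0:arcstats:c_max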

The source is well commented and describes the ARC algorithms and kstats in detail.
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/arc.c

What is not known is how well the algorithms work for the various use cases we
encounter. That will be an ongoing project for a long time.
-- richard
Garrett D'Amore
2011-05-08 15:05:58 UTC
Permalink
It is tunable, I don't remember the exact tunable name... Arc_metadata_limit or some such.

-- Garrett D'Amore
Post by Edward Ned Harvey
Just another data point. The ddt is considered metadata, and by default the
arc will not allow more than 1/4 of it to be used for metadata. Are you still
sure it fits?
That's interesting. Is it tunable? That could certainly start to explain why my arc size (arcstats:c) never grew to anything I considered reasonable... And in fact it grew larger when I had dedup disabled, and smaller when dedup was enabled. "Weird," I thought.
Seems like a really important factor to mention in this summary.
Edward Ned Harvey
2011-05-08 15:15:19 UTC
Permalink
Post by Garrett D'Amore
It is tunable, I don't remember the exact tunable name... Arc_metadata_limit
or some such.
There it is:
echo "::arc" | sudo mdb -k | grep meta_limit
arc_meta_limit = 286 MB

Looking at my chart earlier in this discussion, it seems like this might not
be the cause of the problem. In my absolute largest test that I ran, my
supposed (calculated) DDT size was 287MB, so this performance abyss was
definitely happening at sizes smaller than the arc_meta_limit.

But I'll go tune and test with this knowledge, just to be sure.
Edward Ned Harvey
2011-05-08 15:20:15 UTC
Permalink
Post by Edward Ned Harvey
But I'll go tune and test with this knowledge, just to be sure.
BTW, here's how to tune it:

echo "arc_meta_limit/Z 0x30000000" | sudo mdb -kw

echo "::arc" | sudo mdb -k | grep meta_limit
arc_meta_limit = 768 MB
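Note that the mdb -kw write above only changes the running kernel and is lost
at reboot. To make it stick, the corresponding tunable goes in /etc/system
(same 768 MB example value):

* /etc/system -- applied at next boot
set zfs:zfs_arc_meta_limit=0x30000000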
Frank Van Damme
2011-05-09 09:19:31 UTC
Permalink
Post by Edward Ned Harvey
Post by Edward Ned Harvey
But I'll go tune and test with this knowledge, just to be sure.
echo "arc_meta_limit/Z 0x30000000" | sudo mdb -kw
echo "::arc" | sudo mdb -k | grep meta_limit
arc_meta_limit = 768 MB
Must. Try. This. Out.
Otoh, I struggle to see the difference between arc_meta_limit and
arc_meta_max.

arc_meta_used = 1807 MB
arc_meta_limit = 1275 MB
arc_meta_max = 2289 MB

Mine is 1275 MB, i.e. roughly (6 GB - 1 GB)/4; let's try doubling it...

Is this persistent across reboots btw? Or only if you write it in
/etc/system or somefile?
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
Edward Ned Harvey
2011-05-09 12:22:29 UTC
Permalink
Post by Frank Van Damme
Otoh, I struggle to see the difference between arc_meta_limit and
arc_meta_max.
Thanks for pointing this out. When I changed meta_limit and re-ran the
test, there was no discernible difference. So now I'll change meta_max and
see if it helps...
Edward Ned Harvey
2011-05-09 12:36:28 UTC
Permalink
Post by Edward Ned Harvey
So now I'll change meta_max and
see if it helps...
Oh, you know what? Never mind.
I just looked at the source, and it seems arc_meta_max is just a gauge for
you to use, so you can see the highest value arc_meta_used has ever
reached. So the most useful thing for you to do would be to set it to 0
to reset the counter, and then start watching it over time.
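Presumably the same kind of mdb write used for arc_meta_limit earlier works
for resetting the gauge -- untested, so treat this as a sketch:

echo "arc_meta_max/Z 0" | sudo mdb -kw
echo "::arc" | sudo mdb -k | grep meta_max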
Frank Van Damme
2011-05-09 13:25:19 UTC
Permalink
Post by Edward Ned Harvey
Post by Edward Ned Harvey
So now I'll change meta_max and
see if it helps...
Oh, you know what? Never mind.
I just looked at the source, and it seems arc_meta_max is just a gauge for
you to use, so you can see the highest value arc_meta_used has ever
reached. So the most useful thing for you to do would be to set it to 0
to reset the counter, and then start watching it over time.
Ok, good to know - but that confuses me even more, since in my previous
post my arc_meta_used was bigger than my arc_meta_limit (by about 50%),
and now since I doubled _limit, _used only shrank by a couple megs.

I'd really like to find some way to tell this machine "CACHE MORE
METADATA, DAMNIT!" :-)
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
Edward Ned Harvey
2011-05-09 13:42:02 UTC
Permalink
Post by Frank Van Damme
in my previous
post my arc_meta_used was bigger than my arc_meta_limit (by about 50%)
I have the same thing. But as I sit here and run more and more extensive
tests on it ... it seems like arc_meta_limit is sort of a soft limit. Or it
only checks periodically or something like that. Because although I
sometimes see size > limit, and I definitely see max > limit ... When I do
bigger and bigger more intensive stuff, the size never grows much more than
limit. It always gets knocked back down within a few seconds...
Frank Van Damme
2011-05-09 13:56:51 UTC
Permalink
Post by Edward Ned Harvey
Post by Frank Van Damme
in my previous
post my arc_meta_used was bigger than my arc_meta_limit (by about 50%)
I have the same thing. But as I sit here and run more and more extensive
tests on it ... it seems like arc_meta_limit is sort of a soft limit. Or it
only checks periodically or something like that. Because although I
sometimes see size > limit, and I definitely see max > limit ... When I do
bigger and bigger more intensive stuff, the size never grows much more than
limit. It always gets knocked back down within a few seconds...
I found a script called arc_summary.pl and look what it says.


ARC Size:
Current Size: 1734 MB (arcsize)
Target Size (Adaptive): 1387 MB (c)
Min Size (Hard Limit): 637 MB (zfs_arc_min)
Max Size (Hard Limit): 5102 MB (zfs_arc_max)



c = 1512 MB
c_min = 637 MB
c_max = 5102 MB
size = 1736 MB
...
arc_meta_used = 1735 MB
arc_meta_limit = 2550 MB
arc_meta_max = 1832 MB

There are a few seconds between running the script and ::arc | mdb -k,
but it seems that it just doesn't use more arc than 1734 or so MB, and
that nearly all of it is used for metadata. (I set primarycache=metadata
on my data fs, so that seems logical.) So the goal now shifts to
trying to enlarge the arc size (what's it doing with the other memory???
I have close to no processes running.)
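For reference, primarycache=metadata is a per-dataset property, set along
these lines (the dataset name here is just an example):

zfs set primarycache=metadata backups/data    # keep only metadata in ARC for this fs
zfs get primarycache backups/data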
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
Frank Van Damme
2011-05-10 08:03:08 UTC
Permalink
Post by Edward Ned Harvey
Post by Frank Van Damme
in my previous
post my arc_meta_used was bigger than my arc_meta_limit (by about 50%)
I have the same thing. But as I sit here and run more and more extensive
tests on it ... it seems like arc_meta_limit is sort of a soft limit. Or it
only checks periodically or something like that. Because although I
sometimes see size > limit, and I definitely see max > limit ... When I do
bigger and bigger more intensive stuff, the size never grows much more than
limit. It always gets knocked back down within a few seconds...
What I really can't wrap my head around is how arc_meta_used can
possibly be bigger than the arcsize. Yet on my system right now there's
about 15 MB difference between the two.

Also, arcsize is only about 1800M right now, and since a flag called
arc_no_grow is set, I presume it'll stay that way. Do you think it's
possible to
- force it to "off" again
- set a higher minimum limit?

The latter can probably be done in /etc/system, but on a live system?
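(For the /etc/system route, the line would presumably look like the one below,
with 4 GiB as an example value -- but that only takes effect at the next boot,
not on a live system:)

* /etc/system -- raise the ARC floor to 4 GiB (example value)
set zfs:zfs_arc_min=0x100000000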
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
Edward Ned Harvey
2011-05-10 04:56:03 UTC
Permalink
Post by Edward Ned Harvey
echo "arc_meta_limit/Z 0x30000000" | sudo mdb -kw
echo "::arc" | sudo mdb -k | grep meta_limit
arc_meta_limit = 768 MB
Well ... I don't know what to think yet. I've been reading these numbers
for like an hour, finding interesting things here and there, but nothing to
really solidly point my finger at.

The one thing I know for sure... The free mem drops at an unnatural rate.
Initially the free mem disappears at a rate approx 2x faster than the sum of
file size and metadata combined. Meaning the system could be caching the
entire file and all the metadata, and that would only explain half of the
memory disappearance.
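One way to see where the memory is actually going is the ::memstat dcmd, which
breaks physical memory down into kernel, anon, page cache and free (a sketch;
run it before and after a test pass and compare):

echo ::memstat | sudo mdb -k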

I set the arc_meta_limit to 768 as mentioned above. I ran all these tests,
and here are the results:
(sorry it's extremely verbose)
http://dl.dropbox.com/u/543241/dedup%20tests/runtest-output.xlsx

BTW, here are all the scripts etc that I used to produce those results:
http://dl.dropbox.com/u/543241/dedup%20tests/datagenerate.c
http://dl.dropbox.com/u/543241/dedup%20tests/getmemstats.sh
http://dl.dropbox.com/u/543241/dedup%20tests/runtest.sh
Frank Van Damme
2011-05-11 08:29:57 UTC
Permalink
Post by Edward Ned Harvey
Post by Edward Ned Harvey
echo "arc_meta_limit/Z 0x30000000" | sudo mdb -kw
echo "::arc" | sudo mdb -k | grep meta_limit
arc_meta_limit = 768 MB
Well ... I don't know what to think yet. I've been reading these numbers
for like an hour, finding interesting things here and there, but nothing to
really solidly point my finger at.
The one thing I know for sure... The free mem drops at an unnatural rate.
Initially the free mem disappears at a rate approx 2x faster than the sum of
file size and metadata combined. Meaning the system could be caching the
entire file and all the metadata, and that would only explain half of the
memory disappearance.
I'm seeing similar things. Yesterday I first rebooted with set
zfs:zfs_arc_meta_limit=0x100000000 (that's 4 GiB) set in /etc/system and
monitored while the box was doing its regular job (taking backups).
zfs_arc_min is also set to 4 GiB. What I noticed is that shortly after
the reboot, the arc started filling up rapidly, mostly with metadata. It
shot up to:

arc_meta_max = 3130 MB

afterwards, the number for arc_meta_used steadily dropped. Some 12 hours
ago I started deleting files; about 6,000,000 files have been deleted since
then. At the moment the arc size stays right at the minimum of 2
GiB, of which metadata fluctuates around 1650 MB.

This is the output of the getmemstats.sh script you posted.

Memory: 6135M phys mem, 539M free mem, 6144M total swap, 6144M free swap
zfs:0:arcstats:c 2147483648 = 2 GiB target size
zfs:0:arcstats:c_max 5350862848 = 5 GiB
zfs:0:arcstats:c_min 2147483648 = 2 GiB
zfs:0:arcstats:data_size 829660160 = 791 MiB
zfs:0:arcstats:hdr_size 93396336 = 89 MiB
zfs:0:arcstats:other_size 411215168 = 392 MiB
zfs:0:arcstats:size 1741492896 = 1661 MiB
arc_meta_used = 1626 MB
arc_meta_limit = 4096 MB
arc_meta_max = 3130 MB

I get way more cache misses than I'd like:

Time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
10:01:13 3K 380 10 166 7 214 15 259 7 1G 2G
10:02:13 2K 340 16 37 2 302 46 323 16 1G 2G
10:03:13 2K 368 18 47 3 321 46 347 17 1G 2G
10:04:13 1K 348 25 44 4 303 63 335 24 1G 2G
10:05:13 2K 420 15 87 4 332 36 383 14 1G 2G
10:06:13 3K 489 16 132 6 357 35 427 14 1G 2G
10:07:13 2K 405 15 49 2 355 39 401 15 1G 2G
10:08:13 2K 366 13 40 2 326 37 366 13 1G 2G
10:09:13 1K 364 20 18 1 345 58 363 20 1G 2G
10:10:13 4K 370 8 59 2 311 21 369 8 1G 2G
10:11:13 4K 351 8 57 2 294 21 350 8 1G 2G
10:12:13 3K 378 10 59 2 319 26 372 10 1G 2G
10:13:13 3K 393 11 53 2 339 28 393 11 1G 2G
10:14:13 2K 403 13 40 2 363 35 402 13 1G 2G
10:15:13 3K 365 11 48 2 317 30 365 11 1G 2G
10:16:13 2K 374 15 40 2 334 40 374 15 1G 2G
10:17:13 3K 385 12 43 2 341 28 383 12 1G 2G
10:18:13 4K 343 8 64 2 279 19 343 8 1G 2G
10:19:13 3K 391 10 59 2 332 23 391 10 1G 2G


So, one explanation I can think of is that the "rest" of the memory is l2arc
pointers, supposing they are not actually counted in the arc memory usage
totals (AFAIK l2arc pointers are considered to be part of arc). Then again my
l2arc is still growing (slowly) and I'm only caching metadata at the moment,
so you'd think it'd shrink if there were no more room for l2arc pointers.
Besides, I'm getting very little reads from ssd:

capacity operations bandwidth
pool alloc free read write read write
------------ ----- ----- ----- ----- ----- -----
backups 5.49T 1.57T 415 121 3.13M 1.58M
raidz1 5.49T 1.57T 415 121 3.13M 1.58M
c0t0d0s1 - - 170 16 2.47M 551K
c0t1d0s1 - - 171 16 2.46M 550K
c0t2d0s1 - - 170 16 2.53M 552K
c0t3d0s1 - - 170 16 2.44M 550K
cache - - - - - -
c1t5d0 63.4G 48.4G 20 0 2.45M 0
------------ ----- ----- ----- ----- ----- -----

(typical statistic over 1 minute)
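As a sanity check on the l2arc-pointer theory above, the ARC does expose how
much memory its headers (including the L2ARC ones) consume; something along
these lines should show it -- the exact stat names are from memory, so treat
them as an assumption:

kstat -p zfs:0:arcstats:hdr_size zfs:0:arcstats:l2_hdr_size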


I might try the "windows solution" and reboot the machine to free up
memory and let it fill the cache all over again and see if I get more
cache hits... hmmm...
Post by Edward Ned Harvey
I set the arc_meta_limit to 768 as mentioned above. I ran all these tests,
(sorry it's extremely verbose)
http://dl.dropbox.com/u/543241/dedup%20tests/runtest-output.xlsx
http://dl.dropbox.com/u/543241/dedup%20tests/datagenerate.c
http://dl.dropbox.com/u/543241/dedup%20tests/getmemstats.sh
http://dl.dropbox.com/u/543241/dedup%20tests/runtest.sh
--
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
Garrett D'Amore
2011-05-08 01:21:01 UTC
Permalink
Just another data point. The ddt is considered metadata, and by default the arc will not allow more than 1/4 of it to be used for metadata. Are you still sure it fits?
Post by Erik Trimble
Post by Edward Ned Harvey
Post by Edward Ned Harvey
See below. Right around 400,000 blocks, dedup is suddenly an order of
magnitude slower than without dedup.
400000    10.7sec    136.7sec    143 MB    195 MB
800000    21.0sec    465.6sec    287 MB    391 MB
The interesting thing is - In all these cases, the complete DDT and the
complete data file itself should fit entirely in ARC comfortably. So it
makes no sense for performance to be so terrible at this level.
So I need to start figuring out exactly what's going on. Unfortunately I
don't know how to do that very well. I'm looking for advice from anyone -
how to poke around and see how much memory is being consumed for what
purposes. I know how to look up c_min, c, and c_max... But that didn't do
me much good. The actual value for c barely changes at all over time...
Even when I rm the file, c does not change immediately.
All the other metrics from kstat ... have less than obvious names ... so I
don't know what to look for...
(1) I'm assuming you run your script repeatedly in the same pool,
without deleting the pool. If that is the case, that means that a run of
X+1 should dedup completely with the run of X. E.g. a run with 120000
blocks will dedup the first 110000 blocks with the prior run of 110000.
(2) can you NOT enable "verify" ? Verify *requires* a disk read before
writing for any potential dedup-able block. If case #1 above applies,
then by turning on dedup, you *rapidly* increase the amount of disk I/O
you require on each subsequent run. E.g. the run of 100000 requires no
disk I/O due to verify, but the run of 110000 requires 100000 I/O
requests, while the run of 120000 requires 110000 requests, etc. This
will skew your results as the ARC buffering of file info changes over time.
(3) fflush is NOT the same as fsync. If you're running the script in a
loop, it's entirely possible that ZFS hasn't completely committed things
to disk yet, which means that you get I/O requests to flush out the ARC
write buffer in the middle of your runs. Honestly, I'd do the following:
i=0
while [ $i -lt 80 ];
do
    j=$[100000 + ($i * 10000)]
    ./run_your_script $j
    sync
    sleep 10
    i=$[$i+1]
done
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA