Edward Ned Harvey

2011-05-05 02:56:40 UTC

This is a summary of a much longer discussion, "Dedup and L2ARC memory requirements (again)".

Sorry, even this summary is long. But the results vary enormously based on individual usage, so any "rule of thumb" metric that has been bouncing around on the internet is simply not sufficient. You need to go into this level of detail to get an estimate that's worth the napkin or bathroom tissue it's scribbled on.

This is how to (reasonably) accurately estimate the hypothetical RAM requirements to hold the complete data deduplication table (DDT) and the L2ARC references in RAM. Please note that both the DDT and the L2ARC references can be evicted from memory according to system policy, whenever the system decides some other data is more valuable to keep. So following this guide does not guarantee that the whole DDT will remain in ARC or L2ARC. But it's a good start.

I am using a Solaris 11 Express x86 test system for my example numbers below.

----------- To calculate size of DDT -----------

Each entry in the DDT is a fixed size, which varies by platform. You can find it with the command:

echo ::sizeof ddt_entry_t | mdb -k

This will return a hex value, which you probably want to convert to decimal. On my test system, it is 0x178, which is 376 bytes.
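The hex-to-decimal conversion is easy to check, e.g. in Python (the 0x178 value is from this test system; other platforms will differ):

```python
# mdb prints struct sizes in hex; convert to decimal for the arithmetic below.
ddt_entry_size = int("0x178", 16)   # ddt_entry_t on this test system
print(ddt_entry_size)               # 376
```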

There is one DDT entry per non-dedup'd (unique) block in the zpool. Be aware that you cannot reliably estimate the number of blocks by counting the number of files. You can find the total number of blocks, including dedup'd blocks, in your pool with this command:

zdb -bb poolname | grep 'bp count'

Note: this command will run for a long time and is IO intensive. On my systems, where a scrub runs for 8-9 hours, this zdb command ran for about 90 minutes. On my test system, the result is 44145049 (44.1M) total blocks.

To estimate the number of non-dedup'd (unique) blocks (assuming the average size of dedup'd blocks equals the average size of blocks in the whole pool), use:

zpool list

Find the dedup ratio. In my test system, it is 2.24x. Divide the total blocks by the dedup ratio to find the number of non-dedup'd (unique) blocks. In my test system:

44145049 total blocks / 2.24 dedup ratio = 19707611 (19.7M) approx non-dedup'd (unique) blocks

Then multiply by the size of a DDT entry:

19707611 * 376 = 7410061736 bytes ~= 7G total DDT size
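The whole DDT estimate condenses to a few lines of Python. This is just the arithmetic above, with this test system's figures plugged in (yours will differ):

```python
# Estimate total DDT size from pool statistics (test-system figures).
total_blocks = 44_145_049   # from: zdb -bb poolname | grep 'bp count'
dedup_ratio = 2.24          # from the DEDUP column of: zpool list
ddt_entry_size = 376        # from: echo ::sizeof ddt_entry_t | mdb -k  (0x178)

unique_blocks = int(total_blocks / dedup_ratio)   # ~19.7M unique blocks
ddt_bytes = unique_blocks * ddt_entry_size
print(f"DDT ~ {ddt_bytes / 2**30:.1f} GiB")       # ~6.9 GiB, call it 7G
```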

----------- To calculate size of ARC/L2ARC references -----------

Each reference to an L2ARC entry requires an entry in ARC (RAM). This is another fixed size, which varies by platform. You can find it with the command:

echo ::sizeof arc_buf_hdr_t | mdb -k

On my test system, it is 0xb0, which is 176 bytes.

We need to know the average block size in the pool to estimate the number of blocks that will fit into L2ARC. Find the amount of space allocated (ALLOC) in the pool:

zpool list

Divide by the number of non-dedup'd (unique) blocks in the pool to find the average block size. In my test system:

790G / 19707611 = 42K average block size

Remember: if your L2ARC were only caching average-size blocks, the payload ratio of L2ARC to ARC would be excellent. In my test system, every 42K in L2ARC would require 176 bytes of ARC (a ratio of 244x). This would result in negligible ARC memory consumption. But since your DDT can be pushed out of ARC into L2ARC, you get a really bad ratio of L2ARC to ARC memory consumption. In my test system, every 376-byte DDT entry in L2ARC consumes 176 bytes of ARC (a ratio of 2.1x). Yes, it is possible to have approximately the complete DDT present in both ARC and L2ARC, thus consuming tons of RAM.

Remember that disk manufacturers use base-10. So my 32G SSD is only 30G base-2 (32,000,000,000 / 1024 / 1024 / 1024).
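That conversion, as a one-line sanity check:

```python
# Disk vendors quote capacity in base-10; ZFS sizes are base-2.
ssd_bytes = 32_000_000_000              # a "32G" SSD
print(f"{ssd_bytes / 2**30:.1f} GiB")   # ~29.8 GiB, call it 30G
```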

So I have 30G of L2ARC, and the first 7G may be consumed by DDT. This leaves 23G remaining to be used for average-sized blocks.

The ARC consumed to reference the DDT in L2ARC is 176/376 * DDT size. In my test system, this is 176/376 * 7G = 3.3G.

Take the remaining size of your L2ARC and divide by the average block size to get the number of average-size blocks the L2ARC can hold. In my test system:

23G / 42K = 574220 average-size blocks in L2ARC

Multiply by the ARC size of an L2ARC reference. On my test system:

574220 * 176 = 101062720 bytes = 96MB of ARC consumed to reference the average-size blocks in L2ARC

So the total ARC consumption to hold L2ARC references in my test system is 3.3G + 96M ~= 3.4G.
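Both overhead terms can be sketched together in Python; all the inputs are the test-system figures derived above:

```python
# ARC overhead needed to index the L2ARC (test-system figures).
arc_hdr_size = 176            # echo ::sizeof arc_buf_hdr_t | mdb -k  (0xb0)
ddt_entry_size = 376          # echo ::sizeof ddt_entry_t | mdb -k    (0x178)
ddt_bytes = 7 * 2**30         # ~7G DDT, from the previous section
l2arc_bytes = 30 * 2**30      # the 32 GB (base-10) SSD ~ 30G base-2
avg_block = 42 * 1024         # ~42K average block size

# ARC headers for DDT entries that spill into L2ARC:
ddt_ref_bytes = ddt_bytes * arc_hdr_size // ddt_entry_size    # ~3.3 GiB
# ARC headers for ordinary blocks filling the rest of the L2ARC:
data_blocks = (l2arc_bytes - ddt_bytes) // avg_block          # ~574220 blocks
data_ref_bytes = data_blocks * arc_hdr_size                   # ~96 MiB
total = ddt_ref_bytes + data_ref_bytes
print(f"ARC for L2ARC refs ~ {total / 2**30:.1f} GiB")        # ~3.4 GiB
```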

----------- To calculate total ram needed -----------

And finally: the max size the ARC is allowed to grow to is a constant that varies by platform. On my system, it is 80% of system RAM. You can find this value using the command:

kstat -p zfs::arcstats:c_max

Divide by your total system memory to find the ratio. Assuming the ratio is 4/5, you need to buy 5/4 of the calculated amount of RAM to satisfy all your requirements.

So the end result is:

On my test system, I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system, I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.)

We calculated that I need 7G for the DDT and 3.4G for the L2ARC references. That is 10.4G. Multiply by 5/4, and it means I need 13G.

My system needs to be built with at least 8G + 13G = 21G.

Of this, 20% (4.2G) is more than enough to run the OS and processes, while 80% (16.8G) is available for ARC. Of the 16.8G ARC, the DDT and L2ARC references will consume 10.4G, which leaves 6.4G for "normal" ARC caching.
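The final sizing step is just one division by the c_max ratio; as a sketch, with the figures above:

```python
# Gross up the ARC payload by the c_max ratio (ARC may use 80% of RAM here).
arc_payload_gib = 7.0 + 3.4      # DDT plus L2ARC-reference overhead
c_max_ratio = 0.8                # c_max (from kstat) divided by total RAM
ram_for_arc = arc_payload_gib / c_max_ratio       # 13 GiB
total_gib = 8 + ram_for_arc                       # plus the 8G baseline guess
print(f"build with at least {total_gib:.0f} GiB") # 21 GiB
```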

These numbers are all fuzzy. Anything from 16G to 24G might be reasonable.

That's it. I'm done.

P.S. I'll just throw this out there: it is my personal opinion that you probably won't have the whole DDT in ARC and L2ARC at the same time. Because the L2ARC is populated from the soon-to-expire list of the ARC, it seems unlikely that all the DDT entries will get into ARC, then onto the soon-to-expire list, and then get pulled back into ARC and stay there. The above calculation is a sort of worst case. I think the following is likely to be a more realistic actual case:

Personally, I would model the ARC memory consumption of the L2ARC entries using the average block size of the data pool, and just neglect the DDT entries in the L2ARC. Well... inflate some. Say 10% of the DDT is in the L2ARC and the ARC at the same time. I'm making up this number out of thin air.

My revised end result is:

On my test system, I guess the OS and processes consume 1G. (I'm making that up without any reason.) On my test system, I guess I need 8G in the system to get reasonable performance without dedup or L2ARC. (Again, I'm just making that up.)

We calculated that I need 7G for the DDT and (96M + 10% of 3.3G = 430M) for the L2ARC references, roughly 7.5G total. Multiply by 5/4, and it means I need 7.5G * 1.25 = 9.4G.

My system needs to be built with at least 8G + 9.4G = 17.4G.

Of this, 20% (3.5G) is more than enough to run the OS and processes, while 80% (13.9G) is available for ARC. Of the 13.9G ARC, the DDT and L2ARC references will consume 7.5G, which leaves 6.4G for "normal" ARC caching.
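Carried through without the intermediate rounding, the revised estimate lands in the same neighborhood (the 10% double-residency factor is the guess admitted above):

```python
# Revised: headers for ordinary L2ARC blocks, plus 10% of the worst-case
# DDT-reference overhead (the 10% factor is an admitted guess).
ddt_gib = 7.0
data_refs_gib = 96 / 1024            # ~96M of headers for data blocks
ddt_refs_gib = 0.10 * 3.3            # 10% of the 3.3G worst case
arc_payload = ddt_gib + data_refs_gib + ddt_refs_gib   # ~7.4 GiB
total = 8 + arc_payload / 0.8        # gross up by c_max, add 8G baseline
print(f"~{total:.1f} GiB")           # ~17.3 GiB; rounding up gives the 17.4G above
```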

I personally think that's likely to be more accurate in the observable

world.

My revised end result is still basically the same: these numbers are all fuzzy. Anything from 16G to 24G might be reasonable.
