Discussion:
L2ARC and poor read performance
Phil Harman
2011-06-07 16:12:52 UTC
Permalink
Ok here's the thing ...

A customer has some big tier 1 storage, and has presented 24 LUNs (from
four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC
bridge (using some of the cool features of ZFS along the way). The OI
box currently has 32GB configured for the ARC, and 4x 223GB SSDs for
L2ARC. It has a dual port QLogic HBA, and is currently configured to do
round-robin MPXIO over two 4Gbps links. The iSCSI traffic is over a dual
10Gbps card (rather like the one Sun used to sell).

I've just built a fresh pool, and have created 20x 100GB zvols which are
mapped to iSCSI clients. I have initialised the first 20GB of each zvol
with random data. I've had a lot of success with write performance (e.g.
in earlier tests I had 20 parallel streams writing 100GB each at over
600MB/sec aggregate), but read performance is very poor.

Right now I'm just playing with 20 parallel streams of reads from the
first 2GB of each zvol (i.e. 40GB in all). During each run, I see lots
of writes to the L2ARC, but less than a quarter of that volume in reads. Yet
my FC LUNs are hot with 1000s of reads per second. This doesn't change
from run to run. Why?

Surely 20x 2GB of data (and its associated metadata) will sit nicely in
4x 223GB SSDs?

Phil
Marty Scholes
2011-06-07 19:34:05 UTC
Permalink
I'll throw out some (possibly bad) ideas.

Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the 40GB of total reads, suggesting that the L2ARC doesn't add any value for this test.

Are the SSD devices saturated from an I/O standpoint? Put another way, can ZFS put data to them fast enough? If they aren't taking writes fast enough, then maybe they can't effectively load for caching. Certainly if they are saturated for writes they can't do much for reads.

Are some of the reads sequential? Sequential reads don't go to L2ARC.

What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?
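Something along these lines is what I'd look at (a sketch only; 'tank' stands in for your pool name, and the summary script may be installed under a different name on your box):

  # per-device throughput and service times -- watch the cache SSDs
  iostat -xn 5
  # pool/vdev view, including how full the cache devices are
  zpool iostat -v tank 5
  # ARC / L2ARC hit rates and sizes, if the script is installed
  arc_summary.pl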
Phil Harman
2011-06-07 21:46:21 UTC
Permalink
Post by Marty Scholes
I'll throw out some (possibly bad) ideas.
Thanks for taking the time.
Post by Marty Scholes
Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the 40GB of total reads, suggesting that the L2ARC doesn't add any value for this test.
Are the SSD devices saturated from an I/O standpoint? Put another way, can ZFS put data to them fast enough? If they aren't taking writes fast enough, then maybe they can't effectively load for caching. Certainly if they are saturated for writes they can't do much for reads.
The SSDs are barely ticking over, and can deliver almost as much
throughput as the current SAN storage.
Post by Marty Scholes
Are some of the reads sequential? Sequential reads don't go to L2ARC.
That'll be it. I assume the L2ARC is just taking metadata. In situations
such as mine, I would quite like the option of routing sequential read
data to the L2ARC also.
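(For the record, I gather the closest existing knob is the l2arc_noprefetch tunable, which decides whether prefetched/streaming buffers are cached in the L2ARC. A sketch for experimenting with it, not a recommendation:

  # persistent, in /etc/system (takes effect after reboot)
  set zfs:l2arc_noprefetch = 0

  # or poke it live with mdb -- use with care on a production box
  echo "l2arc_noprefetch/W0t0" | mdb -kw
)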

I do notice a benefit with a sequential update (i.e. COW for each
block), and I think this is because the L2ARC satisfies most of the
metadata reads instead of having to read them from the SAN.
Post by Marty Scholes
What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?
LaoTsao
2011-06-07 21:57:31 UTC
Permalink
You have an unbalanced setup:
FC 4 Gbps vs 10 Gbps NIC.
After 8b/10b encoding it is even worse, but this does not impact your benchmark yet.

Sent from my iPad
Hung-Sheng Tsao ( LaoTsao) Ph.D
Phil Harman
2011-06-07 22:02:27 UTC
Permalink
Post by LaoTsao
You have an unbalanced setup:
FC 4 Gbps vs 10 Gbps NIC.
It's actually 2x 4Gbps (using MPXIO) vs 1x 10Gbps.
Post by LaoTsao
After 8b/10b encoding it is even worse, but this does not impact your benchmark yet.
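For what it's worth, the rough numbers (assuming 8b/10b on the FC links and ignoring protocol overhead) look like this:

  2x 4Gbps FC   : ~400 MB/s usable per link, ~800 MB/s aggregate with MPXIO
  1x 10Gbps NIC : ~1.25 GB/s raw, less TCP/iSCSI overhead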
Marty Scholes
2011-06-08 13:35:13 UTC
Permalink
Post by Phil Harman
Post by Marty Scholes
Are some of the reads sequential? Sequential reads don't go to L2ARC.
That'll be it. I assume the L2ARC is just taking metadata. In situations such as mine, I would quite like the option of routing sequential read data to the L2ARC also.
The good news is that it is almost a certainty that actual iSCSI usage will be of a (more) random nature than your tests, suggesting higher L2ARC usage in real-world application.

I'm not sure how ZFS distinguishes between random and sequential reads, but the more you think about it, not caching sequential requests makes sense.
Phil Harman
2011-06-08 14:03:30 UTC
Permalink
Post by Marty Scholes
Post by Phil Harman
Post by Marty Scholes
Are some of the reads sequential? Sequential reads don't go to L2ARC.
That'll be it. I assume the L2ARC is just taking metadata. In situations such as mine, I would quite like the option of routing sequential read data to the L2ARC also.
The good news is that it is almost a certainty that actual iSCSI usage will be of a (more) random nature than your tests, suggesting higher L2ARC usage in real-world application.
I'm not sure how ZFS distinguishes between random and sequential reads, but the more you think about it, not caching sequential requests makes sense.
Yes, in most cases, but I can think of some counter examples ;)
Richard Elling
2011-06-08 15:43:35 UTC
Permalink
Post by Phil Harman
Ok here's the thing ...
A customer has some big tier 1 storage, and has presented 24 LUNs (from four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC bridge (using some of the cool features of ZFS along the way). The OI box currently has 32GB configured for the ARC, and 4x 223GB SSDs for L2ARC. It has a dual port QLogic HBA, and is currently configured to do round-robin MPXIO over two 4Gbps links. The iSCSI traffic is over a dual 10Gbps card (rather like the one Sun used to sell).
The ARC size is not big enough to hold the L2ARC headers for L2ARC devices of
that size (every buffer cached on the L2ARC also needs a small header kept in the ARC).
Post by Phil Harman
I've just built a fresh pool, and have created 20x 100GB zvols which are mapped to iSCSI clients. I have initialised the first 20GB of each zvol with random data. I've had a lot of success with write performance (e.g. in earlier tests I had 20 parallel streams writing 100GB each at over 600MB/sec aggregate), but read performance is very poor.
Right now I'm just playing with 20 parallel streams of reads from the first 2GB of each zvol (i.e. 40GB in all). During each run, I see lots of writes to the L2ARC, but less than a quarter the volume of reads. Yet my FC LUNS are hot with 1000s of reads per second. This doesn't change from run to run. Why?
Writes to the L2ARC devices are throttled to 8 or 16 MB/sec. If the L2ARC fill cannot keep up,
the data is unceremoniously evicted.
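(The tunables behind that are l2arc_write_max and l2arc_write_boost. If you want to experiment, something like this in /etc/system -- values are illustrative only:

  * raise the L2ARC fill limits: 64MB per fill interval,
  * plus an extra 128MB allowance while the ARC is still warming up
  set zfs:l2arc_write_max = 0x4000000
  set zfs:l2arc_write_boost = 0x8000000
)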
Post by Phil Harman
Surely 20x 2GB of data (and its associated metadata) will sit nicely in 4x 223GB SSDs?
I'll throw out some (possibly bad) ideas.
Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the 40GB of total reads, suggesting that the L2ARC doesn't add any value for this test.
Are the SSD devices saturated from an I/O standpoint? Put another way, can ZFS put data to them fast enough? If they aren't taking writes fast enough, then maybe they can't effectively load for caching. Certainly if they are saturated for writes they can't do much for reads.
Are some of the reads sequential? Sequential reads don't go to L2ARC.
This is not a true statement. If the primarycache policy is set to the default, all data will
be cached in the ARC.
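Worth checking what the zvols are actually set to, e.g. (dataset name is a placeholder):

  zfs get primarycache,secondarycache tank/vol01
  # both default to 'all'; 'metadata' or 'none' would explain data not being cached
  zfs set secondarycache=all tank/vol01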
Post by Phil Harman
What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?
The ARC statistics are nicely documented in arc.c and available as kstats.
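For example, the L2ARC counters can be pulled straight from the arcstats kstat:

  kstat -p zfs:0:arcstats:size
  kstat -p zfs:0:arcstats | egrep 'l2_(hits|misses|size|read_bytes|write_bytes)'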
-- richard
Marty Scholes
2011-06-08 18:44:16 UTC
Permalink
Post by Richard Elling
This is not a true statement. If the primarycache policy is set to the default, all data will be cached in the ARC.
Richard, you know this stuff so well that I am hesitant to disagree with you. At the same time, I have seen this myself, trying to load video files into L2ARC without success.
Post by Richard Elling
The ARC statistics are nicely documented in arc.c and
available as kstats.
And I looked in the source. My C is a little rusty, yet it appears that prefetch items are not stored in L2ARC by default. Prefetches will satisfy a good portion of sequential reads but won't go to L2ARC.
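A quick way to see how much of a workload is being satisfied by prefetch (and therefore skipped by the L2ARC at the default settings) is the prefetch counters in the same arcstats kstat, e.g.:

  kstat -p zfs:0:arcstats | grep prefetch
  # prefetch_data_hits vs prefetch_data_misses gives a feel for how much of
  # the stream the prefetcher is serving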
Daniel Carosone
2011-06-09 01:16:57 UTC
Permalink
Post by Marty Scholes
And I looked in the source. My C is a little rusty, yet it appears
that prefetch items are not stored in L2ARC by default. Prefetches
will satisfy a good portion of sequential reads but won't go to
L2ARC.
They won't go to L2ARC while they're still speculative reads, maybe.
Once they're actually used by the app to satisfy a good portion of the
actual reads, they'll have hit stats and will be eligible.

I suspect the problem is the threshold for l2arc writes. Sequential
reads can be much faster than this rate, meaning it can take a lot of
effort/time to fill.

You could test by doing slow sequential reads, and see if the l2arc
fills any more for the same reads spread over a longer time.
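Something crude like this would do it (a sketch only; the pool/zvol name, chunk size, and sleep interval are placeholders to tune, run on the OI box against the raw zvol):

  # read the first 2GB of one zvol in 64MB chunks, pausing between chunks
  # so the sequential stream stays under the L2ARC fill rate
  i=0
  while [ $i -lt 32 ]; do
      dd if=/dev/zvol/rdsk/tank/vol01 of=/dev/null bs=64k count=1024 iseek=$((i * 1024))
      sleep 5
      i=$((i + 1))
  done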

--
Dan.
