zfs-discuss Digest, Vol 89, Issue 12
Kristoffer Sheather @ CloudCentral
2013-03-18 20:28:53 UTC
You could always use 40-gigabit links between the two storage systems, which
would speed things up dramatically, or back-to-back 56-gigabit IB.

From: zfs-discuss-***@opensolaris.org
Sent: Monday, March 18, 2013 11:01 PM
To: zfs-***@opensolaris.org
Subject: zfs-discuss Digest, Vol 89, Issue 12

Today's Topics:

1. Re: [zfs] Re: Petabyte pool? (Richard Yao)
2. Re: [zfs] Re: Petabyte pool? (Trey Palmer)


Message: 1
Date: Sat, 16 Mar 2013 08:23:07 -0400
From: Richard Yao <***@gentoo.org>
To: ***@lists.illumos.org
Cc: zfs-***@opensolaris.org
Subject: Re: [zfs-discuss] [zfs] Re: Petabyte pool?
Message-ID: <***@gentoo.org>
Content-Type: text/plain; charset="iso-8859-1"
So, has anyone done this? Or come close to it? Thoughts, even if you
haven't done it yourself?
Don't forget about backups :-)
-- richard
Transferring 1 PB over a 10 gigabit link will take at least 10 days when
overhead is taken into account. The backup system should have a
dedicated 10 gigabit link at the minimum and using incremental send/recv
will be extremely important.
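
As a rough check on the 10-day figure, here is a minimal sketch of the arithmetic. It assumes 1 PB means 10^15 bytes and that the link sustains a fixed fraction of its nominal rate; the function name `transfer_days` and the 90% efficiency value are illustrative assumptions, not measurements from this thread.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # assumption: decimal petabyte, in bytes


def transfer_days(size_bytes, link_gbit_per_s, efficiency=1.0):
    """Days needed to move size_bytes over a link of the given nominal rate.

    efficiency is the assumed fraction of the nominal rate actually
    sustained (protocol overhead, send/recv stalls, and so on).
    """
    bits = size_bytes * 8
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / SECONDS_PER_DAY


# Nominal rates mentioned in the thread: 10 and 40 gigabit, 56-gigabit IB.
for rate in (10, 40, 56):
    ideal = transfer_days(PETABYTE, rate)
    derated = transfer_days(PETABYTE, rate, efficiency=0.9)  # assumed 90%
    print(f"{rate:>2} Gbit/s: {ideal:5.2f} days ideal, {derated:5.2f} days at 90%")
```

At the full nominal 10 Gbit/s the transfer already takes a little over 9 days, so any realistic overhead pushes it past 10, which is why incremental send/recv matters so much here.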


Message: 2
Date: Sat, 16 Mar 2013 01:30:41 -0400 (EDT)
From: Trey Palmer <***@nerdmagic.com>
To: "***@lists.illumos.org" <***@lists.illumos.org>
Cc: "***@lists.illumos.org" <***@lists.illumos.org>,
"zfs-***@opensolaris.org" <zfs-***@opensolaris.org>
Subject: Re: [zfs-discuss] [zfs] Re: Petabyte pool?
Message-ID: <1CE7BF11-6E42-421E-B136-***@nerdmagic.com>
Content-Type: text/plain; charset=us-ascii

I know it's heresy these days, but given the I/O throughput you're looking
for and the amount you're going to spend on disks, a T5-2 could make sense
when they're released (I think) later this month.

Crucial sells RAM they guarantee for use in SPARC T-series, and since
you're at an edu the academic discount is 35%. So a T4-2 with 512GB RAM
could be had for under $35K shortly after release, 4-5 months before the E5
Xeon was released. It seemed a surprisingly good deal to me.

The T5-2 has 32x3.6GHz cores, 256 threads and ~150GB/s aggregate memory
bandwidth. In my testing a T4-1 can compete with a 12-core E5 box on I/O
and memory bandwidth, and this thing is about 5 times bigger than the T4-1.
It should have at least 10 PCIe slots and will take 32 DIMMs minimum, maybe
64. And it is likely to cost you less than $50K with aftermarket RAM.

-- Trey
Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual-pathed
to a couple of LSI SAS switches.
How many HBA's in the R720?
We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).
Sounds similar in approach to the Aberdeen product another sender
referred to,
One concern I had is that I compared our SuperMicro JBOD with 40x 4TB
drives in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same
pool on a 40-slot server with 40x SATA drives in it. But the server uses
no expanders, instead using SAS-to-SATA octopus cables to connect the
drives directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).
What I found was that the internal pool was significantly faster for
sequential and random I/O than the pool on the external JBOD.
My conclusion was that I would not want to exceed ~48 drives on a single
8-port SAS HBA. So I thought that running the I/O of all your hundreds
of drives through only two HBA's would be a bottleneck.
LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated
at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
8 SAS ports going at 4800MBytes/sec.
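
The quoted LSI figures can be reproduced from the link rates and line encodings. This sketch assumes 6 Gbit/s SAS lanes and PCIe 2.0 at 5 GT/s (both 8b/10b encoded) and PCIe 3.0 at 8 GT/s (128b/130b encoded); the helper name `lane_mbytes_per_s` is made up for illustration, and real cards lose a bit more to protocol overhead.

```python
def lane_mbytes_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Usable MBytes/sec of one serial lane after line encoding."""
    raw_mbits = gigatransfers_per_s * 1000
    return raw_mbits * payload_bits / encoded_bits / 8


LANES = 8  # 8 SAS ports on the HBA, x8 PCIe slot

sas2_x8 = LANES * lane_mbytes_per_s(6, 8, 10)      # 6 Gbit/s SAS, 8b/10b
pcie2_x8 = LANES * lane_mbytes_per_s(5, 8, 10)     # PCIe 2.0, 8b/10b
pcie3_x8 = LANES * lane_mbytes_per_s(8, 128, 130)  # PCIe 3.0, 128b/130b

print(f"SAS side, 8 ports: {sas2_x8:.0f} MBytes/sec")
print(f"PCIe 2.0 x8 slot:  {pcie2_x8:.0f} MBytes/sec")
print(f"PCIe 3.0 x8 slot:  {pcie3_x8:.0f} MBytes/sec")

# The card can go no faster than the slower of its two sides.
print(f"HBA in PCIe 2.0 x8: {min(sas2_x8, pcie2_x8):.0f} MBytes/sec")
print(f"HBA in PCIe 3.0 x8: {min(sas2_x8, pcie3_x8):.0f} MBytes/sec")
```

On these assumptions the PCIe 2.0 slot is the limit for the older card, while on a PCIe 3.0 card the 8 SAS ports become the limit, so the faster slot only helps up to the SAS-side ceiling.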
Yes, I know the disks probably can't go that fast. But in my tests
above, the internal 40-disk pool measures 2000MBytes/sec sequential
reads and writes, while the external 40-disk JBOD measures at 1500
to 1700 MBytes/sec. Not a lot slower, but significantly slower, so
I do think the number of HBA's makes a difference.
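
To put the measured gap in proportion, this small sketch works out the per-drive throughput and the relative slowdown from the numbers quoted above. It assumes both pools have 40 drives, as described; the function name `slowdown_pct` is illustrative only.

```python
DRIVES = 40

internal = 2000          # MBytes/sec, three internal HBAs, no expanders
external = (1500, 1700)  # MBytes/sec range, one external 8-port HBA


def slowdown_pct(baseline, measured):
    """How much slower measured is than baseline, as a percentage."""
    return 100 * (baseline - measured) / baseline


print(f"internal: {internal / DRIVES:.1f} MBytes/sec per drive")
for rate in external:
    print(f"external at {rate}: {rate / DRIVES:.1f} MBytes/sec per drive, "
          f"{slowdown_pct(internal, rate):.0f}% slower")
```

Both results are well under the 4000 MBytes/sec slot figure quoted above, so these numbers alone do not show which component is the limit; they only quantify the gap between the two setups.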
At the moment, I'm leaning toward piling six, eight, or ten HBA's into
a server, preferably one with dual IOH's (thus two PCIe busses), and
connecting dual-path JBOD's in that manner.
I hadn't looked into SAS switches much, but they do look more reliable
than daisy-chaining a bunch of JBOD's together. I just haven't seen
how to get more bandwidth through them to a single host.
