Discussion:
Identifying firmware version of SATA controller (LSI)
Ray Van Dolson
2010-02-05 20:26:44 UTC
Trying to track down why our two Intel X-25E's are spewing out
Write/Retryable errors when being used as a ZIL (mirrored). The
system is running an LSI1068E controller with an LSISASx36 expander
(box built by Silicon Mechanics).

The drives are fairly new, and it seems odd that both of the pair would
start showing errors at the same time....

I'm trying to figure out where I can find the firmware version of the
LSI controller... are the bootup messages the only place I could
expect to see this? Neither prtconf nor prtdiag appears to give
firmware information.

We have another nearly identical box that isn't showing these errors,
which is why I want to compare firmware versions... the boot logs on
the "good" server have been rotated out, so I can't find a firmware
number for the mpt0 device in its logs to compare with.

Solaris 10 U8 x86.

Thanks,
Ray
Marion Hakanson
2010-02-05 20:55:06 UTC
Post by Ray Van Dolson
I'm trying to figure out where I can find the firmware on the LSI
controller... are the bootup messages the only place I could expect to see
this? prtconf and prtdiag both don't appear to give firmware information.
. . .
Solaris 10 U8 x86.
The "raidctl" command is your friend; Useful for updating firmware
if you choose to do so, as well. You can also find the revisions in
the output of "prtconf -Dv", search for "firm" in the long list.
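For example, something like this (output and device names will vary by
controller):

   # raidctl -l                   (lists the controllers raidctl can see)
   # prtconf -Dv | grep -i firm   (pulls the firmware revision lines out of the device tree)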

Regards,

Marion
Marion Hakanson
2010-03-25 18:51:25 UTC
We have a Silicon Mechanics server with a SuperMicro X8DT3-F (Rev 1.02)
(onboard LSI 1068E, firmware 1.28.02.00) and a SuperMicro SAS-846EL1 (Rev
1.1) backplane.
. . .
The system is fully patched Solaris 10 U8, and the mpt driver is
Since you're running on Solaris-10 (and its mpt driver), have you tried
the firmware that Sun recommends for their own 1068E-based HBA's? There
are a couple of versions depending on your usage, but they're all earlier
revs than the 1.28.02.00 you have:

http://www.lsi.com/support/sun/sg_xpci8sas_e_sRoHS.html
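
If you do end up going to one of those earlier revs, raidctl can flash it
for you; something along these lines (the image name below is just a
placeholder -- use whatever file LSI/Sun ship, and double-check the
controller number with "raidctl -l" first):

   # raidctl -F <firmware-image> 1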

Regards,

Marion
Ray Van Dolson
2010-03-25 18:55:29 UTC
Post by Marion Hakanson
We have a Silicon Mechanics server with a SuperMicro X8DT3-F (Rev 1.02)
(onboard LSI 1068E, firmware 1.28.02.00) and a SuperMicro SAS-846EL1 (Rev
1.1) backplane.
. . .
The system is fully patched Solaris 10 U8, and the mpt driver is
Since you're running on Solaris-10 (and its mpt driver), have you tried
the firmware that Sun recommends for their own 1068E-based HBA's? There
are a couple of versions depending on your usage, but they're all earlier
http://www.lsi.com/support/sun/sg_xpci8sas_e_sRoHS.html
No, I haven't. Looks like something that would be worthwhile to try.

Thanks for the suggestion,

Ray
Ray Van Dolson
2010-04-02 02:08:23 UTC
Post by Ray Van Dolson
Post by Marion Hakanson
We have a Silicon Mechanics server with a SuperMicro X8DT3-F (Rev 1.02)
(onboard LSI 1068E, firmware 1.28.02.00) and a SuperMicro SAS-846EL1 (Rev
1.1) backplane.
. . .
The system is fully patched Solaris 10 U8, and the mpt driver is
Since you're running on Solaris-10 (and its mpt driver), have you tried
the firmware that Sun recommends for their own 1068E-based HBA's? There
are a couple of versions depending on your usage, but they're all earlier
http://www.lsi.com/support/sun/sg_xpci8sas_e_sRoHS.html
No, I haven't. Looks like something that would be worthwhile to try.
Thanks for the suggestion,
Well, haven't yet been able to try the firmware suggestion, but we did
replace the backplane. No change.

I'm not sure the firmware change would do any good either. As it is
now, as long as the SSD drives are attached directly to the LSI
controller (no intermediary backplane), everything works fine -- no
errors.

As soon as the backplane is put into the equation -- and *only* for the
SSD devices used as ZIL -- we begin seeing the timeouts/retries.

Seems like if it were a 1068E firmware issue we'd be seeing the issue
whether or not the backplane is in place... but maybe I'm missing
something.
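
For anyone following along, the retries should show up both in the
per-device error counters and in the FMA error log, e.g.:

   # iostat -En | grep -i error   (soft/hard/transport error counts per device)
   # fmdump -eV                   (the underlying ereports, with timestamps)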

Ray
Eric D. Mudama
2010-04-02 02:27:41 UTC
Post by Ray Van Dolson
Well, haven't yet been able to try the firmware suggestion, but we did
replace the backplane. No change.
I'm not sure the firmware change would do any good either. As it is
now, as long as the SSD drives are attached directly to the LSI
controller (no intermediary backplane), everything works fine -- no
errors.
As soon as the backplane is put in the equation -- and *only* for SSD
devices used as ZIL, we begin seeing the timeout/retries.
Seems like if it were a 1068E firmware issue we'd be seeing the issue
whether or not the backplane is in place... but maybe I'm missing
something.
It's possible that the backplane leads to enough signal degradation
that the setup is now stressing error paths that simply aren't hit
with the direct-connect cabling.

This is the sort of issue that adapter (or device or expander)
firmware changes can mitigate or exacerbate.

--eric
--
Eric D. Mudama
***@mail.bounceswoosh.org
Marion Hakanson
2013-03-16 01:31:11 UTC
NAME       SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
datapool   978T   298T   680T   30%  1.00x  ONLINE  -
syspool    278G   104G   174G   37%  1.00x  ONLINE  -
Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to
a couple of LSI SAS switches.
Thanks Ray,

We've been looking at those too (we've had good luck with our MD1200's).

How many HBA's in the R720?

Thanks and regards,

Marion
Ray Van Dolson
2013-03-16 01:56:10 UTC
Post by Marion Hakanson
NAME       SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
datapool   978T   298T   680T   30%  1.00x  ONLINE  -
syspool    278G   104G   174G   37%  1.00x  ONLINE  -
Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to
a couple of LSI SAS switches.
Thanks Ray,
We've been looking at those too (we've had good luck with our MD1200's).
How many HBA's in the R720?
Thanks and regards,
Marion
We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).

Ray

[1] http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=hied&cs=65&sku=a4614101
Marion Hakanson
2013-03-16 02:35:25 UTC
Post by Ray Van Dolson
Post by Marion Hakanson
Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
to a couple of LSI SAS switches.
How many HBA's in the R720?
We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).
Sounds similar in approach to the Aberdeen product another sender referred to,
with SAS switch layout:
http://www.aberdeeninc.com/images/1-up-petarack2.jpg

One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives
in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout
on a 40-slot server with 40x SATA drives in it. But the server uses no SAS
expanders, instead using SAS-to-SATA octopus cables to connect the drives
directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).

What I found was that the internal pool was significantly faster for both
sequential and random I/O than the pool on the external JBOD.

My conclusion was that I would not want to exceed ~48 drives on a single
8-port SAS HBA. So I thought that running the I/O of all your hundreds
of drives through only two HBA's would be a bottleneck.

LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated
at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
8 SAS ports going at 4800MBytes/sec.
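
(Rough arithmetic behind those numbers: 8 SAS-2 ports at 6Gbit/s each is
48Gbit/s, or about 4800MBytes/sec of raw SAS bandwidth; an x8 PCIe-2.0
slot delivers roughly 8 x 500MBytes/sec = 4000MBytes/sec after encoding
overhead; and x8 PCIe-3.0 is close to double that, which is where the
8000MBytes/sec figure comes from.)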

Yes, I know the disks probably can't go that fast. But in my tests
above, the internal 40-disk pool measures 2000MBytes/sec sequential
reads and writes, while the external 40-disk JBOD measures at 1500
to 1700 MBytes/sec. Not a lot slower, but significantly slower, so
I do think the number of HBA's makes a difference.

At the moment, I'm leaning toward piling six, eight, or ten HBA's into
a server, preferably one with dual IOH's (thus two PCIe busses), and
connecting dual-path JBOD's in that manner.

I hadn't looked into SAS switches much, but they do look more reliable
than daisy-chaining a bunch of JBOD's together. I just haven't seen
how to get more bandwidth through them to a single host.

Regards,

Marion
Trey Palmer
2013-03-16 05:30:41 UTC
I know it's heresy these days, but given the I/O throughput you're looking for and the amount you're going to spend on disks, a T5-2 could make sense when it's released (I think) later this month.

Crucial sells RAM they guarantee for use in SPARC T-series, and since you're at an edu, the academic discount is 35%. So a T4-2 with 512GB RAM could be had for under $35K shortly after release, 4-5 months before the E5 Xeon was released. It seemed a surprisingly good deal to me.

The T5-2 has 32x 3.6GHz cores, 256 threads, and ~150GB/s aggregate memory bandwidth. In my testing a T4-1 can compete with a 12-core E5 box on I/O and memory bandwidth, and this thing is about 5 times bigger than the T4-1. It should have at least 10 PCIe slots and will take 32 DIMMs minimum, maybe 64. And it is likely to cost you less than $50K with aftermarket RAM.

-- Trey
Post by Marion Hakanson
Post by Ray Van Dolson
Post by Marion Hakanson
Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
to a couple of LSI SAS switches.
How many HBA's in the R720?
We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).
Sounds similar in approach to the Aberdeen product another sender referred to,
http://www.aberdeeninc.com/images/1-up-petarack2.jpg
One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives
in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout
on a 40-slot server with 40x SATA drives in it. But the server uses no SAS
expanders, instead using SAS-to-SATA octopus cables to connect the drives
directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).
What I found was that the internal pool was significantly faster for both
sequential and random I/O than the pool on the external JBOD.
My conclusion was that I would not want to exceed ~48 drives on a single
8-port SAS HBA. So I thought that running the I/O of all your hundreds
of drives through only two HBA's would be a bottleneck.
LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
for that card in an x8 PCIe-2.0 slot. Sure, the newer 9207-8e is rated
at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
8 SAS ports going at 4800MBytes/sec.
Yes, I know the disks probably can't go that fast. But in my tests
above, the internal 40-disk pool measures 2000MBytes/sec sequential
reads and writes, while the external 40-disk JBOD measures at 1500
to 1700 MBytes/sec. Not a lot slower, but significantly slower, so
I do think the number of HBA's makes a difference.
At the moment, I'm leaning toward piling six, eight, or ten HBA's into
a server, preferably one with dual IOH's (thus two PCIe busses), and
connecting dual-path JBOD's in that manner.
I hadn't looked into SAS switches much, but they do look more reliable
than daisy-chaining a bunch of JBOD's together. I just haven't seen
how to get more bandwidth through them to a single host.
Regards,
Marion