Discussion:
poor CIFS and NFS performance
Eugen Leitl
2012-12-30 17:02:40 UTC
Happy $holidays,

I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
a raidz3 (no compression nor dedup) with reasonable bonnie++
1.03 values, e.g. 145 MByte/s Seq-Write @ 48% CPU and 291 MByte/s
Seq-Read @ 53% CPU. It scrubs with 230+ MByte/s with reasonable
system load. No hybrid pools yet. This is latest beta napp-it
on OpenIndiana 151a5 server, living on a dedicated 64 GByte SSD.

The system is an MSI E350DM-E33 with 8 GByte PC1333 DDR3
memory, no ECC. All the systems have Intel NICs with mtu 9000
enabled, including all switches in the path.

My problem is pretty poor network throughput. An NFS or
CIFS mount on 64-bit Ubuntu 12.04 (mtu 9000) reads at
about 23 MBytes/s. Windows 7 64 bit (also jumbo
frames) reads at about 65 MBytes/s. The highest transfer
speed on Windows just touches 90 MByte/s, before falling
back to the usual 60-70 MBytes/s.

I kinda can live with above values, but I have a feeling
the setup should be able to saturate GBit Ethernet with
large file transfers, especially on Linux (20 MByte/s
is nothing to write home about).
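
As a rough yardstick for "saturating GBit Ethernet", the payload ceiling can be estimated from the frame overhead. This is only a sketch under stated assumptions: 125 MB/s raw line rate (decimal megabytes), 38 bytes of per-frame Ethernet overhead (preamble, header, FCS, inter-frame gap), 40 bytes of IPv4 and TCP headers, and no TCP options. The function name `gbe_ceiling` is made up for this illustration.

```shell
#!/bin/sh
# Rough upper bound for TCP payload throughput on gigabit Ethernet.
# Assumptions (not from the thread): 125 MB/s line rate, 38 bytes of
# Ethernet framing overhead per frame, 40 bytes of IPv4 + TCP headers,
# no TCP options such as timestamps.
gbe_ceiling() {
    mtu=$1
    awk -v mtu="$mtu" 'BEGIN { printf "%.1f\n", 125 * (mtu - 40) / (mtu + 38) }'
}

echo "MTU 1500: $(gbe_ceiling 1500) MB/s"
echo "MTU 9000: $(gbe_ceiling 9000) MB/s"
```

Under these assumptions the ceiling is about 118.7 MB/s at MTU 1500 and 123.9 MB/s at MTU 9000, so jumbo frames alone account for only a few percent; the 23 to 65 MBytes/s figures above are well below either bound.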

Does anyone have any suggestions on how to debug/optimize
throughput?

Thanks, and happy 2013.

P.S. Not sure whether this is pathological, but the system
does produce occasional soft errors in dmesg, e.g.:

Dec 30 17:45:00 oizfs scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec 30 17:45:00 oizfs scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 30 17:45:00 oizfs scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Dec 30 17:45:00 oizfs scsi: [ID 107833 kern.notice] ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/***@g5000c50009c72c48 (sd9):
Dec 30 17:45:01 oizfs Error for Command: <undecoded cmd 0xa1> Error Level: Recovered
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Dec 30 17:45:01 oizfs pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x45 ioapic 0x3 intin 0xe is bound to cpu 0
Dec 30 17:45:01 oizfs pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x45 ioapic 0x3 intin 0xe is bound to cpu 1
Dec 30 17:45:01 oizfs pcplusmp: [ID 805372 kern.info] pcplusmp: ide (ata) instance 0 irq 0xe vector 0x45 ioapic 0x3 intin 0xe is bound to cpu 0
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/***@g5000c50009c73968 (sd4):
Dec 30 17:45:01 oizfs Error for Command: <undecoded cmd 0xa1> Error Level: Recovered
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Dec 30 17:45:01 oizfs scsi: [ID 107833 kern.notice] ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Dec 30 17:45:03 oizfs scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/***@g5000c500098be9dd (sd10):
Dec 30 17:45:03 oizfs Error for Command: <undecoded cmd 0xa1> Error Level: Recovered
Dec 30 17:45:03 oizfs scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec 30 17:45:03 oizfs scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 30 17:45:03 oizfs scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Dec 30 17:45:03 oizfs scsi: [ID 107833 kern.notice] ASC: 0x0 (<vendor unique code 0x0>), ASCQ: 0x1d, FRU: 0x0
Dec 30 17:45:04 oizfs scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci1462,***@11/***@3,0 (sd8):
Dec 30 17:45:04 oizfs Error for Command: <undecoded cmd 0xa1> Error Level: Recovered
Dec 30 17:45:04 oizfs scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Dec 30 17:45:04 oizfs scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Dec 30 17:45:04 oizfs scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Dec 30 17:45:04 oizfs scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Richard Elling
2012-12-30 18:40:39 UTC
Post by Eugen Leitl
Happy $holidays,
I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
a raidz3 (no compression nor dedup) with reasonable bonnie++
system load. No hybrid pools yet. This is latest beta napp-it
on OpenIndiana 151a5 server, living on a dedicated 64 GByte SSD.
The system is a MSI E350DM-E33 with 8 GByte PC1333 DDR3
memory, no ECC. All the systems have Intel NICs with mtu 9000
enabled, including all switches in the path.
Does it work faster with the default MTU?
Also check for retrans and errors, using the usual network performance
debugging checks.
Post by Eugen Leitl
My problem is pretty poor network throughput. An NFS
mount on 12.04 64 bit Ubuntu (mtu 9000) or CIFS are
read at about 23 MBytes/s. Windows 7 64 bit (also jumbo
frames) reads at about 65 MBytes/s. The highest transfer
speed on Windows just touches 90 MByte/s, before falling
back to the usual 60-70 MBytes/s.
I kinda can live with above values, but I have a feeling
the setup should be able to saturate GBit Ethernet with
large file transfers, especially on Linux (20 MByte/s
is nothing to write home about).
Does anyone have any suggestions on how to debug/optimize
throughput?
Thanks, and happy 2013.
P.S. Not sure whether this is pathological, but the system
does produce occasional soft errors like e.g. dmesg
More likely these are due to SMART commands not being properly handled
for SATA devices. They are harmless.
-- richard
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
--

***@RichardElling.com
+1-760-896-4422
Eugen Leitl
2013-01-02 10:03:55 UTC
Post by Richard Elling
Post by Eugen Leitl
The system is a MSI E350DM-E33 with 8 GByte PC1333 DDR3
memory, no ECC. All the systems have Intel NICs with mtu 9000
enabled, including all switches in the path.
Does it work faster with the default MTU?
No, it was even slower, that's why I went from 1500 to 9000.
I estimate it brought ~20 MByte/s more peak on Windows 7 64 bit CIFS.
Post by Richard Elling
Also check for retrans and errors, using the usual network performance
debugging checks.
Wireshark or tcpdump on Linux/Windows? What would
you suggest for OI?
Post by Richard Elling
Post by Eugen Leitl
P.S. Not sure whether this is pathological, but the system
does produce occasional soft errors like e.g. dmesg
More likely these are due to SMART commands not being properly handled
Otherwise napp-it attests full SMART support.
Post by Richard Elling
for SATA devices. They are harmless.
Richard Elling
2013-01-02 18:36:34 UTC
Post by Eugen Leitl
Post by Richard Elling
Post by Eugen Leitl
The system is a MSI E350DM-E33 with 8 GByte PC1333 DDR3
memory, no ECC. All the systems have Intel NICs with mtu 9000
enabled, including all switches in the path.
Does it work faster with the default MTU?
No, it was even slower, that's why I went from 1500 to 9000.
I estimate it brought ~20 MByte/s more peak on Windows 7 64 bit CIFS.
OK, then you have something else very wrong in your network.
Post by Eugen Leitl
Post by Richard Elling
Also check for retrans and errors, using the usual network performance
debugging checks.
Wireshark or tcpdump on Linux/Windows? What would
you suggest for OI?
Look at all of the stats for all NICs and switches on both ends of each wire.
Look for collisions (should be 0), drops (should be 0), dups (should be 0),
retrans (should be near 0), flow control (server shouldn't see flow control
activity), etc. There is considerable written material on how to diagnose
network flakiness.
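
A possible starting point for collecting those counters, as a sketch rather than a definitive checklist. The function name `nic_counters` and the default interface name are assumptions for this example; substitute the real link name (for instance an e1000g or igb instance on OpenIndiana). Flow-control counters are driver-specific and are not covered here.

```shell
#!/bin/sh
# Sketch only: print the error-related counters on one host.
# The interface name is an assumption; pass your own as the first argument.
nic_counters() {
    ifname=${1:-eth0}
    case "$(uname -s)" in
        SunOS)
            netstat -i                                  # Ierrs, Oerrs, Collis per interface
            netstat -s -P tcp | egrep -i 'retrans|dup'  # TCP retransmit and duplicate counters
            dladm show-link -s "$ifname"                # per-link packet and error counts
            ;;
        Linux)
            ip -s link show "$ifname"                   # RX/TX errors, drops, collisions
            netstat -s | grep -i retrans                # TCP retransmit counters
            ethtool -S "$ifname"                        # driver statistics (names vary by driver)
            ;;
        *)
            echo "unsupported OS: $(uname -s)" >&2
            return 1
            ;;
    esac
}
```

Run `nic_counters <link>` on both the server and the client before and after a large transfer and compare the deltas; the switch ports have to be checked through the switch's own management interface.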
Post by Eugen Leitl
Post by Richard Elling
Post by Eugen Leitl
P.S. Not sure whether this is pathological, but the system
does produce occasional soft errors like e.g. dmesg
More likely these are due to SMART commands not being properly handled
Otherwise napp-it attests full SMART support.
Post by Richard Elling
for SATA devices. They are harmless.
Yep, this is a SATA/SAS/SMART interaction where assumptions are made
that might not be true. Usually it means that the SMART probes are using SCSI
commands on SATA disks.
-- richard

Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2012-12-31 12:27:10 UTC
Post by Eugen Leitl
I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
a raidz3 (no compression nor dedup) with reasonable bonnie++
For 8-disk raidz3 (effectively 5 disks) I would expect approx 640MB/s for both seq read and seq write. The first halving (from 640 down to 291) could maybe be explained by bottlenecking through a single HBA or something like that, so I wouldn't be too concerned about that. But the second halving, from 291 down to 145 ... A single disk should do 128MB/sec no problem, so the whole pool writing at only 145MB/sec sounds wrong to me.
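
The arithmetic behind that estimate can be written out as follows. This is a sketch of the rule of thumb used in this message (streaming throughput scales with the number of data disks at about 128 MB/s each), not a model of how raidz actually behaves; the variable names are made up for this illustration.

```shell
#!/bin/sh
# Rule-of-thumb estimate from the message above: data disks x per-disk rate.
disks=8
parity=3            # raidz3
per_disk_mbs=128    # assumed streaming rate of one disk, in MB/s

data_disks=$((disks - parity))
expected=$((data_disks * per_disk_mbs))

# Measured bonnie++ values from the original post, as a share of the estimate
# (integer division, so the percentages are rounded down).
read_pct=$((291 * 100 / expected))
write_pct=$((145 * 100 / expected))

echo "expected: ${expected} MB/s"
echo "seq read:  291 MB/s (${read_pct}% of estimate)"
echo "seq write: 145 MB/s (${write_pct}% of estimate)"
```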

But as you said ... This isn't the area of complaint... Moving on, you can start a new discussion about this if you want to later...
Post by Eugen Leitl
My problem is pretty poor network throughput. An NFS
mount on 12.04 64 bit Ubuntu (mtu 9000) or CIFS are
read at about 23 MBytes/s. Windows 7 64 bit (also jumbo
frames) reads at about 65 MBytes/s. The highest transfer
speed on Windows just touches 90 MByte/s, before falling
back to the usual 60-70 MBytes/s.
Does anyone have any suggestions on how to debug/optimize
throughput?
The first thing I would do is build another openindiana box and try NFS / CIFS to/from it. See how it behaves. Whenever I've seen this sort of problem before, it was version incompatibility requiring tweaks between the client and server. I don't know which version of samba / solaris cifs is being used ... But at some point in history (win7), windows transitioned from NTLM v1 to v2, and at that point, all the older servers became 4x slower with the new clients, but if you built a new server with the new clients, then the old version was 4x slower than the new.

Not to mention, I've had times when I couldn't even get Linux & Solaris to *talk* to each other over NFS, due to version differences, never mind tweak all the little performance knobs.

So my advice is to first eliminate any question about version / implementation differences, and see where that takes you.
Eugen Leitl
2013-01-03 20:33:29 UTC
Post by Eugen Leitl
Happy $holidays,
I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
Just a little update on the home NAS project.

I've set the pool sync to disabled, and added a couple
of

8. c4t1d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
/***@0,0/pci1462,***@11/***@1,0
9. c4t2d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
/***@0,0/pci1462,***@11/***@2,0

I had no clue what the partition names (created with the napp-it web
interface, a la 5% log and 95% cache, of 80 GByte) were, so I
ran iostat -xnp

1.4 0.3 5.5 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0
0.1 0.0 3.7 0.0 0.0 0.0 0.0 0.5 0 0 c4t1d0s2
0.1 0.0 2.6 0.0 0.0 0.0 0.0 0.5 0 0 c4t1d0s8
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c4t1d0p0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0p1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0p2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0p3
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t1d0p4
1.2 0.3 1.4 0.0 0.0 0.0 0.0 0.0 0 0 c4t2d0
0.0 0.0 0.6 0.0 0.0 0.0 0.0 0.4 0 0 c4t2d0s2
0.0 0.0 0.7 0.0 0.0 0.0 0.0 0.4 0 0 c4t2d0s8
0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0 0 c4t2d0p0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t2d0p1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t2d0p2

then issued

# zpool add tank0 cache /dev/dsk/c4t1d0p1 /dev/dsk/c4t2d0p1
# zpool add tank0 log mirror /dev/dsk/c4t1d0p0 /dev/dsk/c4t2d0p0

which resulted in

***@oizfs:~# zpool status
pool: rpool
state: ONLINE
scan: scrub repaired 0 in 0h1m with 0 errors on Wed Jan 2 21:09:23 2013
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c4t3d0s0 ONLINE 0 0 0

errors: No known data errors

pool: tank0
state: ONLINE
scan: scrub repaired 0 in 5h17m with 0 errors on Wed Jan 2 17:53:20 2013
config:

NAME                       STATE   READ  WRITE  CKSUM
tank0                      ONLINE     0      0      0
  raidz3-0                 ONLINE     0      0      0
    c3t5000C500098BE9DDd0  ONLINE     0      0      0
    c3t5000C50009C72C48d0  ONLINE     0      0      0
    c3t5000C50009C73968d0  ONLINE     0      0      0
    c3t5000C5000FD2E794d0  ONLINE     0      0      0
    c3t5000C5000FD37075d0  ONLINE     0      0      0
    c3t5000C5000FD39D53d0  ONLINE     0      0      0
    c3t5000C5000FD3BC10d0  ONLINE     0      0      0
    c3t5000C5000FD3E8A7d0  ONLINE     0      0      0
logs
  mirror-1                 ONLINE     0      0      0
    c4t1d0p0               ONLINE     0      0      0
    c4t2d0p0               ONLINE     0      0      0
cache
  c4t1d0p1                 ONLINE     0      0      0
  c4t2d0p1                 ONLINE     0      0      0

errors: No known data errors

which resulted in bonnie++
befo':

NAME   SIZE   Bonnie  Date(y.m.d)  File    Seq-Wr-Chr  %CPU  Seq-Write  %CPU  Seq-Rewr  %CPU  Seq-Rd-Chr  %CPU  Seq-Read  %CPU  Rnd Seeks  %CPU  Files  Seq-Create  Rnd-Create
rpool  59.5G  start   2012.12.28   15576M  24 MB/s     61    47 MB/s    18    40 MB/s   19    26 MB/s     98    273 MB/s  48    2657.2/s   25    16     12984/s     12058/s
tank0  7.25T  start   2012.12.29   15576M  35 MB/s     86    145 MB/s   48    109 MB/s  50    25 MB/s     97    291 MB/s  53    819.9/s    12    16     12634/s     9194/s

aftuh:

NAME   SIZE   Bonnie  Date(y.m.d)  File    Seq-Wr-Chr  %CPU  Seq-Write  %CPU  Seq-Rewr  %CPU  Seq-Rd-Chr  %CPU  Seq-Read  %CPU  Rnd Seeks  %CPU  Files  Seq-Create  Rnd-Create
rpool  59.5G  start   2012.12.28   15576M  24 MB/s     61    47 MB/s    18    40 MB/s   19    26 MB/s     98    273 MB/s  48    2657.2/s   25    16     12984/s     12058/s
tank0  7.25T  start   2013.01.03   15576M  35 MB/s     86    149 MB/s   48    111 MB/s  50    26 MB/s     98    404 MB/s  76    1094.3/s   12    16     12601/s     9937/s

Does the layout make sense? Do the stats make sense, or is there still something very wrong
with that pool?

Thanks.
Richard Elling
2013-01-03 20:44:26 UTC
Post by Eugen Leitl
Post by Eugen Leitl
Happy $holidays,
I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
Just a little update on the home NAS project.
I've set the pool sync to disabled, and added a couple
of
8. c4t1d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
9. c4t2d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
Setting sync=disabled means your log SSDs (slogs) will not be used.
-- richard
Eugen Leitl
2013-01-03 20:47:17 UTC
Post by Richard Elling
Post by Eugen Leitl
Post by Eugen Leitl
Happy $holidays,
I have a pool of 8x ST31000340AS on an LSI 8-port adapter as
Just a little update on the home NAS project.
I've set the pool sync to disabled, and added a couple
of
8. c4t1d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
9. c4t2d0 <ATA-INTELSSDSA2M080-02G9 cyl 11710 alt 2 hd 224 sec 56>
Setting sync=disabled means your log SSDs (slogs) will not be used.
-- richard
Whoops. Set it back to sync=standard. Will rerun the bonnie++ once
the scrub finishes, and post the results.
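
For reference, the property can be checked and reset along these lines. This is a minimal sketch: the helper name `restore_sync` is made up, and the dataset name defaults to tank0, the pool from this thread.

```shell
#!/bin/sh
# Sketch: inspect and restore the ZFS sync property.
# The dataset name defaults to tank0 (the pool in this thread).
restore_sync() {
    ds=${1:-tank0}
    zfs get sync "$ds"             # current value: standard, always or disabled
    zfs set sync=standard "$ds"    # honor synchronous write requests again
    zfs get -r sync "$ds"          # child datasets may carry their own setting
}
```

After calling `restore_sync tank0`, running `zpool iostat -v tank0 5` during an NFS write test should show whether the log devices actually receive writes.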
Phillip Wagstrom
2013-01-03 21:21:33 UTC
Eugen,

Be aware that p0 corresponds to the entire disk, regardless of how it is partitioned with fdisk. The fdisk partitions are 1 - 4. By using p0 for log and p1 for cache, you could very well be writing to the same location on the SSD and corrupting things.
Personally, I'd recommend putting a standard Solaris fdisk partition on the drive and creating the two slices under that.

-Phil
Eugen Leitl
2013-01-03 21:33:14 UTC
Post by Phillip Wagstrom
Eugen,
Be aware that p0 corresponds to the entire disk, regardless of how it is partitioned with fdisk. The fdisk partitions are 1 - 4. By using p0 for log and p1 for cache, you could very well be writing to same location on the SSD and corrupting things.
My partitions are like this:

partition> print
Current partition table (original):
Total disk cylinders available: 496 + 2 (reserved cylinders)

Part  Tag         Flag  Cylinders   Size     Blocks
0     unassigned  wm    0           0        (0/0/0)             0
1     unassigned  wm    0           0        (0/0/0)             0
2     backup      wu    0 - 11709   70.04GB  (11710/0/0) 146890240
3     unassigned  wm    0           0        (0/0/0)             0
4     unassigned  wm    0           0        (0/0/0)             0
5     unassigned  wm    0           0        (0/0/0)             0
6     unassigned  wm    0           0        (0/0/0)             0
7     unassigned  wm    0           0        (0/0/0)             0
8     boot        wu    0 - 0       6.12MB   (1/0/0)         12544
9     unassigned  wm    0           0        (0/0/0)             0

am I writing to the same location?
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk partition on the drive and creating the two slices under that.
Which command invocations would you use to do that, under Open Indiana?
Phillip Wagstrom
2013-01-03 21:44:54 UTC
Post by Eugen Leitl
Post by Phillip Wagstrom
Eugen,
Be aware that p0 corresponds to the entire disk, regardless of how it is partitioned with fdisk. The fdisk partitions are 1 - 4. By using p0 for log and p1 for cache, you could very well be writing to same location on the SSD and corrupting things.
partition> print
Total disk cylinders available: 496 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 11709 70.04GB (11710/0/0) 146890240
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 6.12MB (1/0/0) 12544
9 unassigned wm 0 0 (0/0/0) 0
am I writing to the same location?
Okay. The above are the slices within the Solaris fdisk partition. These would be the "s0" part of "c0t0d0s0". They are modified via format under "partition".
p1 through p4 refer to the x86 fdisk partitions, which are administered with the fdisk command or from within the format command via "fdisk".
Post by Eugen Leitl
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk partition on the drive and creating the two slices under that.
Which command invocations would you use to do that, under Open Indiana?
format -> partition then set the size of each there.

-Phil
Eugen Leitl
2013-01-03 21:52:57 UTC
Post by Phillip Wagstrom
Post by Eugen Leitl
Post by Phillip Wagstrom
Eugen,
Be aware that p0 corresponds to the entire disk, regardless of how it is partitioned with fdisk. The fdisk partitions are 1 - 4. By using p0 for log and p1 for cache, you could very well be writing to same location on the SSD and corrupting things.
partition> print
Total disk cylinders available: 496 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 11709 70.04GB (11710/0/0) 146890240
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
8 boot wu 0 - 0 6.12MB (1/0/0) 12544
9 unassigned wm 0 0 (0/0/0) 0
am I writing to the same location?
Okay. The above are the slices within the Solaris fdisk partition. These would be the "s0" part of "c0t0d0s0". These are modified with via format under "partition".
p1 through p4 refers to the x86 fdisk partition which is administered with the fdisk command or called from the format command via "fdisk"
Post by Eugen Leitl
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk partition on the drive and creating the two slices under that.
Which command invocations would you use to do that, under Open Indiana?
format -> partition then set the size of each there.
Thanks. Apparently, napp-it web interface did not do what I asked it to do.
I'll try to remove the cache and the log devices from the pool, and redo it
from the command line interface.
Gea
2013-01-04 11:41:05 UTC
Post by Eugen Leitl
Thanks. Apparently, napp-it web interface did not do what I asked it to do.
I'll try to remove the cache and the log devices from the pool, and redo it
from the command line interface.
napp-it up to 0.8 does not support slices or partitions
napp-it 0.9 supports partitions and offers partitioning with menu disk-partitions

You can reinitialize a disk with a missing or unwanted partition table with menu
disk-initialize
Cindy Swearingen
2013-01-03 22:21:42 UTC
Free advice is cheap...

I personally don't see the advantage of caching reads
and logging writes to the same devices. (Is this recommended?)

If this pool is serving CIFS/NFS, I would recommend testing
for best performance with a mirrored log device first without
a separate cache device:

# zpool add tank0 log mirror c4t1d0 c4t2d0

Thanks, Cindy
Eugen Leitl
2013-01-04 17:07:12 UTC
Post by Phillip Wagstrom
Eugen,
Thanks Phillip and others, most illuminating (pun intended).
Post by Phillip Wagstrom
Be aware that p0 corresponds to the entire disk, regardless of how it is partitioned with fdisk. The fdisk partitions are 1 - 4. By using p0 for log and p1 for cache, you could very well be writing to same location on the SSD and corrupting things.
Does this mean that with

Part  Tag         Flag  Cylinders    Size     Blocks
0     unassigned  wm    0 - 668      4.00GB   (669/0/0)     8391936
1     unassigned  wm    669 - 12455  70.50GB  (11787/0/0) 147856128
2     backup      wu    0 - 12456    74.51GB  (12457/0/0) 156260608
3     unassigned  wm    0            0        (0/0/0)             0
4     unassigned  wm    0            0        (0/0/0)             0
5     unassigned  wm    0            0        (0/0/0)             0
6     unassigned  wm    0            0        (0/0/0)             0
7     unassigned  wm    0            0        (0/0/0)             0
8     boot        wu    0 - 0        6.12MB   (1/0/0)         12544
9     unassigned  wm    0            0        (0/0/0)             0

/dev/dsk/c4t1d0p0 /dev/dsk/c4t2d0p0 means the whole disk?
I thought the backup partition would be that, and that's p2?
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk partition on the drive and creating the two slices under that.
Can you please give me the rundown for commands for that?
I seem to partition a Solaris disk every decade, or so, so
I have no idea what I'm doing.

I've redone the

# zpool remove tank0 /dev/dsk/c4t1d0p1 /dev/dsk/c4t2d0p1
# zpool remove tank0 mirror-1

so the pool is back to mice and pumpkins:

pool: tank0
state: ONLINE
scan: scrub in progress since Fri Jan 4 16:55:12 2013
773G scanned out of 3.49T at 187M/s, 4h15m to go
0 repaired, 21.62% done
config:

NAME                       STATE   READ  WRITE  CKSUM
tank0                      ONLINE     0      0      0
  raidz3-0                 ONLINE     0      0      0
    c3t5000C500098BE9DDd0  ONLINE     0      0      0
    c3t5000C50009C72C48d0  ONLINE     0      0      0
    c3t5000C50009C73968d0  ONLINE     0      0      0
    c3t5000C5000FD2E794d0  ONLINE     0      0      0
    c3t5000C5000FD37075d0  ONLINE     0      0      0
    c3t5000C5000FD39D53d0  ONLINE     0      0      0
    c3t5000C5000FD3BC10d0  ONLINE     0      0      0
    c3t5000C5000FD3E8A7d0  ONLINE     0      0      0

errors: No known data errors
Robert Milkowski
2013-01-04 18:57:44 UTC
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk
partition on the drive and creating the two slices under that.
Why? In most cases giving zfs an entire disk is the best option.
I wouldn't bother with any manual partitioning.
--
Robert Milkowski
http://milek.blogspot.com
Eugen Leitl
2013-01-04 19:07:13 UTC
Post by Robert Milkowski
Post by Phillip Wagstrom
Personally, I'd recommend putting a standard Solaris fdisk
partition on the drive and creating the two slices under that.
Why? In most cases giving zfs an entire disk is the best option.
I wouldn't bother with any manual partitioning.
Caches are ok, but log needs a mirror, and I only have
two SSDs.
Phillip Wagstrom
2013-01-04 19:10:51 UTC
If you're dedicating the disk to a single task (data, SLOG, L2ARC) then absolutely. If you're splitting tasks and wanting to make a drive do two things, like SLOG and L2ARC, then you have to do this.

Some of the confusion here is between what is a traditional FDISK partition (p1, p2, p3, p4, etc.) and what is a Solaris slice (s0 - s9), which lives inside a FDISK partition on x86.
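
The naming convention described in this thread can be summarized in a small shell helper. This is only an illustration: `describe_dev` is a made-up name, and the patterns cover just the cXtYdZ-style names that appear here (single-digit LUN numbers are assumed). The commented `zpool add` lines assume both SSDs carry the slice layout Eugen posted (slice 0 of about 4 GB for the log, slice 1 for the cache); they are not commands anyone in the thread confirmed running.

```shell
#!/bin/sh
# Illustration: classify a Solaris x86 device name by its suffix.
describe_dev() {
    case "$1" in
        *d[0-9]*p0)     echo "whole disk (p0)" ;;
        *d[0-9]*p[1-4]) echo "fdisk partition" ;;
        *d[0-9]*s[0-9]) echo "Solaris slice" ;;
        *d[0-9])        echo "whole disk" ;;
        *)              echo "unknown" ;;
    esac
}

for dev in c4t1d0p0 c4t1d0p1 c4t1d0s0 c3t5000C500098BE9DDd0; do
    echo "$dev: $(describe_dev "$dev")"
done

# With slices instead of p0/p1, the additions would look like this
# (assumed layout: s0 = log, s1 = cache on both SSDs):
#   zpool add tank0 log mirror c4t1d0s0 c4t2d0s0
#   zpool add tank0 cache c4t1d0s1 c4t2d0s1
```

The point of the helper is the first branch: a log on p0 spans the entire SSD, so it overlaps anything placed on p1 or on a slice of the same disk.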

-Phil