Discussion: Slow zfs writes
Ram Chander
2013-02-11 12:55:19 UTC
Hi,

My OmniOS host is experiencing slow zfs writes (around 30 times slower than
usual). iostat reports the errors shown below even though the pool is healthy.
This has been happening for the past 4 days, although no change was made to
the system. Are the hard disks faulty? Please help.

***@host:~# zpool status -v
pool: test
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support
feature flags.
config:

        NAME         STATE     READ WRITE CKSUM
        test         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz1-3   ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
          raidz1-4   ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
          raidz1-5   ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
          raidz1-6   ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
        spares
          c5t10d0    AVAIL
          c5t11d0    AVAIL
          c2t45d0    AVAIL
          c2t46d0    AVAIL
          c2t47d0    AVAIL



# iostat -En

c4t0d0 Soft Errors: 0 Hard Errors: 5 Transport Errors: 0
Vendor: iDRAC Product: Virtual CD Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 5 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c3t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC Product: LCDRIVE Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c4t0d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: iDRAC Product: Virtual Floppy Revision: 0323 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0


***@host:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a ZFS-8000-HC    Major

Host : host
Platform : PowerEdge-R810
Product_sn :

Fault class : fault.fs.zfs.io_failure_wait
Affects : zfs://pool=test
faulted but still in service
Problem in : zfs://pool=test
faulted but still in service

Description : The ZFS pool has experienced currently unrecoverable I/O
failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
more information.

Response : No automated response will be taken.

Impact : Read and write I/Os cannot be serviced.

Action : Make sure the affected devices are connected, then run
'zpool clear'.

Regards,
Ram
Roy Sigurd Karlsbakk
2013-02-11 16:53:43 UTC
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a ZFS-8000-HC Major
Host : host
Platform : PowerEdge-R810
Fault class : fault.fs.zfs.io_failure_wait
Affects : zfs://pool=test
faulted but still in service
Problem in : zfs://pool=test
faulted but still in service
Description : The ZFS pool has experienced currently unrecoverable I/O
failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
more information.
Response : No automated response will be taken.
Impact : Read and write I/Os cannot be serviced.
Action : Make sure the affected devices are connected, then run
'zpool clear'.
--
The pool looks healthy to me, but it isn't very well balanced. Have you been
adding new VDEVs along the way to grow it? Check whether some of the VDEVs
are fuller than others. I don't have an OI/illumos system available at the
moment, but IIRC this can be done with zpool iostat -v. Older versions of ZFS
striped across all VDEVs regardless of fill, which slowed down write speeds
rather horribly if some VDEVs were full (>90%). This shouldn't be the case
with OmniOS, but it *may* be the case with an old zpool version. I don't
know. I'd check the fill rate of the VDEVs first, then perhaps try to upgrade
the zpool, unless you need to be able to import it on an older zpool version
(on S10 or similar).

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
***@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of xenotypic etymology. In most cases adequate and relevant synonyms
exist in Norwegian.
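A minimal sketch of the checks Roy describes, assuming the pool name "test"
from the output above (exact option support depends on the OmniOS build):

# zpool iostat -v test      (per-vdev alloc/free plus I/O counters)
# zpool list -v test        (per-vdev capacity summary, if -v is supported)
# zpool upgrade -v          (list the pool versions/features this build knows)
# zpool upgrade test        (one-way upgrade of the pool; see Roy's caveat
                             about older systems no longer importing it)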
Ram Chander
2013-02-12 05:08:29 UTC
Hi Roy,
You are right, so it looks like a data-redistribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year. After that we
added 24 more disks and created additional vdevs. The initial vdevs have
filled up, and so the write speed declined. Now, how do I find which files
are present on a given vdev or disk? That way I can remove them and copy
them back to redistribute the data. Is there any other way to solve this?

Total capacity of pool - 98 TB
Used - 44 TB
Free - 54 TB

***@host:# zpool iostat -v
               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
test         54.0T  62.7T     52  1.12K  2.16M  5.78M
  raidz1     11.2T  2.41T     13     30   176K   146K
    c2t0d0       -      -      5     18  42.1K  39.0K
    c2t1d0       -      -      5     18  42.2K  39.0K
    c2t2d0       -      -      5     18  42.5K  39.0K
    c2t3d0       -      -      5     18  42.9K  39.0K
    c2t4d0       -      -      5     18  42.6K  39.0K
  raidz1     13.3T   308G     13    100   213K   521K
    c2t5d0       -      -      5     94  50.8K   135K
    c2t6d0       -      -      5     94  51.0K   135K
    c2t7d0       -      -      5     94  50.8K   135K
    c2t8d0       -      -      5     94  51.1K   135K
    c2t9d0       -      -      5     94  51.1K   135K
  raidz1     13.4T  19.1T      9    455   743K  2.31M
    c2t12d0      -      -      3    137  69.6K   235K
    c2t13d0      -      -      3    129  69.4K   227K
    c2t14d0      -      -      3    139  69.6K   235K
    c2t15d0      -      -      3    131  69.6K   227K
    c2t16d0      -      -      3    141  69.6K   235K
    c2t17d0      -      -      3    132  69.5K   227K
    c2t18d0      -      -      3    142  69.6K   235K
    c2t19d0      -      -      3    133  69.6K   227K
    c2t20d0      -      -      3    143  69.6K   235K
    c2t21d0      -      -      3    133  69.5K   227K
    c2t22d0      -      -      3    143  69.6K   235K
    c2t23d0      -      -      3    133  69.5K   227K
  raidz1     2.44T  16.6T      5    103   327K   485K
    c2t24d0      -      -      2     48  50.8K  87.4K
    c2t25d0      -      -      2     49  50.7K  87.4K
    c2t26d0      -      -      2     49  50.8K  87.3K
    c2t27d0      -      -      2     49  50.8K  87.3K
    c2t28d0      -      -      2     49  50.8K  87.3K
    c2t29d0      -      -      2     49  50.8K  87.3K
    c2t30d0      -      -      2     49  50.8K  87.3K
  raidz1     8.18T  10.8T      5    295   374K  1.54M
    c2t31d0      -      -      2    131  58.2K   279K
    c2t32d0      -      -      2    131  58.1K   279K
    c2t33d0      -      -      2    131  58.2K   279K
    c2t34d0      -      -      2    132  58.2K   279K
    c2t35d0      -      -      2    132  58.1K   279K
    c2t36d0      -      -      2    133  58.3K   279K
    c2t37d0      -      -      2    133  58.2K   279K
  raidz1     5.42T  13.6T      5    163   383K   823K
    c2t38d0      -      -      2     61  59.4K   146K
    c2t39d0      -      -      2     61  59.3K   146K
    c2t40d0      -      -      2     61  59.4K   146K
    c2t41d0      -      -      2     61  59.4K   146K
    c2t42d0      -      -      2     61  59.3K   146K
    c2t43d0      -      -      2     62  59.2K   146K
    c2t44d0      -      -      2     62  59.3K   146K
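
For reference, the fill rate each top-level vdev implies here (alloc divided
by alloc plus free, rounded), matched to the raidz1-N names from the zpool
status output above, works out to roughly:

raidz1-0   11.2T / 13.6T   ~82%
raidz1-1   13.3T / 13.6T   ~98%
raidz1-3   13.4T / 32.5T   ~41%
raidz1-4   2.44T / 19.0T   ~13%
raidz1-5   8.18T / 19.0T   ~43%
raidz1-6   5.42T / 19.0T   ~28%

So the two original 5-disk vdevs are nearly full while the later ones are
mostly empty, which matches the slowdown Roy describes.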
Post by Ram Chander
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jan 05 08:21:09 7af1ab3c-83c2-602d-d4b9-f9040db6944a ZFS-8000-HC    Major
Host : host
Platform : PowerEdge-R810
Fault class : fault.fs.zfs.io_failure_wait
Affects : zfs://pool=test
faulted but still in service
Problem in : zfs://pool=test
faulted but still in service
Description : The ZFS pool has experienced currently unrecoverable I/O
failures. Refer to http://illumos.org/msg/ZFS-8000-HC for
more information.
Response : No automated response will be taken.
Impact : Read and write I/Os cannot be serviced.
Action : Make sure the affected devices are connected, then run
'zpool clear'.
--
The pool looks healthy to me, but it isn't very well balanced. Have you been
adding new VDEVs along the way to grow it? Check whether some of the VDEVs
are fuller than others. I don't have an OI/illumos system available at the
moment, but IIRC this can be done with zpool iostat -v. Older versions of ZFS
striped across all VDEVs regardless of fill, which slowed down write speeds
rather horribly if some VDEVs were full (>90%). This shouldn't be the case
with OmniOS, but it *may* be the case with an old zpool version. I don't
know. I'd check the fill rate of the VDEVs first, then perhaps try to upgrade
the zpool, unless you need to be able to import it on an older zpool version
(on S10 or similar).
Vennlige hilsener / Best regards
roy
Ian Collins
2013-02-12 09:32:21 UTC
Post by Ram Chander
Hi Roy,
You are right, so it looks like a data-redistribution issue. Initially
there were two vdevs with 24 disks (disks 0-23) for close to a year.
After that we added 24 more disks and created additional vdevs. The
initial vdevs have filled up, and so the write speed declined. Now, how
do I find which files are present on a given vdev or disk? That way I
can remove them and copy them back to redistribute the data. Is there
any other way to solve this?
The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.
--
Ian.
Jim Klimov
2013-02-12 10:22:52 UTC
Post by Ian Collins
Post by Ram Chander
Hi Roy,
You are right, so it looks like a data-redistribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year. After that
we added 24 more disks and created additional vdevs. The initial vdevs have
filled up, and so the write speed declined. Now, how do I find which files
are present on a given vdev or disk? That way I can remove them and copy
them back to redistribute the data. Is there any other way to solve this?
The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.
Well, the imbalance is there - in the zpool status printout we see raidz1
top-level vdevs of 5, 5, 12, 7, 7 and 7 disks plus 5 spares - which does
seem to sum up to 48 ;)

Depending on the disk sizes, it is possible that the top-level vdev sizes
in gigabytes were kept the same (i.e. a raidz set with twice as many disks
of half the size), but we have no information on that detail and it is
unlikely. With the disk sets all in one pool, this would still noticeably
unbalance the load among spindles and I/O buses.

Besides all that, with the "older" top-level vdevs being fuller than the
"newer" ones, there is an imbalance that would not be avoided just by not
mixing vdev sizes: writes into the newer ones are likely to quickly find
available "holes", while writes into the older ones are more fragmented and
a longer search is needed to find a hole - if they don't run into gang-block
fragmentation outright. These two effects are, I believe, the basis for the
performance drop on "full" pools, the real measure being the mix of I/O
patterns and the fragmentation of data and holes.

I think there have been developments in illumos ZFS to direct more writes
onto devices with more available space; I am not sure whether the average
write latency of a top-level vdev is monitored and taken into account in
write-targeting decisions (which would also cover the case of failing
devices that take longer to respond). I am not sure which portions have
been completed and integrated into common illumos-gate.

As was suggested, you can use "zpool iostat -v 5" to monitor I/O to the
pool with a fanout per top-level vdev and per disk, and watch for patterns
there. Do keep in mind, however, that for a non-failed raidz set you should
see reads only from the data disks of a particular stripe, while parity
disks are not read unless a checksum mismatch occurs. On average the data
should be laid out over all disks in such a manner that there is no
"dedicated" parity disk, but with small I/Os you are likely to notice the
uneven reads.

If the budget permits, I'd suggest building (or leasing) another
system with balanced disk sets and replicating all data onto it,
then repurposing the older system - for example, to be a backup
of the newer box (also after remaking the disk layout).

As for the question of "which files are on the older disks" - as a rule of
thumb you can compare file creation/modification times with the date when
you expanded the pool ;) Closer inspection could be done with a zdb walk to
print out the DVA block addresses for the blocks of a file (the DVA includes
the number of the top-level vdev), but that would take some time - first to
determine which files you want to inspect (likely some band of sizes) and
then to do these zdb walks.
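
A rough sketch of such a zdb walk, with placeholder dataset and file names:
on ZFS the inode number that ls -i reports is the object number, and zdb can
dump the block pointers for that object; each DVA is printed as
"vdev:offset:size", so the leading number is the top-level vdev id.

# ls -i /test/somefs/somefile          (prints the object number, e.g. 12345)
# zdb -ddddd test/somefs 12345         (dumps the dnode and its block
                                        pointers; the DVAs show which
                                        top-level vdev holds each block)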

Good luck,
//Jim
Ian Collins
2013-02-12 20:06:05 UTC
Post by Jim Klimov
Post by Ian Collins
Post by Ram Chander
Hi Roy,
You are right, so it looks like a data-redistribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year. After that
we added 24 more disks and created additional vdevs. The initial vdevs have
filled up, and so the write speed declined. Now, how do I find which files
are present on a given vdev or disk? That way I can remove them and copy
them back to redistribute the data. Is there any other way to solve this?
The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.
I was a bit quick off the mark there; I didn't notice that some vdevs
were older than others.
Post by Jim Klimov
Well, the imbalance is there - in the zpool status printout we see raidz1
top-level vdevs of 5, 5, 12, 7, 7 and 7 disks plus 5 spares - which does
seem to sum up to 48 ;)
The vdev sizes (including parity space) are roughly 14, 14, 32, 19, 19 and
19 TB respectively, about 117 TB in total. So even if the data were balanced,
the performance of this pool would still start to degrade once it is around
two thirds full (roughly 78 TB used).

So the only viable long term solution is a rebuild, or putting bigger
drives in the two smallest vdevs.

In the short term, when I've had similar issues I have used zfs send to
copy a large filesystem within the pool, then renamed the copy to the
original name and deleted the original. This can be repeated until you have
an acceptable distribution.
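
A minimal sketch of that shuffle, with made-up dataset names (quiesce writers
to the filesystem before the final rename, or catch up with an incremental
send from the snapshot):

# zfs snapshot test/data@move
# zfs send test/data@move | zfs receive test/data.new
# zfs rename test/data test/data.old
# zfs rename test/data.new test/data
# zfs destroy -r test/data.old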

One last thing: unless this is some form of backup pool, or the data on
it isn't important, avoid raidz vdevs in such a large pool!
--
Ian.