Discussion: Scrub works in parallel?
Kalle Anka
2012-06-10 18:29:25 UTC
Permalink
Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub one disk. If I scrub the zpool, how long will it take?


Will it scrub one disk at a time, in sequence, so it takes 500 hours? Or does the scrub run in parallel, so it takes about 5 hours no matter how many disks there are?
Tomas Forsman
2012-06-10 20:01:09 UTC
Permalink
Post by Kalle Anka
Assume we have 100 disks in one zpool. Assume it takes 5 hours to
scrub one disk. If I scrub the zpool, how long will it take?
Will it scrub one disk at a time, in sequence, so it takes 500 hours?
Or does the scrub run in parallel, so it takes about 5 hours no matter
how many disks there are?
It walks the filesystem/pool block trees, so it's not just reading each
disk from track 0 to track 12345; it reads and validates every copy of
every block.
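To illustrate the idea (a conceptual Python sketch only, not ZFS code; the
block layout and checksums here are made up): the scrub descends the block
tree and verifies the checksum of every redundant copy of every block it
visits.

import hashlib
from dataclasses import dataclass, field

@dataclass
class Block:
    # Toy block pointer: a checksum plus one or more redundant copies,
    # plus pointers to child blocks further down the tree.
    checksum: bytes
    copies: list
    children: list = field(default_factory=list)

def scrub(block, errors):
    # Walk the block tree and verify every copy of every block.
    for i, data in enumerate(block.copies):
        if hashlib.sha256(data).digest() != block.checksum:
            errors.append((block, i))  # a real scrub would try to repair from a good copy
    for child in block.children:
        scrub(child, errors)

# usage: build a tiny two-level tree and scrub it
leaf = Block(hashlib.sha256(b"data").digest(), [b"data", b"data"])
root = Block(hashlib.sha256(b"root").digest(), [b"root"], [leaf])
errs = []
scrub(root, errs)
print("errors found:", len(errs))  # 0 - every copy of every block checks out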

/Tomas
--
Tomas Forsman, ***@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of UmeƄ
`- Sysadmin at {cs,acc}.umu.se
Edward Ned Harvey
2012-06-11 01:37:33 UTC
Permalink
Post by Kalle Anka
Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub
one disk. If I scrub the zpool, how long will it take?
Will it scrub one disk at a time, in sequence, so it takes 500 hours?
Or does the scrub run in parallel, so it takes about 5 hours no matter
how many disks there are?
It will be approximately parallel, because it's actually scrubbing only the
used blocks, and the order it scrubs in will be approximately the order they
were written, which was intentionally parallel.

Aside from that, your question doesn't really make sense, because you don't
just stick a bunch of disks in a pool. You make a pool out of vdevs, which
are made of storage devices (in this case, disks). The type and size of each
vdev (raidz, raidzN, mirror, etc.) will greatly affect the performance, as
will your data usage patterns.

Scrubbing is an approximately random-IOPS task, and mirrors parallelize
random I/O much better than parity raid (raidz).

The amount of time it takes to scrub or resilver depends both on the
amount of used data in the vdev and on the on-disk ordering.
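As a rough illustration of that point (a Python sketch with made-up numbers;
it simply models a scrub as random reads of every used block, and assumes a
fixed random-IOPS budget per vdev, which is an assumption rather than
measured ZFS behaviour):

def estimate_scrub_hours(used_bytes, avg_block_bytes, iops_per_vdev, vdevs):
    # Rough model: number of used blocks divided by aggregate random-read IOPS.
    blocks = used_bytes / avg_block_bytes
    return blocks / (iops_per_vdev * vdevs) / 3600

# e.g. 20 TB used, 64 KiB average block, ~150 random IOPS per vdev:
print(estimate_scrub_hours(20e12, 64 * 1024, 150, 50))  # 50 mirror pairs from 100 disks
print(estimate_scrub_hours(20e12, 64 * 1024, 150, 10))  # 10 ten-disk raidz2 vdevs, same 100 disks

The mirror layout wins simply because the same 100 disks yield five times as
many independent vdevs to spread the random reads over.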
Jim Klimov
2012-06-11 13:05:39 UTC
Permalink
Post by Edward Ned Harvey
Post by Kalle Anka
Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub
one disk. If I scrub the zpool, how long will it take?
Will it scrub one disk at a time, in sequence, so it takes 500 hours?
Or does the scrub run in parallel, so it takes about 5 hours no matter
how many disks there are?
It will be approximately parallel, because it's actually scrubbing only the
used blocks, and the order it scrubs in will be approximately the order they
were written, which was intentionally parallel.
What the other posters said, plus: 100 disks is quite a lot
of contention on the bus(es), so even if it is all parallel,
the bus and CPU bottlenecks would raise the scrubbing time
somewhat above the single-disk scrub time.

Roughly, if all else is ideal (i.e. no/few random seeks and a fast
scrub at about 100 MB/s per disk), the SATA3 interface at 6 Gbit/s
(on the order of ~600 MB/s) will be maxed out at about 6 disks. If
your disks share one HBA receptacle (i.e. sit behind a backplane),
this may be an issue for many disks in an enclosure (a 4-lane link
will sustain about 24 drives at that speed, and drives on the market
can go faster than that).

Further on, the PCI buses will become a bottleneck and the
CPU processing power might become one too, and for a box
with 100 disks this may be noticeable, depending on the other
architectural choices, components and their specs.
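
For what it's worth, the arithmetic behind those figures can be sketched
like this (Python; the ~100 MB/s per-disk rate and the ~20% link overhead
factor are assumptions):

def disks_per_link(link_gbit_per_s, lanes, per_disk_mb_per_s, efficiency=0.8):
    # How many disks can stream at full speed before the shared link saturates.
    link_mb_per_s = link_gbit_per_s * lanes * 1000 / 8 * efficiency
    return link_mb_per_s / per_disk_mb_per_s

print(disks_per_link(6, 1, 100))  # one SATA3 / 6G lane: ~6 disks
print(disks_per_link(6, 4, 100))  # 4-lane 6G wide port: ~24 disks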

HTH,
//Jim
Roch Bourbonnais
2012-06-12 12:20:07 UTC
Permalink
Scrubs are run at very low priority and yield very quickly in the presence of other work,
so I really would not expect a scrub to have any impact on any other type of storage activity.
Resilvering will push forward more aggressively on what it has to do, but resilvering does not need to
read any of the data blocks on the non-resilvering vdevs.

-r
Post by Jim Klimov
Post by Edward Ned Harvey
Post by Kalle Anka
Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub
one disk. If I scrub the zpool, how long will it take?
Will it scrub one disk at a time, in sequence, so it takes 500 hours?
Or does the scrub run in parallel, so it takes about 5 hours no matter
how many disks there are?
It will be approximately parallel, because it's actually scrubbing only the
used blocks, and the order it scrubs in will be approximately the order they
were written, which was intentionally parallel.
What the other posters said, plus: 100 disks is quite a lot
of contention on the bus(es), so even if it is all parallel,
the bus and CPU bottlenecks would raise the scrubbing time
somewhat above the single-disk scrub time.
Roughly, if all else is ideal (i.e. no/few random seeks and a fast
scrub at about 100 MB/s per disk), the SATA3 interface at 6 Gbit/s
(on the order of ~600 MB/s) will be maxed out at about 6 disks. If
your disks share one HBA receptacle (i.e. sit behind a backplane),
this may be an issue for many disks in an enclosure (a 4-lane link
will sustain about 24 drives at that speed, and drives on the market
can go faster than that).
Further on, the PCI buses will become a bottleneck and the
CPU processing power might become one too, and for a box
with 100 disks this may be noticeable, depending on the other
architectural choices, components and their specs.
HTH,
//Jim
Jim Klimov
2012-06-12 12:28:39 UTC
Permalink
Post by Roch Bourbonnais
Scrubs are run at very low priority and yield very quickly in the presence of other work,
so I really would not expect a scrub to have any impact on any other type of storage activity.
Resilvering will push forward more aggressively on what it has to do, but resilvering does not need to
read any of the data blocks on the non-resilvering vdevs.
Thanks, I agree - and that's important to notice, at least
on the current versions of ZFS :)

What I meant to stress is that if a "scrub of one disk takes
5 hours" (however that measurement is made, such as by building
a 1-disk pool with the same data distribution), then there are
physical reasons why a 100-disk pool would probably take somewhat
more than 5 hours to scrub; or at least there are bottlenecks
that should be paid attention to in order to minimize that
increase in scrub time.

Also, yes, the presence of other pool activity would likely
delay scrub completion, perhaps even more noticeably.

Thanks,
//Jim Klimov
Roch Bourbonnais
2012-06-12 12:45:36 UTC
Permalink
The process should be scalable:
Scrub all of the data on one disk using one disk's worth of IOPS.
Scrub all of the data on N disks using N disks' worth of IOPS.

That will take ~ the same total time.
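
In other words (a toy Python model with made-up per-disk numbers, just to
show why N drops out when each disk scrubs only its own data at its own
rate):

def scrub_hours(data_per_disk_gb, per_disk_mb_per_s, n_disks):
    # Total work and total throughput both scale with n_disks, so n cancels.
    total_mb = data_per_disk_gb * n_disks * 1024
    total_mb_per_s = per_disk_mb_per_s * n_disks
    return total_mb / total_mb_per_s / 3600

print(scrub_hours(1800, 100, 1))    # one disk: ~5 hours
print(scrub_hours(1800, 100, 100))  # 100 disks: still ~5 hours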
-r
Post by Jim Klimov
Post by Roch Bourbonnais
Scrubs are run at very low priority and yield very quickly in the presence of other work,
so I really would not expect a scrub to have any impact on any other type of storage activity.
Resilvering will push forward more aggressively on what it has to do, but resilvering does not need to
read any of the data blocks on the non-resilvering vdevs.
Thanks, I agree - and that's important to notice, at least
on the current versions of ZFS :)
What I meant to stress is that if a "scrub of one disk takes
5 hours" (however that measurement is made, such as by building
a 1-disk pool with the same data distribution), then there are
physical reasons why a 100-disk pool would probably take somewhat
more than 5 hours to scrub; or at least there are bottlenecks
that should be paid attention to in order to minimize that
increase in scrub time.
Also, yes, the presence of other pool activity would likely
delay scrub completion, perhaps even more noticeably.
Thanks,
//Jim Klimov
Jim Klimov
2012-06-12 13:04:52 UTC
Permalink
Post by Roch Bourbonnais
The process should be scalable:
Scrub all of the data on one disk using one disk's worth of IOPS.
Scrub all of the data on N disks using N disks' worth of IOPS.
That will take ~ the same total time.
IF the uplink, processing power, or some other bottleneck does not
limit that (e.g. a single 4-lane SAS link to a daisy-chain of 100
or 200 disks would likely impose a bandwidth bottleneck).
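
A rough sketch of that effect (Python; assumes ~100 MB/s per disk and
~2400 MB/s of usable bandwidth on a 4-lane 6 Gbit/s link):

def uplink_slowdown(n_disks, per_disk_mb_per_s, link_mb_per_s):
    # Factor by which a shared uplink stretches an otherwise parallel scrub.
    aggregate = n_disks * per_disk_mb_per_s
    return max(1.0, aggregate / link_mb_per_s)

print(uplink_slowdown(100, 100, 2400))  # ~4.2x the ideal per-disk scrub time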

I know that well-engineered servers spec'ed by a vendor/integrator
for the customer's tasks and environment, such as those from Sun,
are built to avoid such apparent bottlenecks. But people who
construct their own storage should know of (and try to avoid)
such possible problem-makers ;)

Thanks, Roch,
//Jim Klimov
Richard Elling
2012-06-12 13:07:39 UTC
Permalink
Post by Jim Klimov
Post by Edward Ned Harvey
Post by Kalle Anka
Assume we have 100 disks in one zpool. Assume it takes 5 hours to scrub
one disk. If I scrub the zpool, how long will it take?
Will it scrub one disk at a time, in sequence, so it takes 500 hours?
Or does the scrub run in parallel, so it takes about 5 hours no matter
how many disks there are?
It will be approximately parallel, because it's actually scrubbing only the
used blocks, and the order it scrubs in will be approximately the order they
were written, which was intentionally parallel.
What the other posters said, plus: 100 disks is quite a lot
of contention on the bus(es), so even if it is all parallel,
the bus and CPU bottlenecks would raise the scrubbing time
somewhat above the single-disk scrub time.
In general, this is not true for HDDs or modern CPUs: modern systems
are overprovisioned for bandwidth. In fact, bandwidth has been a poor
design point for storage for a long time. Dave Patterson has some
interesting observations on this, now about 8 years old.
http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf

SSDs tend to be a different story, and there is some interesting work being
done in this area, both on the systems side and on the SSD side. This is
where the fun work is progressing :-)
-- richard
--
ZFS and performance consulting
http://www.RichardElling.com