Discussion:
Is there a performance penalty when adding a vdev to an existing pool?
Peter Wood
2013-02-20 23:27:25 UTC
I'm using OpenIndiana 151a7, zpool v28, zfs v5.

When I bought my storage servers I intentionally left HDD slots available
so I could add another vdev when needed and defer the expense.

After reading some posts on the mailing list I'm getting concerned about
degraded performance due to unequal distribution of data among the vdevs.
I still have a chance to migrate the data away, add all the drives,
rebuild the pools, and start fresh.

Before going down that road I was hoping to hear your opinions on the best
way to handle this.

System: Supermicro with 36 HDD bays. 28 bays are filled with 3TB SAS 7.2K
enterprise drives; 8 bays are available to add another vdev to the pool.
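
The expansion itself would just be one "zpool add" of a matching 8-drive
raidz2 vdev, something like this (device names below are placeholders,
with -n first to preview the resulting layout):

# zpool add -n pool01 raidz2 disk1 disk2 disk3 disk4 \
    disk5 disk6 disk7 disk8
# zpool add pool01 raidz2 disk1 disk2 disk3 disk4 \
    disk5 disk6 disk7 disk8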

Pool configuration:
# zpool status pool01
pool: pool01
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 21 17:41:52 2012
config:

NAME                       STATE     READ WRITE CKSUM
pool01                     ONLINE       0     0     0
  raidz2-0                 ONLINE       0     0     0
    c8t5000CCA01AA8E3C0d0  ONLINE       0     0     0
    c8t5000CCA01AA8E3F0d0  ONLINE       0     0     0
    c8t5000CCA01AA8E394d0  ONLINE       0     0     0
    c8t5000CCA01AA8E434d0  ONLINE       0     0     0
    c8t5000CCA01AA793A0d0  ONLINE       0     0     0
    c8t5000CCA01AA79380d0  ONLINE       0     0     0
    c8t5000CCA01AA79398d0  ONLINE       0     0     0
    c8t5000CCA01AB56B10d0  ONLINE       0     0     0
  raidz2-1                 ONLINE       0     0     0
    c8t5000CCA01AB56B28d0  ONLINE       0     0     0
    c8t5000CCA01AB56B64d0  ONLINE       0     0     0
    c8t5000CCA01AB56B80d0  ONLINE       0     0     0
    c8t5000CCA01AB56BB0d0  ONLINE       0     0     0
    c8t5000CCA01AB56EA4d0  ONLINE       0     0     0
    c8t5000CCA01ABDAEBCd0  ONLINE       0     0     0
    c8t5000CCA01ABDAED0d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF1Cd0  ONLINE       0     0     0
  raidz2-2                 ONLINE       0     0     0
    c8t5000CCA01ABDAF7Cd0  ONLINE       0     0     0
    c8t5000CCA01ABDAF10d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF40d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF60d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF74d0  ONLINE       0     0     0
    c8t5000CCA01ABDAF80d0  ONLINE       0     0     0
    c8t5000CCA01ABDB04Cd0  ONLINE       0     0     0
    c8t5000CCA01ABDB09Cd0  ONLINE       0     0     0
logs
  mirror-3                 ONLINE       0     0     0
    c6t0d0                 ONLINE       0     0     0
    c6t1d0                 ONLINE       0     0     0
cache
  c6t2d0                   ONLINE       0     0     0
  c6t3d0                   ONLINE       0     0     0
spares
  c8t5000CCA01ABDB020d0    AVAIL
  c8t5000CCA01ABDB060d0    AVAIL

errors: No known data errors
#

Will adding another vdev hurt the performance?

Thank you,

-- Peter
Ian Collins
2013-02-20 23:40:29 UTC
Post by Peter Wood
I'm using OpenIndiana 151a7, zpool v28, zfs v5.
When I bought my storage servers I intentionally left hdd slots
available so I can add another vdev when needed and delay immediate
expenses.
After reading some posts on the mailing list I'm getting concerned
about degrading performance due to unequal distribution of data among
the vdevs. I still have a chance to migrate the data away, add all
drives and rebuild the pools and start fresh.
Before going that road I was hoping to hear your opinion on what will
be the best way to handle this.
System: Supermicro with 36 hdd bays. 28 bays filled with 3TB SAS 7.2K
enterprise drives. 8 bays available to add another vdev to the pool.
<snip>
Post by Peter Wood
#
Will adding another vdev hurt the performance?
How full is the pool?

When I've added a vdev (or grown an existing one), I used zfs send to
make a copy of a suitably large filesystem, then deleted the original
and renamed the copy. I had to do this a couple of times to
redistribute data, but it saved a lot of downtime.
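
In outline it is something like this, where pool01/data is just a
stand-in for one of your filesystems (quiesce it, or finish with an
incremental send, so you don't lose writes made during the copy); the
received copy gets written across all the vdevs, including the new one:

# zfs snapshot pool01/data@move
# zfs send pool01/data@move | zfs receive pool01/data.new
# zfs destroy -r pool01/data
# zfs rename pool01/data.new pool01/data
# zfs destroy pool01/data@move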
--
Ian.
Sašo Kiselkov
2013-02-20 23:43:48 UTC
Post by Peter Wood
Will adding another vdev hurt the performance?
In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.

Cheers,
--
Saso
Bob Friesenhahn
2013-02-20 23:46:50 UTC
Post by Sašo Kiselkov
Post by Peter Wood
Will adding another vdev hurt the performance?
In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.
You have this reversed. The older data is served from fewer spindles
than data written after the new vdev is added. Performance with the
newer data should be improved.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Tim Cook
2013-02-20 23:48:54 UTC
On Wed, Feb 20, 2013 at 5:46 PM, Bob Friesenhahn <
Post by Sašo Kiselkov
Post by Peter Wood
Will adding another vdev hurt the performance?
In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.
You have this reversed. The older data is served from fewer spindles than
data written after the new vdev is added. Performance with the newer data
should be improved.
Bob
That depends entirely on how full the pool is when the new vdev is added,
and how frequently the older data changes, snapshots, etc.

--Tim
Peter Wood
2013-02-20 23:56:33 UTC
Currently the pool is about 20% full:
# zpool list pool01
NAME     SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
pool01  65.2T  15.4T  49.9T         -    23%  1.00x  ONLINE        -
#

The old data and new data will be equally used after adding the vdev.

The FS holds tens of thousands of small images (~500KB) that are read,
written, and added depending on what customers are doing. It's pretty heavy
on the file system: about 800 IOPS, going up to 1500 IOPS at times.

Performance is important.
Post by Tim Cook
On Wed, Feb 20, 2013 at 5:46 PM, Bob Friesenhahn <
Post by Bob Friesenhahn
Post by Sašo Kiselkov
Post by Peter Wood
Will adding another vdev hurt the performance?
In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.
You have this reversed. The older data is served from fewer spindles
than data written after the new vdev is added. Performance with the newer
data should be improved.
Bob
That depends entirely on how full the pool is when the new vdev is added,
and how frequently the older data changes, snapshots, etc.
--Tim
Ian Collins
2013-02-21 00:07:18 UTC
Post by Peter Wood
# zpool list pool01
NAME     SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
pool01  65.2T  15.4T  49.9T         -    23%  1.00x  ONLINE        -
#
So you will be under 20% full after adding a new vdev (another 8 x 3TB
raidz2 vdev adds roughly 22T of raw capacity, giving 15.4T allocated out
of about 87T).

Unless you are likely to get too close to filling the enlarged pool, you
will probably be OK performance-wise. Access times for the old data will
be no worse, and new data should be better.

If you can spread some of your old data around after adding the new vdev,
do so.
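
You can watch how evenly space and I/O end up spread across the vdevs
with something like:

# zpool iostat -v pool01 5

The per-vdev alloc/free and operations columns will show how much data
and load has made it onto the new vdev.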
--
Ian.
Ian Collins
2013-02-20 23:55:29 UTC
Post by Bob Friesenhahn
Post by Sašo Kiselkov
Post by Peter Wood
Will adding another vdev hurt the performance?
In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.
You have this reversed. The older data is served from fewer spindles
than data written after the new vdev is added. Performance with the
newer data should be improved.
Not if the pool is close to full, when new data will end up on fewer
spindles (the new or extended vdev).
--
Ian.