Discussion:
How much do we really want zpool remove?
Jeremy Teo
2007-01-18 10:55:39 UTC
Permalink
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
--
Regards,
Jeremy
p***@poczta.fm
2007-01-18 11:31:56 UTC
Permalink
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
If you think "remove a device from a zpool" = "shrink a pool", then
it is really useful. Definitely really useful.
Do you need an example?


przemol

Dick Davies
2007-01-18 11:34:42 UTC
Permalink
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
It's very useful if you accidentally create a concat rather than a mirror
of an existing zpool. Otherwise you have to buy another drive :)
--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Boyd Adamson
2007-01-18 13:24:28 UTC
Permalink
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
Assuming we're talking about removing a top-level vdev..

I introduce new sysadmins to ZFS on a weekly basis. After 2 hours of
introduction this is the single feature that they most often realise
is "missing".

The most common reason is migration of data to new storage
infrastructure. The experience is often that the growth in disk size
allows the new storage to consist of fewer disks/LUNs than the old.

I can see that it will become increasingly needed as more and more
storage goes under ZFS. Sure, we can put 256 quadrillion zettabytes
in the pool, but if you accidentally add a disk to the wrong pool or
with the wrong redundancy, you have a long, long wait for your tape
drive :)

Boyd
Anantha N. Srirama
2007-01-18 13:32:03 UTC
Permalink
I can vouch for this situation. I had to go through a long maintenance to accomplish the following:

- 50 x 64GB drives in a zpool; needed to separate 15 of them out due to performance issues. There was no need to increase storage capacity.

Because I couldn't yank 15 drives from the existing pool to create a UFS filesystem, I had to evacuate the entire 50-disk pool, create a new pool and the UFS filesystem, and then repopulate the filesystems.

I think this feature will add to the adoption rate of ZFS. However, I feel that this shouldn't be at the top of the 'to-do' list. I'll trade this feature for some of the performance enhancements that've been discussed on this group.


Matthew Ahrens
2007-01-18 18:51:18 UTC
Permalink
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
This is a pretty high priority. We are working on it.

--matt
Erik Trimble
2007-01-18 19:29:23 UTC
Permalink
Post by Matthew Ahrens
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
This is a pretty high priority. We are working on it.
--matt
I'd consider it a lower priority than say, adding a drive to a RAIDZ
vdev, but yes, being able to reduce a zpool's size by removing devices
is quite useful, as it adds a considerable degree of flexibility that
(we) admins crave.
--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
W***@fallon.com
2007-01-18 19:29:43 UTC
Permalink
Post by Erik Trimble
Post by Matthew Ahrens
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
This is a pretty high priority. We are working on it.
--matt
I'd consider it a lower priority than say, adding a drive to a RAIDZ
vdev, but yes, being able to reduce a zpool's size by removing devices
is quite useful, as it adds a considerable degree of flexibility that
(we) admins crave.
I would be surprised if much of the code needed to allow removal does not bring
device adds closer to reality -- assuming device removal migrates data and
resilvers to an optimal stripe online.

-Wade
mike
2007-01-18 20:31:17 UTC
Permalink
Would this be the same as failing a drive on purpose to remove it?

I was under the impression that was supported, but I wasn't sure
whether shrinking a ZFS pool would work.
Post by Matthew Ahrens
This is a pretty high priority. We are working on it.
Wee Yeh Tan
2007-01-19 01:33:29 UTC
Permalink
Post by mike
Would this be the same as failing a drive on purpose to remove it?
I was under the impression that was supported, but I wasn't sure if
shrinking a ZFS pool would work though.
Not quite. I suspect you are thinking about drive replacement rather
than removal.

Drive replacement is already supported in ZFS and the task involves
rebuilding data on the disk from data available elsewhere. Drive
removal involves rebalancing data from the target drive to other
disks. The latter is non-trivial.
--
Just me,
Wire ...
mike
2007-01-19 02:01:36 UTC
Permalink
what is the technical difference between forcing a removal and an
actual failure?

isn't it the same process? except one is manually triggered? i would
assume the same resilvering process happens when a usable drive is put
back in...
Post by Wee Yeh Tan
Not quite. I suspect you are thinking about drive replacement rather
than removal.
Drive replacement is already supported in ZFS and the task involves
rebuilding data on the disk from data available elsewhere. Drive
removal involves rebalancing data from the target drive to other
disks. The latter is non-trivial.
--
Just me,
Wire ...
Erik Trimble
2007-01-19 02:29:01 UTC
Permalink
Mike,

I think you are missing the point. What we are talking about is
removing a drive from a zpool, that is, reducing the zpool's total
capacity by a drive. Say you have 4 drives of 100GB in size,
configured in a striped mirror, capacity of 200GB usable. We're
discussing the case where if the zpool's total used space is under
100GB, we could remove the second vdev (consisting of a mirror) from the
zpool, and have ZFS evacuate all the data from the to-be-removed vdev
before we actually remove the disks (or, maybe we simply want to
reconfigure them into another zpool). In this case, after the drive
removals, the zpool would be left with a 100GB capacity, and be a
simple 2-drive mirror.


What you are talking about is replacement of a drive, whether or not it
is actually bad. In your instance, the zpool capacity remains the
same, and it will return to optimal performance when a new drive is
inserted (and, no, there is no difference between a manual and automatic
"removal" in the case of marking a drive bad for replacement).

-Erik
Post by mike
what is the technical difference between forcing a removal and an
actual failure?
isn't it the same process? except one is manually triggered? i would
assume the same resilvering process happens when a usable drive is put
back in...
Post by Wee Yeh Tan
Not quite. I suspect you are thinking about drive replacement rather
than removal.
Drive replacement is already supported in ZFS and the task involves
rebuilding data on the disk from data available elsewhere. Drive
removal involves rebalancing data from the target drive to other
disks. The latter is non-trivial.
--
Just me,
Wire ...
--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
mike
2007-01-19 03:07:31 UTC
Permalink
I get that part. I think I asked that question before (although not as
directly) - basically you're talking about the ability to shrink volumes
and/or disable/change the mirroring/redundancy options if there is
space available to account for it.

If this was allowed, this would also allow for a conversion from RAIDZ
to RAIDZ2, or vice-versa then, correct?
Post by Erik Trimble
Mike,
I think you are missing the point. What we are talking about is
removing a drive from a zpool, that is, reducing the zpool's total
capacity by a drive. Say you have 4 drives of 100GB in size,
configured in a striped mirror, capacity of 200GB usable. We're
discussing the case where if the zpool's total used space is under
100GB, we could remove the second vdev (consisting of a mirror) from the
zpool, and have ZFS evacuate all the data from the to-be-removed vdev
before we actually remove the disks (or, maybe we simply want to
reconfigure them into another zpool). In this case, after the drive
remoovals, the zpool would be left with a 100GB capacity, and be a
simple 2-drive mirror.
What you are talking about is replacement of a drive, whether or not it
is actually bad. In your instance, the zpool capacity size remains the
same, and it will return to optimal performance when a new drive is
inserted (and, no, there is no difference between a manual and automatic
"removal" in the case of marking a drive bad for replacement).
-Erik
Robert Milkowski
2007-01-19 08:28:30 UTC
Permalink
Hello mike,

Friday, January 19, 2007, 4:07:31 AM, you wrote:

m> I get that part. I think I asked that question before (although not as
m> direct) - basically you're talking about the ability to shrink volumes
m> and/or disable/change the mirroring/redundancy options if there is
m> space available to account for it.

You can already convert RAID-10 to RAID-0 and vice versa with ZFS.
Just attach/detach disks.
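For example, with a two-disk example pool named "tank" (the device names
here are only placeholders):

  # stripe -> mirror: attach a second disk to each existing one
  zpool attach tank c1t0d0 c2t0d0
  zpool attach tank c1t1d0 c2t1d0

  # mirror -> stripe: detach one side of each mirror
  zpool detach tank c2t0d0
  zpool detach tank c2t1d0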

m> If this was allowed, this would also allow for a conversion from RAIDZ
m> to RAIDZ2, or vice-versa then, correct?

Not really - at least not directly.
--
Best regards,
Robert mailto:***@task.gda.pl
http://milek.blogspot.com
Celso
2007-01-18 20:33:00 UTC
Permalink
Both removing disks from a zpool and modifying raidz arrays would be very useful.

I would also still love to have ditto data blocks. Is there any progress on this?

Celso.


Shannon Roddy
2007-01-18 21:22:13 UTC
Permalink
Post by Celso
Both removing disks from a zpool and modifying raidz arrays would be very useful.
Add my vote for this.
Martin
2007-01-19 03:06:47 UTC
Permalink
Post by Matthew Ahrens
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
This is a pretty high priority. We are working on it.
Good news! Where is the discussion on the best approach to take?
Post by Boyd Adamson
The most common reason is migration of data to new storage
infrastructure. The experience is often that the growth in disk size
allows the new storage to consist of fewer disks/LUNs than the old.
I agree completely. No matter how wonderful your current FC/SAS/whatever cabinet is, at some point in the future you will want to migrate to another newer/faster array with a better/faster interface, probably on fewer disks. The "just add another top level vdev" approach to growing a RAIDZ pool seems a bit myopic.
Post by Erik Trimble
On Thu, 2007-01-18 at 10:51 -0800, Matthew Ahrens wrote:
I'd consider it a lower priority than say, adding a drive to a RAIDZ
vdev, but yes, being able to reduce a zpool's size by removing devices
is quite useful, as it adds a considerable degree of flexibility that
(we) admins crave.
These two items (removing a vdev and restriping an array) are probably closely related. The core of either operation will likely be some metaslab_evacuate() routine which empties a metaslab and puts the data onto another metaslab.

Evacuating a vdev could be no more than evacuating all of the metaslabs in the vdev.

Restriping (adding/removing a data/parity disk) could be no more than progressively evacuating metaslabs with the old stripe geometry and writing the data to metaslabs with the new stripe geometry. The biggest challenge while restriping might be getting the read routine to figure out on-the-fly which geometry is in use for any particular stripe. Even so, this shouldn't be too big of a challenge: one geometry will checksum correctly and the other will not.

Marty


Darren Dunham
2007-01-19 21:57:19 UTC
Permalink
Post by Martin
These two items (removing a vdev and restriping an array) are probably
closely related. At the core of either operation likely will center
around some metaslab_evacuate() routine which empties a metaslab and
puts the data onto another metaslab.
Evacuating a vdev could be no more than evacuating all of the
metaslabs in the vdev.
How would snapshots behave here? Would they prevent evacuation, or
would they be able to refer to a migrated block?
--
Darren Dunham ***@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Robert Milkowski
2007-01-19 08:31:02 UTC
Permalink
Hello Matthew,
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
MA> This is a pretty high priority. We are working on it.

Quick, precise and "informative".

Ok, can you give us any details? Like only removal, or also adding a
disk and re-writing all data to match the new stripe width? Also conversion
Z1<->Z2? Others too (10->Z1, ...)? Any estimate when and what would
hit ON?
--
Best regards,
Robert mailto:***@task.gda.pl
http://milek.blogspot.com
Ceri Davies
2007-01-19 10:42:04 UTC
Permalink
Post by Jeremy Teo
On the issue of the ability to remove a device from a zpool, how
useful/pressing is this feature? Or is this more along the line of
"nice to have"?
We definitely need it. As a usage case, on occasion we have had to move
SAN sites, and the easiest way to do that by far is to snap on the new site
and remove the old one once it's synced.

Ceri
--
That must be wonderful! I don't understand it at all.
-- Moliere
Rainer Heilke
2007-01-19 22:27:03 UTC
Permalink
If you are referring to shrinking a pool/file system, where I work this is considered very high on the list. It isn't a truly dynamic file system if we can't shrink it.

As a practical example, you have a test server with several projects being worked on. When a project finishes (for whatever reason), you delete the files, shrink the pool, and give the LUN back to the storage folks to assign to another server that may be running out of space.

Having a SAN, we see it as more important than, say, the RAIDZ work. But, I can see why other people with different needs will argue otherwise. There are other things I would like to see as well (ability to find what files got corrupted due to a HW failure, and so on--see my other threads in this forum), but from the enterprise perspective of the company I work for, this is right up there. Just throwing our $.02 in. :-)

Rainer


mario heimel
2007-01-20 09:49:33 UTC
Permalink
In our Solaris 8 + VxVM + SAN environment we are migrating 500TB to new storage arrays.

We have seen a lot of migration approaches, FalconStor etc., but the only acceptable way is the host-based mirror with VxVM, so we can migrate manually over a few weeks but without downtime.

Tell me how we can do this with zpools without the ability to remove LUNs from the pool. This is our view; when you only have Thumpers this is not important.


Rainer Heilke
2007-01-22 15:49:09 UTC
Permalink
Post by mario heimel
but the only acceptable way is the host based
mirror with vxvm. so we can migrate manuelly in a few
weeks but without downtime.
Detaching mirrors is actually easy with ZFS. I've done it several times. Look at:

zpool detach pool device

The problem here is that the detached side loses all information about the zpool, the ZFS file system(s), etc. This really bit me in the butt when I was trying to figure out which of two HDDs really failed (the failed disk worked "sort of").

The part that isn't there yet is shrinking, say, a 50% used 500GB pool down to a 300GB pool.

Rainer


mario heimel
2007-01-22 20:28:50 UTC
Permalink
This is a good point: the mirror loses all information about the zpool.
This is very important for the ZFS root pool; I don't know how often I have broken the SVM mirror of the root disks to clone a system and bring the disk to another system, or to use "live upgrade", and so on.


Darren J Moffat
2007-01-23 11:32:18 UTC
Permalink
Post by mario heimel
this is a good point, the mirror loses all information about the zpool.
this is very important for the ZFS Root pool, i don't know how often i have broken the svm-mirror of the root disks, to clone a system and bring the disk to a other system or use "live upgrade" and so on.
For the live upgrade case you don't need to break the mirror for cloning
with ZFS root. Live upgrade will (probably) just run zfs clone and you
would boot from that (regardless of the presence or not of a mirror).

For the "clone another system" zfs send/recv might be useful
--
Darren J Moffat
Mike Gerdts
2007-01-23 11:37:20 UTC
Permalink
Post by Darren J Moffat
For the "clone another system" zfs send/recv might be useful
Having support for this directly in flarcreate would be nice. It
would make differential flars very quick and efficient.

Mike
--
Mike Gerdts
http://mgerdts.blogspot.com/
Rainer Heilke
2007-01-23 17:42:50 UTC
Permalink
Post by Darren J Moffat
For the "clone another system" zfs send/recv might be
useful
Keeping in mind that you only want to send/recv one half of the ZFS mirror...

Rainer


Darren J Moffat
2007-01-24 09:59:50 UTC
Permalink
Post by Rainer Heilke
Post by Darren J Moffat
For the "clone another system" zfs send/recv might be
useful
Keeping in mind that you only want to send/recv one half of the ZFS mirror...
Huh ?

That doesn't make any sense. You can't send half a mirror. When you
are running zfs send it is a "read" and ZFS will read the data from all
available mirrors to help performance. When it is zfs recv it will
write to all sides of the mirror on the destination.

What are you actually trying to say here ?
--
Darren J Moffat
SteveW
2007-01-25 15:54:50 UTC
Permalink
The ability to shrink a pool by removing devices is the only reason my enterprise is not yet using ZFS, simply because it prevents us from easily migrating storage.


Al Hopper
2007-01-25 17:35:54 UTC
Permalink
On Thu, 25 Jan 2007, SteveW wrote:

... reformatted ...
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from
easily migrating storage.
That logic is totally bogus AFAIC. There are so many advantages to
running ZFS that denying yourself that opportunity is very short sighted -
especially when there are lots of ways of working around this minor
feature deficiency.

Perhaps if you post some specifics of your environment and usage scenario,
we can suggest workarounds.

Regards,

Al Hopper Logical Approach Inc, Plano, TX. ***@logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Brian Hechinger
2007-01-25 17:51:27 UTC
Permalink
Post by Al Hopper
... reformatted ...
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from
easily migrating storage.
That logic is totally bogus AFAIC. There are so many advantages to
running ZFS that denying yourself that opportunity is very short sighted -
especially when there are lots of ways of working around this minor
feature deficiency.
The other point is, how many other volume management systems allow you to remove
disks? I bet if the answer is not zero, it's not large. ;)

-brian
--
"The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one." --Linus
Darren Dunham
2007-01-25 18:51:30 UTC
Permalink
Post by Brian Hechinger
The other point is, how many other volume management systems allow you
to remove disks? I bet if the answer is not zero, it's not large. ;)
As far as Solaris is concerned, I'm only aware of two significant such
systems. SVM and VxVM.

SVM doesn't really manage disks per se. So there's really nothing in it
that disallows removing them or reusing them. Of course it offers no
help in migrating data off of any such disks.

VxVM does have tools to migrate data from a disk, and to either remove a
disk from a pool, or even migrate data on a disk into another pool. In
many cases, enterprise customers have this existing functionality in
mind when considering ZFS.

As a third point, follow the Network Appliance list a bit and you'll see
that the question comes up with their systems fairly often. It's not
uncommon for someone to accidentally add a spare disk to a volume or
aggregate. Said disk cannot be retrieved without destroying and
recreating (a blank) volume or aggregate, followed by a restore.

See also ZFS automatically increasing the size of a pool to take up all
space, even if I don't want it to...
--
Darren Dunham ***@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
SteveW
2007-01-25 20:15:23 UTC
Permalink
Post by Darren Dunham
Post by Brian Hechinger
The other point is, how many other volume management systems allow you
to remove disks? I bet if the answer is not zero, it's not large. ;)
As far as Solaris is concerned, I'm only aware of two significant such
systems. SVM and VxVM.
Correct, in our case we've been using VxVM for years.
We're comfortable with it, it does what we want, vxfs performance is outstanding, and we move storage around A LOT as our SAN storage changes. So for the time being, the benefits won't allow us to give up the ability to shrink/remove on the fly.
I should add that I am very excited about ZFS, and hope to be able to use it everywhere as soon as possible.


Dick Davies
2007-01-26 12:24:34 UTC
Permalink
Post by Brian Hechinger
The other point is, how many other volume management systems allow you to remove
disks? I bet if the answer is not zero, it's not large. ;)
Even Linux LVM can do this (with pvmove) - slow, but you can do it online.
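Roughly (the volume group and PV names are just examples):

  pvmove /dev/sdb1            # migrate all allocated extents off the PV
  vgreduce myvg /dev/sdb1     # then drop the PV from the volume group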
--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Valery Fouques
2007-02-21 15:34:07 UTC
Permalink
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from
easily migrating storage.
That logic is totally bogus AFAIC. There are so many advantages to
running ZFS that denying yourself that opportunity is very short sighted -
especially when there are lots of ways of working around this minor
feature deficiency.
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.

We would have migrated already if we could simply migrate data from one storage array to another (which we do more often than you might think).

Currently we use (and pay for) VxVM; here is how we do a migration:
1/ Allocate disks from the new array, visible by the host.
2/ Add the disks in the diskgroup.
3/ Run vxevac to evacuate data from "old" disks.
4/ Remove old disks from the DG.
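In command terms that is roughly the following, with example disk group and
disk media names (and assuming the new disks are already initialized):

  vxdg -g datadg adddisk datadg10=c3t0d0    # steps 1+2: add a new array's disk
  vxevac -g datadg datadg01 datadg10        # step 3: evacuate data off an old disk
  vxdg -g datadg rmdisk datadg01            # step 4: remove the old disk from the DG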

If you explain how to do that with ZFS, no downtime, and new disks with different capacities, you're my hero ;-)


Rich Teer
2007-02-21 15:37:51 UTC
Permalink
Post by Valery Fouques
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
... And presumably you've read the threads where ZFS has helped find
(and repair) corruption in such setups?

(But yeah, I agree the ability to shrink a pool is important.)
--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
C***@Sun.COM
2007-02-21 15:43:34 UTC
Permalink
Post by Valery Fouques
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
But you understand that these underlying RAID mechanisms give absolutely
no guarantee about data integrity, but only that some data was found where
some (possibly other) data was written? (RAID5 never verifies the
checksum is correct on reads; it only uses it to reconstruct data when
reads fail)

Casper
Frank Cusack
2007-02-21 18:31:10 UTC
Permalink
Post by C***@Sun.COM
Post by Valery Fouques
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
But you understand that these underlying RAID mechanism give absolutely
no guarantee about data integrity but only that some data was found were
some (possibly other) data was written? (RAID5 never verifies the
checkum is correct on reads; it only uses it to reconstruct data when
reads fail)
um, I thought smarter arrays did that these days. Of course it's not
end-to-end so the parity verification isn't as useful as it should be;
gigo.

-frank
C***@Sun.COM
2007-02-21 18:59:14 UTC
Permalink
Post by Frank Cusack
Post by C***@Sun.COM
Post by Valery Fouques
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
But you understand that these underlying RAID mechanism give absolutely
no guarantee about data integrity but only that some data was found were
some (possibly other) data was written? (RAID5 never verifies the
checkum is correct on reads; it only uses it to reconstruct data when
reads fail)
um, I thought smarter arrays did that these days. Of course it's not
end-to-end so the parity verification isn't as useful as it should be;
gigo.
Generating extra I/O to verify parity - is that not something that may
be a problem in performance benchmarking?

For mirroring, a similar problem exists, of course. ZFS reads from the
right side of the mirror and corrects the wrong side if it finds an
error. RAIDs do not.

Casper
p***@poczta.fm
2007-02-22 08:37:46 UTC
Permalink
Post by C***@Sun.COM
Post by Valery Fouques
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
But you understand that these underlying RAID mechanism give absolutely
no guarantee about data integrity but only that some data was found were
some (possibly other) data was written? (RAID5 never verifies the
checkum is correct on reads; it only uses it to reconstruct data when
reads fail)
But you understand that he perhaps knows that, but so far nothing wrong
has happened [*] and migration is still a very important feature for him?

[*] almost every big company has its data center with SAN and FC
connections with RAID-5 or RAID-10 in their storage arrays
and they are treated as reliable

przemol

Jason J. W. Williams
2007-02-22 19:21:50 UTC
Permalink
Hi Przemol,

I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.

Best Regards,
Jason
Post by p***@poczta.fm
Post by C***@Sun.COM
Post by Valery Fouques
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
But you understand that these underlying RAID mechanism give absolutely
no guarantee about data integrity but only that some data was found were
some (possibly other) data was written? (RAID5 never verifies the
checkum is correct on reads; it only uses it to reconstruct data when
reads fail)
But you understand that he perhaps knows that but so far nothing wrong
happened [*] and migration is still very important feature for him ?
[*] almost every big company has its data center with SAN and FC
connections with RAID-5 or RAID-10 in their storage arrays
and they are treated as reliable
przemol
p***@poczta.fm
2007-02-27 09:21:30 UTC
Permalink
Post by Jason J. W. Williams
Hi Przemol,
I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.
Jason,

I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad, why do companies
using them survive?

Imagine also that some company has been using SAN/RAID for a few years
and doesn't have any problems (or one every few months). Also, from time to
time they need to migrate between arrays (for whatever reason). Now you come and say
that they have unreliable SAN/RAID and you offer something new (ZFS)
which is going to make it much more reliable, but migration to another array
will be painful. What do you think they will choose?

BTW: I am a fan of ZFS. :-)

przemol

Shawn Walker
2007-02-27 09:29:04 UTC
Permalink
Post by p***@poczta.fm
Post by Jason J. W. Williams
Hi Przemol,
I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.
Jason,
I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?
I think he was trying to say that people believe that those
solutions are reliable just because they are based on SAN/RAID
technology, and are not aware of the true situation surrounding them.
--
"Less is only more where more is no good." --Frank Lloyd Wright

Shawn Walker, Software and Systems Analyst
***@gmail.com - http://binarycrusader.blogspot.com/
p***@poczta.fm
2007-02-27 10:28:59 UTC
Permalink
Post by Shawn Walker
Post by p***@poczta.fm
Post by Jason J. W. Williams
Hi Przemol,
I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.
Jason,
I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?
I think he was trying to say that people that believe that those
solutions are reliable just because they are based on SAN/RAID
technology and are not aware of the true situation surrounding them.
Is the "true situation" really so bad ?

My feeling was that he was trying to say that there is no SAN/RAID
solution without data integrity problems. Is that really true?
Does anybody have any paper (*) about the percentage of problems in SAN/RAID
caused by data integrity? Is it 5%? Or 30%? Or maybe 60%?

(*) Maybe such paper/report should be a start point for our discussion.

przemol

Robert Milkowski
2007-02-27 12:01:06 UTC
Permalink
Hello przemolicc,
Post by Shawn Walker
Post by p***@poczta.fm
Post by Jason J. W. Williams
Hi Przemol,
I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.
Jason,
I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?
I think he was trying to say that people that believe that those
solutions are reliable just because they are based on SAN/RAID
technology and are not aware of the true situation surrounding them.
ppf> Is the "true situation" really so bad ?

ppf> My feeling was that he was trying to say that there is no SAN/RAID
ppf> solution without data integrity problem. Is it really true ?
ppf> Does anybody have any paper (*) about percentage of problems in SAN/RAID
ppf> because of data integrity ? Is it 5 % ? Or 30 % ? Or maybe 60 % ?

ppf> (*) Maybe such paper/report should be a start point for our discussion.

See http://sunsolve.sun.com/search/document.do?assetkey=1-26-102815-1

as one example. This is an entry-level array, but still such things
happen. I also observe similar problems with IBM's (larger) array.

It's just that people are used to running fsck from time to time without really
knowing why, and in many cases they do not realize that their data is
not exactly what they expect it to be.

However, from my experience I must admit the problem is almost only
seen with SATA drives.

I had a problem with a SCSI adapter which was
sending some driver warnings but was still passing I/Os - it turned
out data was being corrupted. Changing the SCSI adapter solved the problem.
The point is that thanks to ZFS we caught the problem, replaced the
bad card, did a zpool scrub, and everything was in perfect shape. No need
to resynchronize data, etc.
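For reference, that check is just (the pool name is an example):

  zpool scrub tank         # re-read and verify every block in the pool
  zpool status -v tank     # shows scrub progress and any checksum errors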

Another time I had a problem with an FC array and lost some data, but
there was no ZFS on it :(((

On all other arrays, jbods, etc. with SCSI and/or FC disks I haven't
seen (yet) checksum errors reported by ZFS.
--
Best regards,
Robert mailto:***@task.gda.pl
http://milek.blogspot.com
Erik Trimble
2007-02-27 16:47:42 UTC
Permalink
<huge forwards on how bad SANs really are for data integrity removed>


The answer is: insufficient data.


With modern journalling filesystems, I've never had to fsck anything or
run a filesystem repair. Ever. On any of my SAN stuff.

The sole place I've run into filesystem corruption in the traditional
sense is with faulty hardware controllers; and I'm not even sure ZFS
could recover from those situations, though less dire ones, where the
controllers are merely emitting slightly wonky data, certainly would
be within ZFS's ability to fix, versus the inability of a SAN to determine
that the data was bad.


That said, the primary issue here is that nobody really has any idea
about silent corruption - that is, blocks which change value, but are
data, not filesystem-relevant. Bit flips and all. Realistically, the
only way previous to ZFS to detect this was to do bit-wise comparisons
against backups, which becomes practically impossible on an active data
set.

SAN/RAID equipment still has a very considerable place over JBODs in
most large-scale places, particularly in areas of configuration
flexibility, security, and management. That said, I think we're arguing
at cross-purposes: the real solution for most enterprise customers is
SAN + ZFS, not either just by itself.
--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Robert Milkowski
2007-02-27 23:40:18 UTC
Permalink
Hello Erik,

Tuesday, February 27, 2007, 5:47:42 PM, you wrote:

ET> <huge forwards on how bad SANs really are for data integrity removed>


ET> The answer is: insufficient data.


ET> With modern journalling filesystems, I've never had to fsck anything or
ET> run a filesystem repair. Ever. On any of my SAN stuff.

I'm not sure if you consider UFS in S10 as a modern journalling
filesystem but in case you do:

Feb 13 12:03:16 XXXX ufs: [ID 879645 kern.notice] NOTICE: /opt/d1635: unexpected free inode 54305084, run fsck(1M) -o f

This file system is on a medium large array (IBM) in a SAN
environment.
--
Best regards,
Robert mailto:***@task.gda.pl
http://milek.blogspot.com
Erik Trimble
2007-02-27 23:55:24 UTC
Permalink
Honestly, no, I don't consider UFS a modern file system. :-)

It's just not in the same class as JFS for AIX, xfs for IRIX, or even
VxFS.

-Erik
Post by Robert Milkowski
Hello Erik,
ET> <huge forwards on how bad SANs really are for data integrity removed>
ET> The answer is: insufficient data.
ET> With modern journalling filesystems, I've never had to fsck anything or
ET> run a filesystem repair. Ever. On any of my SAN stuff.
I'm not sure if you consider UFS in S10 as a modern journalling
Feb 13 12:03:16 XXXX ufs: [ID 879645 kern.notice] NOTICE: /opt/d1635: unexpected free inode 54305084, run fsck(1M) -o f
This file system is on a medium large array (IBM) in a SAN
environment.
--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Robert Milkowski
2007-02-28 09:17:26 UTC
Permalink
Hello Erik,

Wednesday, February 28, 2007, 12:55:24 AM, you wrote:

ET> Honestly, no, I don't consider UFS a modern file system. :-)

ET> It's just not in the same class as JFS for AIX, xfs for IRIX, or even
ET> VxFS.

The point is that the fsck was due to an array corrupting data.
IMHO it would hit JFS, XFS, or VxFS as badly as UFS, if not worse.
--
Best regards,
Robert mailto:***@task.gda.pl
http://milek.blogspot.com
Rob Logan
2007-02-28 01:35:43 UTC
Permalink
Post by Erik Trimble
With modern journalling filesystems, I've never had to fsck anything or
run a filesystem repair. Ever. On any of my SAN stuff.
you will.. even if the SAN is perfect, you will hit
bugs in the filesystem code.. from lots of rsync hard
links or like this one from raidtools last week:

Feb 9 05:38:39 orbit kernel: mptbase: ioc2: IOCStatus(0x0043): SCSI Device Not There
Feb 9 05:38:39 orbit kernel: md: write_disk_sb failed for device sdp1
Feb 9 05:38:39 orbit kernel: md: errors occurred during superblock update, repeating

Feb 9 05:39:01 orbit kernel: raid6: Disk failure on sdp1, disabling device. Operation continuing on 13 devices
Feb 9 05:39:09 orbit kernel: mptscsi: ioc2: attempting task abort! (sc=cb17c800)
Feb 9 05:39:10 orbit kernel: RAID6 conf printout:
Feb 9 05:39:10 orbit kernel: --- rd:14 wd:13 fd:1

Feb 9 05:44:37 orbit kernel: EXT3-fs error (device dm-0): ext3_readdir: bad entry in directory #10484: rec_len %$
Feb 9 05:44:37 orbit kernel: Aborting journal on device dm-0.
Feb 9 05:44:37 orbit kernel: ext3_abort called.
Feb 9 05:44:37 orbit kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Feb 9 05:44:37 orbit kernel: Remounting filesystem read-only
Feb 9 05:44:37 orbit kernel: attempt to access beyond end of device
Feb 9 05:44:44 orbit kernel: oom-killer: gfp_mask=0xd0
<death and corrupt fs>
Richard Elling
2007-02-27 18:11:13 UTC
Permalink
Post by p***@poczta.fm
Is the "true situation" really so bad ?
The failure mode is silent error. By definition, it is hard to
count silent errors. What ZFS does is improve the detection of
silent errors by a rather considerable margin. So, what we are
seeing is that suddenly people are seeing errors that they didn't
see before (or do you "hear" silent errors? ;-). That has been
surprising and leads some of us to recommend ZFS no matter what
your storage looks like, even if silent error detection is the
only benefit.
-- richard
Jason J. W. Williams
2007-02-27 23:51:57 UTC
Permalink
Hi Przemol,

I think migration is a really important feature... I think I said that...
;-) SAN/RAID is not awful... frankly there's not been a better solution
(outside of NetApp's WAFL) till ZFS. SAN/RAID just has its own
reliability issues that you accept unless you don't have to... ZFS :-)

-J
Post by p***@poczta.fm
Post by Jason J. W. Williams
Hi Przemol,
I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.
Jason,
I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?
Imagine also that some company is using SAN/RAID for a few years
and doesn't have any problems (or once a few months). Also from time to
time they need to migrate between arrays (for whatever reason). Now you come and say
that they have unreliable SAN/RAID and you offer something new (ZFS)
which is going to make it much more reliable but migration to another array
will be painfull. What do you think what they choose ?
BTW: I am a fan of ZFS. :-)
przemol
Richard Elling
2007-02-21 18:55:43 UTC
Permalink
Post by Valery Fouques
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from
easily migrating storage.
That logic is totally bogus AFAIC. There are so many advantages to
running ZFS that denying yourself that opportunity is very short sighted -
especially when there are lots of ways of working around this minor
feature deficiency.
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.
We would have migrated already if we could simply migrate data from a storage array to another (which we do more often than you might think).
But you are describing a VxVM feature, not a file system feature.
Post by Valery Fouques
1/ Allocate disks from the new array, visible by the host.
2/ Add the disks in the diskgroup.
3/ Run vxevac to evacuate data from "old" disks.
4/ Remove old disks from the DG.
If you explain how to do that with ZFS, no downtime, and new disks with different capacities, you're my hero ;-)
zpool replace old-disk new-disk
The caveat is that new-disk must be as big as or bigger than old-disk.
This caveat is the core of the shrink "problem".
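So a LUN-by-LUN array migration is roughly (device names are illustrative):

  zpool replace tank c2t0d0 c5t0d0    # old LUN, new LUN from the new array
  zpool status tank                   # wait for the resilver to complete

repeated for each LUN in the pool.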
-- richard
Frank Cusack
2007-02-21 19:50:08 UTC
Permalink
On February 21, 2007 10:55:43 AM -0800 Richard Elling
Post by Richard Elling
Post by Valery Fouques
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from
easily migrating storage.
That logic is totally bogus AFAIC. There are so many advantages to
running ZFS that denying yourself that opportunity is very short
sighted - especially when there are lots of ways of working around this
minor feature deficiency.
I cannot let you say that.
Here in my company we are very interested in ZFS, but we do not care
about the RAID/mirror features, because we already have a SAN with
RAID-5 disks, and dual fabric connection to the hosts.
We would have migrated already if we could simply migrate data from a
storage array to another (which we do more often than you might think).
But you describe VxVM feature, not a file system feature.
But in the context of zfs, this is appropriate.

-frank
Peter Schuller
2007-01-25 18:22:54 UTC
Permalink
Post by SteveW
The ability to shrink a pool by removing devices is the only reason my
enterprise is not yet using ZFS, simply because it prevents us from easily
migrating storage.
Being able to do this would be very high on my wishlist from the perspective
of a home user.

But also from the perspective of a more serious user (though I am not involved
in using ZFS in such a case - not yet anyway...) it is most definitely a very
nice thing to be able to do, in various situations.

Examples of things I would love to be able to do, which I could with such a
feature:

* Easily convert between mirror/striping/raidz/raidz2 (no need to purchase
twice the capacity for temporary storage during a conversion).

* Easily move storage between physical machines as needs change (assuming a
situation where you want drives locally attached to the machines in question,
and iSCSI and similar is not an option).

* Revert a stupid mistake: accidentally adding something to a pool that should
not be there :)

* Easily - even live under the right circumstances - temporarily evacuate a
disk in order to e.g. perform drive testing if suspicious behavior is present
without a known cause.

* If a drive starts going bad and I do not have a spare readily available
(typical home use situation), I may want to evacuate the semi-broken drive so
that I do not lose redundancy until I can get another disk. May or may not
be practical depending on current disk space usage of course.

* Some machine A needs a spare drive but there is none, and I have free disk
space on disk B and B has matching drives. Evacuate a disk on B and use it as
replacement in A (again, typical home use situation). Once I obtain a new
drive, return B's disk to B again, or alternatively keep it in A and use the
new drive in B.
--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <***@infidyne.com>'
Key retrieval: Send an E-Mail to ***@scode.org
E-Mail: ***@infidyne.com Web: http://www.scode.org
Constantin Gonzalez Schmitz
2007-01-26 09:00:13 UTC
Permalink
Hi,

I do agree that zpool remove is a _very_ desirable feature, no doubt about
that.

Here are a couple of thoughts and workarounds, in random order, that might
give us some more perspective:

- My home machine has 4 disks and a big zpool across them. Fine. But what
if a controller fails or worse, a CPU? Right, I need a second machine, if
I'm really honest with myself and serious with my data. Don't laugh, ZFS
on a Solaris server is becoming my mission-critical home storage solution
that is supposed to last beyond CDs and DVDs and other vulnerable media.

So, if I was an enterprise, I'd be willing to keep enough empty LUNs
available to facilitate at least the migration of one or more filesystems
if not complete pools. With a little bit of scripting, this can be done
quite easily and efficiently through zfs send/receive and some LUN
juggling.

If I was an enterprise's server admin and the storage guys wouldn't have
enough space for migrations, I'd be worried.

- We need to avoid customers thinking "Veritas can shrink, ZFS can't". That
is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the
pools below them that can only grow. And Veritas does not even have pools.

People have started to follow a One-pool-to-store-them-all approach, which I think
is not always appropriate. Some alternatives:

- One pool per zone might be a good idea if you want to migrate zones
across systems which then becomes easy through zpool export/import in
a SAN.

- One pool per service level (mirror, RAID-Z2, fast, slow, cheap, expensive)
might be another idea. Keep some cheap mirrored storage handy for your pool
migration needs and you could wiggle your life around zpool remove.

Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs
send/receive pair.

Shrinking a pool requires some more zfs send/receiving and maybe some
scripting, but these are IMHO less painful than living without ZFS'
data integrity and the other joys of ZFS.
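A minimal sketch of that kind of juggling, with made-up pool and filesystem
names:

  zpool create newpool mirror c4t0d0 c4t1d0
  zfs snapshot bigpool/data@move
  zfs send bigpool/data@move | zfs receive newpool/data
  zpool destroy bigpool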

Sorry if I'm stating the obvious or stuff that has been discussed before,
but the more I think about zpool remove, the more I think it's a question
of willingness to plan/work/script/provision vs. a real show stopper.

Best regards,
Constantin

P.S.: Now with my big mouth I hope I'll survive a confcall next
week with a customer asking for exactly zpool remove :).
--
Constantin Gonzalez Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions http://www.sun.de/
Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/
Darren Dunham
2007-01-26 16:16:01 UTC
Permalink
Post by Constantin Gonzalez Schmitz
- We need to avoid customers thinking "Veritas can shrink, ZFS can't". That
is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the
pools below them that can just grow. And Veritas does not even have pools.
I'm sure that this issue is different for different environments, but I
assure you it wasn't raised because we're looking at a spec chart and
someone saw a missing check in the ZFS column. The ability to
deallocate in-use storage without having to migrate the existing data is
used today by many administrators. We'll live with this not being
possible in ZFS at the moment, but the limitation is real and the
flexibility of filesystems within the pool doesn't alleviate it.
Post by Constantin Gonzalez Schmitz
Sorry if I'm stating the obvious or stuff that has been discussed before,
but the more I think about zpool remove, the more I think it's a question
of willingness to plan/work/script/provision vs. a real show stopper.
Show stopper would depend on the environment. It's certainly not that
in many places. I agree that if I could exactly plan all my storage
perfectly in advance, then several ways that it would be really useful
would be reduced. However one of the reasons to have it is precisely
because it is so difficult to get good predictions for storage use.

I know just enough of the internals of ZFS to understand why
remove/split/evacuate is much more difficult than it might be in simpler
volume managers. I'm happy we've got what we have today and that people
have already thought up ways of attacking this problem to make ZFS even
better.
--
Darren Dunham ***@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
W***@fallon.com
2007-01-26 16:55:13 UTC
Permalink
Post by Constantin Gonzalez Schmitz
Hi,
I do agree that zpool remove is a _very_ desirable feature, not doubt about
that.
Here are a couple of thoughts and workarounds, in random order, that might
- My home machine has 4 disks and a big zpool across them. Fine. But what
if a controller fails or worse, a CPU? Right, I need a second machine, if
I'm really honest with myself and serious with my data. Don't laugh, ZFS
on a Solaris server is becoming my mission-critical home storage solution
that is supposed to last beyond CDs and DVDs and other vulnerable media.
So, if I was an enterprise, I'd be willing to keep enough empty LUNs
available to facilitate at least the migration of one or more filesystems
if not complete pools. With a little bit of scripting, this can be done
quite easily and efficiently through zfs send/receive and some LUN
juggling.
If I was an enterprise's server admin and the storage guys wouldn't have
enough space for migrations, I'd be worried.
I think you may find in practice that many medium to large enterprise IT
departments are in this exact situation -- we do not have LUNs sitting
stagnant just waiting for data migrations of our largest data sets. We
have been sold (and rightly so, because it works and is cost effective and
has no downtime) on being able to move LUNs around at will without
duplicating (to tape or disk) and dumping. Are you really expecting the
storage guys to have 40TB of disk just sitting collecting dust
when you want to pull out 10 disks from a 44TB system? This type of
thinking may very well be why Sun has had a hard time in the last few years
(although ZFS and recent products show that the tide is turning).
Post by Constantin Gonzalez Schmitz
- We need to avoid customers thinking "Veritas can shrink, ZFS can't". That
is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the
pools below them that can just grow. And Veritas does not even have pools.
Sorry, that is silly. Can we compare if we call them both "volumes or
filesystems (or any virtualization of each) which are reserved for data and in
which we want to remove and add disks online"? VxFS can grow and shrink,
and the volumes can grow and shrink. Pools may blur the volume/filesystem line,
but they still present the same constraints to administrators trying
to admin these boxes and the disks attached to them.
Post by Constantin Gonzalez Schmitz
People have started to follow a One-pool-to-store-them-all which I think
- One pool per zone might be a good idea if you want to migrate zones
across systems which then becomes easy through zpool export/import in
a SAN.
- One pool per service level (mirror, RAID-Z2, fast, slow, cheap, expensive)
might be another idea. Keep some cheap mirrored storage handy for your pool
migration needs and you could wiggle your life around zpool remove.
You went from one pool to share data (the major advantage of the pool
concept) to a bunch of constrained pools. Also, how does this resolve the
issue of online LUN migration?
Post by Constantin Gonzalez Schmitz
Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs
send/receive pair.
Shrinking a pool requires some more zfs send/receiving and maybe some
scripting, but these are IMHO less painful than living without ZFS'
data integrity and the other joys of ZFS.
Ohh, never mind, dump to tape and restore (err disk) -- you do realize
that the industry has been selling products that have made this behavior
obsolete for close to 10 years now?
Post by Constantin Gonzalez Schmitz
Sorry if I'm stating the obvious or stuff that has been discussed before,
but the more I think about zpool remove, the more I think it's a question
of willingness to plan/work/script/provision vs. a real show stopper.
No, it is a specific workflow that requires disk to stay online, while
allowing for economically sound use of resources -- this is not about
laziness (that is how I am reading your view) or not wanting to script up
solutions.
Post by Constantin Gonzalez Schmitz
Best regards,
Constantin
P.S.: Now with my big mouth I hope I'll survive a customer confcall next
week with a customer asking for exactly zpool remove :).
I hope so, but you may want to rethink the "script and go back in sysadmin
time 10 years" approach. ZFS buys a lot and is a great filesystem, but there
are places such as this that are still weak and need fixing before many
environments can replace vxvm/vxfs or other solutions. Sure, you
will find people who are viewing this new pooled filesystem with old eyes,
but there are admins on this list who actually understand what they are
missing and the other options for working around these issues. We don't
look at this as a feature tickmark, but as a feature that we know is
missing and that we really need before we can consider moving some of our
systems from vxvm/vxfs to ZFS.


-Wade Stuart
Rainer Heilke
2007-01-26 21:12:39 UTC
Permalink
Post by Constantin Gonzalez Schmitz
So, if I was an enterprise, I'd be willing to keep
enough empty LUNs
available to facilitate at least the migration of
one or more filesystems
if not complete pools.
You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, "not a big deal if the Financials and HR databases are offline for three days while we do the migration." Good luck writing up a business case that justifies this sort of fiscal generosity.

Sorry, this argument smacks a little too much of being out of touch with the fiscal (and time) restrictions of working in a typical corporation, as opposed to a well-funded research group.

I hope I'm not sounding rude, but those of us working in medium to large corporations simply do not have the money for such luxuries. Period.

Rainer


This message posted from opensolaris.org
Richard Elling
2007-01-26 21:53:01 UTC
Permalink
Post by Rainer Heilke
Post by Constantin Gonzalez Schmitz
So, if I was an enterprise, I'd be willing to keep
enough empty LUNs
available to facilitate at least the migration of
one or more filesystems
if not complete pools.
You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, "not a big deal if the Financials and HR databases are offline for three days while we do the migration." Good luck writing up a business case that justifies this sort of fiscal generosity.
To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs, nor can you
eliminate vdevs. In other words, I can migrate data around without
downtime; I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies. Every time we descend
into this rathole, we stir up more confusion :-(
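
For illustration, the online replacement looks roughly like this (pool and
device names here are hypothetical):

  # swap c1t0d0 for a same-sized or larger LUN c2t0d0 while the pool stays up;
  # ZFS resilvers onto the new device in the background
  zpool replace tank c1t0d0 c2t0d0
  # check resilver progress until it completes
  zpool status tank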

If you consider your pool of storage as a zpool, then the management of
subparts of the pool is done at the file system level. This concept is
different than other combinations of devices and file systems such as
SVM+UFS. When answering the ZFS shrink question, you need to make sure
you're not applying the old concepts to the new model.

Personally, I've never been in the situation where users ask for less storage,
but maybe I'm just the odd guy out? ;-) Others have offered cases where
a shrink or vdev restructuring could be useful. But I still see some
confusion with file system management (including zvols) and device management.
The shrink feature is primarily at the device management level.
-- richard
Jason J. W. Williams
2007-01-27 00:07:09 UTC
Permalink
Post by Richard Elling
To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs nor can you
eliminate vdevs. In other words, I can migrate data around without
downtime; I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies. Every time we descend
into this rathole, we stir up more confusion :-(
We did just this to move a pool off RAID-5 LUN vdevs and onto RAID-10 LUNs.
Worked very well and, as Richard said, was done all online. Doesn't really
address the shrinking issue though. :-)

Best Regards,
Jason
Torrey McMahon
2007-01-27 01:34:32 UTC
Permalink
Post by Richard Elling
Personally, I've never been in the situation where users ask for less storage,
but maybe I'm just the odd guy out? ;-)
You just realized that JoeSysadmin allocated ten LUNs to the zpool when
he really only should have allocated one.
Rainer Heilke
2007-01-27 04:23:29 UTC
Permalink
Post by Richard Elling
Post by Rainer Heilke
Post by Constantin Gonzalez Schmitz
So, if I was an enterprise, I'd be willing to keep
enough empty LUNs
available to facilitate at least the migration of
one or more filesystems
if not complete pools.
You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, "not a big deal if the Financials and HR databases are offline for three days while we do the migration." Good luck writing up a business case that justifies this sort of fiscal generosity.
To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs nor can you
eliminate vdevs. In other words, I can migrate data around without
downtime; I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies. Every time we descend
into this rathole, we stir up more confusion :-(
If you consider your pool of storage as a zpool, then the management of
subparts of the pool is done at the file system level. This concept is
different than other combinations of devices and file systems such as
SVM+UFS. When answering the ZFS shrink question, you need to make sure
you're not applying the old concepts to the new model.
Personally, I've never been in the situation where users ask for less storage,
but maybe I'm just the odd guy out? ;-) Others have offered cases where
a shrink or vdev restructuring could be useful. But I still see some
confusion with file system management (including zvols) and device management.
The shrink feature is primarily at the device management level.
-- richard
I understand these arguments and the differences (and that most users will never ask for less storage), but there are many instances where storage needs to move around, even between systems. In that case, unless a whole zpool of storage is going, how do you do it? You need to give back two LUNs in a 6-LUN zpool. Oh, wait. You can't shrink a zpool.

Many people here are giving examples of where this capability is needed. We need to agree that different users' needs vary, and that there are real reasons for this.

Rainer


This message posted from opensolaris.org
Al Hopper
2007-01-26 23:58:55 UTC
Permalink
Post by Rainer Heilke
Post by Constantin Gonzalez Schmitz
So, if I was an enterprise, I'd be willing to keep
enough empty LUNs
available to facilitate at least the migration of
one or more filesystems
if not complete pools.
.... reformatted ...
Post by Rainer Heilke
You might be, but don't be surprised when the Financials folks laugh you
out of their office. Large corporations do not make money by leaving
wads of cash lying around, and that's exactly what a few terabytes of
unused storage in a high-end SAN is. This is in addition to the laughter
But this is exactly where ZFS disrupts "Large corporations" thinking.
You're talking about (for example) 2 terabytes on a high-end SAN which
costs (what ?) per GB (including the capital cost of the hi-end SAN)
versus a dual Opteron box with 12 * 500Gb SATA disk drives that gives you
5TB of storage for (in round numbers) a total of ~ $6k. And how much are
your ongoing monthlies on that hi-end SAN box? (Don't answer) So - aside
from the occasional use of the box for data migration, this ZFS "storage
box" has 1,001 other uses. Pick any two (uses), based on your knowledge
of big corporation thinking, and it's an easy sell to management.

Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box.... and why you're not using ZFS
everywhere. :)

Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Post by Rainer Heilke
generated by the comment that, "not a big deal if the Financials and HR
databases are offline for three days while we do the migration." Good
Again - sounds like more "legacy" thinking. With multiple gigabit
ethernet connections you can move terabytes of information in an hour,
instead of 24 hours using legacy tape systems, etc. This can be
easily handled during scheduled downtime.
Post by Rainer Heilke
luck writing up a business case that justifies this sort of fiscal
generosity.
Sorry, this argument smacks a little too much of being out of touch with
the fiscal (and time) restrictions of working in a typical corporation,
as opposed to a well-funded research group.
I hope I'm not sounding rude, but those of us working in medium to large
corporations simply do not have the money for such luxuries. Period.
On the contrary - if you're not thinking ZFS, you're wasting a ton of IT
$s and hurting the competitiveness of your business.

Regards,

Al Hopper Logical Approach Inc, Plano, TX. ***@logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Toby Thain
2007-01-27 00:50:39 UTC
Permalink
Post by Al Hopper
Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Why do people think this will work? I never could figure it out.

There's many a slip 'twixt cup and lip. You need the spare already
sitting there.

--T
Al Hopper
2007-01-27 01:24:57 UTC
Permalink
Post by Toby Thain
Post by Al Hopper
Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Why do people think this will work? I never could figure it out.
There's many a slip 'twixt cup and lip. You need the spare already
sitting there.
Agreed. I remember, years ago, when a Sun service tech came onsite at a
Fortune 100 company I was working in at the time. We stopped him,
handed him a disk drive in an anti-static bag, and said, "Don't unpack
your tools - it was a bad disk, we replaced it from our spares, here's the
bad one - please replace it under the service agreement." He thought
about this for about 5 seconds and said, "I wish all my customers were
like you guys." Then he was gone! :)

Regards,

Al Hopper Logical Approach Inc, Plano, TX. ***@logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Torrey McMahon
2007-01-27 00:57:15 UTC
Permalink
Post by Al Hopper
Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box.... and why you're not using ZFS
everywhere. :)
Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Because ZFS doesn't run everywhere.
Because most low end JBODs are "low end" for a reason. They aren't as
reliable, have crappy monitoring, etc.

Fix those two things when you get a chance. ;)
Al Hopper
2007-01-27 01:26:33 UTC
Permalink
Post by Torrey McMahon
Post by Al Hopper
Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box.... and why you're not using ZFS
everywhere. :)
Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Because ZFS doesn't run everywhere.
Because most low end JBODs are "low end" for a reason. They aren't as
reliable, have crappy monitoring, etc.
Agreed. There will never be one screwdriver that fits everything. I was
simply trying to reinforce my point.
Post by Torrey McMahon
Fix those two things when you get a chance. ;)
Have a good weekend Torrey (and zfs-discuss).

Regards,

Al Hopper Logical Approach Inc, Plano, TX. ***@logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Torrey McMahon
2007-01-27 01:36:14 UTC
Permalink
Post by Al Hopper
Post by Torrey McMahon
Post by Al Hopper
Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box.... and why you're not using ZFS
everywhere. :)
Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
Because ZFS doesn't run everywhere.
Because most low end JBODs are "low end" for a reason. They aren't as
reliable, have crappy monitoring, etc.
Agreed. There will never be one screwdriver that fits everything. I was
simply trying to reinforce my point.
It's a good point. We just need to make sure we don't forget that part.
People love to pull email threads out of context... or Google, for that
matter. ;)
Post by Al Hopper
Post by Torrey McMahon
Fix those two things when you get a chance. ;)
Have a good weekend Torrey (and zfs-discuss).
Same to you Al. (and zfs-discuss).
Rainer Heilke
2007-01-27 04:27:30 UTC
Permalink
Post by Al Hopper
Post by Rainer Heilke
Post by Constantin Gonzalez Schmitz
So, if I was an enterprise, I'd be willing to keep
enough empty LUNs
available to facilitate at least the migration of
one or more filesystems
if not complete pools.
.... reformatted ...
Post by Rainer Heilke
You might be, but don't be surprised when the Financials folks laugh you
out of their office. Large corporations do not make money by leaving
wads of cash lying around, and that's exactly what a few terabytes of
unused storage in a high-end SAN is. This is in addition to the laughter
But this is exactly where ZFS disrupts "Large corporations" thinking.
Yes and no. A corporation has a SAN for reasons that have been valid for years; you won't turn that ship around on a skating rink.
Post by Al Hopper
You're talking about (for example) 2 terabytes on a high-end SAN which
costs (what ?) per GB (including the capital cost of the hi-end SAN)
versus a dual Opteron box with 12 * 500Gb SATA disk drives that gives you
5TB of storage for (in round numbers) a total of ~ $6k. And how much are
your ongoing monthlies on that hi-end SAN box? (Don't answer) So - aside
from the occasional use of the box for data migration, this ZFS "storage
box" has 1,001 other uses. Pick any two (uses), based on your knowledge
of big corporation thinking, and it's an easy sell to management.
Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box.... and why you're not using ZFS
everywhere. :)
No, they're going to be asking me why I want to run a $400K server holding all of our inventory and financials data on a cheap piece of storage I picked up at Pa's Pizza Parlor and Computer Parts. There are values (real and imagined, perhaps) that a SAN offers. And, when the rest of the company is running on the SAN, why aren't you?

As a side-note, if your company has a mainframe (yes, they still exist!), when will ZFS run on it? We'll need the SAN for a while, yet.
Post by Al Hopper
Post by Rainer Heilke
generated by the comment that, "not a big deal if the Financials and HR
databases are offline for three days while we do the migration." Good
Again - sounds like more "legacy" thinking. With multiple gigabit
ethernet connections you can move terabytes of information in an hour,
instead of 24 hours using legacy tape systems, etc. This can be
easily handled during scheduled downtime.
If your company is graced with being able to cost-justify ripping out and replacing the entire 100Mb network, more power to you. Someone has to pay for all of this, and good luck fobbing it all off on some client.
Post by Al Hopper
Post by Rainer Heilke
Sorry, this argument smacks a little too much of being out of touch with
the fiscal (and time) restrictions of working in a typical corporation,
as opposed to a well-funded research group.
I hope I'm not sounding rude, but those of us working in medium to large
corporations simply do not have the money for such luxuries. Period.
On the contrary - if you're not thinking ZFS, you're wasting a ton of IT
$s and hurting the competitiveness of your business.
But you can't write off the investment of the old gear in six months and move on. I wish life worked like that, but it doesn't. At least, not where I work. :-(
Post by Al Hopper
Regards,
Al Hopper
Rainer


This message posted from opensolaris.org
Constantin Gonzalez
2007-01-31 12:37:21 UTC
Permalink
Hi,

I need to be a little bit more precise in how I formulate my comments:

1. Yes, zpool remove is a desirable feature, no doubt about that.

2. Most of the cases where customers ask for "zpool remove" can be solved
with zfs send/receive or with zpool replace (a rough sketch follows after
this list). Think Pareto's 80-20 rule.

2a. The cost of doing 2., including extra scratch storage space or scheduling
related work into planned downtimes, is smaller than the cost of not using
ZFS at all.

2b. Even in the remaining 20% of cases (figuratively speaking, YMMV) where
zpool remove would be the only solution, I feel that the cost of
sacrificing the extra storage space that would have become available
through "zpool remove" is smaller than the cost of the project not
benefitting from the rest of ZFS' features.

3. Bottom line: Everybody wants zpool remove as early as possible, but IMHO
this is not an objective barrier to entry for ZFS.
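
As a rough sketch of point 2 (pool and filesystem names are hypothetical,
and it assumes a second pool already exists to receive the data), switching
redundancy levels or shrinking comes down to a snapshot plus send/receive,
and then reclaiming the old copy:

  # copy the filesystem into the target pool
  zfs snapshot bigpool/data@migrate
  zfs send bigpool/data@migrate | zfs receive newpool/data
  # once the copy has been verified, free the space in the old pool
  zfs destroy -r bigpool/data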

Note my use of the word "objective". I do feel that we have to implement
zpool remove for subjective reasons, but that is a non technical matter.

Is this an agreeable summary of the situation?

Best regards,
Constantin
--
Constantin Gonzalez Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions http://www.sun.de/
Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/
Rainer Heilke
2007-01-31 21:34:07 UTC
Permalink
Hello.
Post by Constantin Gonzalez
2. Most of the cases where customers ask for "zpool
remove" can be solved
with zfs send/receive or with zpool replace. Think
Pareto's 80-20 rule.
This depends on how you define "most". In the cases I am looking at, I would have to disagree.
Post by Constantin Gonzalez
2a. The cost of doing 2., including extra scratch
storage space or scheduling
related work into planned downtimes is smaller
than the cost of not using
ZFS at all.
Not really. In a SAN environment, many of the ZFS features you list are either already in place, or irrelevant. As I commented to Al Hopper, if your company has a SAN, they're going to expect you to use it effectively. If your data is critical (and the last five years have seen a successful argument that SAN storage should be used for this data), then you aren't going to be able to convince management to use cheaper storage with ZFS any time soon.

One other thing we've found is that "cheap storage", and by that I'm including an older SAN frame we have, doesn't have the cache and speed to keep up with the database usage on the ZPool. Performance sucks, and the developers and testers are getting uptight. Luckily our data shuffle should be done by Friday, and they'll all be back on the high-end SAN. So, combine these with the cost of high-end SAN storage, and you have a strong case for giving back unused space (and largely negating your 2b argument).

Even without the ability to give back one LUN of a 5-LUN zpool, ZFS buys us two things:
1) it's faster (that includes admin as well as performance), and
2) I can create a storage pool with 2 million+ files/inodes.

The other features, while I see them as vital to other people's environments, aren't that big a deal for ours, largely due to our SAN. The export/import will be good during lifecycle upgrades, though (OK, make that three things ;-).
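
For the lifecycle case, the move itself is roughly just an export/import
(hypothetical pool name, and it assumes the SAN LUNs are re-presented to the
new host):

  # on the old host: close the pool cleanly
  zpool export apppool
  # on the new host, once it can see the LUNs:
  zpool import apppool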

I hope that helps clarify my arguments, as your statements clarified yours.
Post by Constantin Gonzalez
Best regards,
Constantin
Cheers,
Rainer


This message posted from opensolaris.org
JS
2007-02-23 23:43:39 UTC
Permalink
Actually, I'm using ZFS in a SAN environment, often importing LUNs to save management overhead and make snapshots easily available, among other things. I would love zfs remove because it would allow me, in conjunction with containers, to build up a single manageable pool for a number of local host systems while prototyping real storage requirements. I'd love the ability to migrate a test container to new hardware, replicate the data with zfs send/receive, and then be able to resize my local pool. For production migration I already use one pool per container, then export/import to move the pool onto other systems on the SAN for redundancy/upgrades.

Another thing that would be hot would be the ability to swap a slower or differently sized set of drives in or out of a pool in favor of a faster set, without incurring downtime. I ran into a situation where I wanted to migrate data out to more spindles on smaller drives, which I had available. Had remove been available, I could have simply added the new smaller-drive mirrors into the pool, then removed the larger drives and freed them up for other use, with no downtime. As your dataset gets larger in production, this becomes more of a desirable feature.
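
For comparison, the swap that works today is a rough sketch like the one
below (hypothetical pool and device names), and it only goes toward
same-sized or larger disks -- which is exactly why the smaller-spindle case
above needs a remove:

  # mirror the old disk onto a new, equal-or-larger one and wait for resilver
  zpool attach tank c1t0d0 c3t0d0
  zpool status tank
  # then drop the old disk out of the mirror
  zpool detach tank c1t0d0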


This message posted from opensolaris.org