Discussion: cannot replace X with Y: devices have different sector alignment
LIC mesh
2012-09-24 03:52:56 UTC
Well this is a new one....

Illumos/OpenIndiana let me add a device as a hot spare that evidently has a
different sector alignment than all of the other drives in the array.

So now I'm at the point that I /need/ a hot spare and it doesn't look like
I have it.

And, worse, the other spares I have are all the same model as said hot
spare.

Is there anything I can do with this or am I just going to be up the creek
when any one of the other drives in the raidz1 fails?
Timothy Coalson
2012-09-24 04:20:47 UTC
I think you can fool a recent Illumos kernel into thinking a 4k disk is 512
(incurring a performance hit for that disk, and therefore the vdev and
pool, but to save a raidz1, it might be worth it):

http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks , see
"Overriding the Physical Sector Size"

I don't know what you might have to do to coax it to do the replace with a
hot spare (zpool replace? export/import?). Perhaps there should be a
feature in ZFS that notifies when a pool is created or imported with a hot
spare that can't be automatically used in one or more vdevs? The whole
point of hot spares is to have them automatically swap in when you aren't
there to fiddle with things, which is a bad time to find out it won't work.

Tim
Post by LIC mesh
[...]
LIC mesh
2012-09-24 13:01:51 UTC
Yet another weird thing - prtvtoc shows both drives as having the same
sector size, etc:
***@nas:~# prtvtoc /dev/rdsk/c16t5000C5002AA08E4Dd0
* /dev/rdsk/c16t5000C5002AA08E4Dd0 partition map
*
* Dimensions:
* 512 bytes/sector
* 3907029168 sectors
* 3907029101 accessible sectors
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 34 222 255
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 4 00 256 3907012495 3907012750
8 11 00 3907012751 16384 3907029134
***@nas:~# prtvtoc /dev/rdsk/c16t5000C5005295F727d0
* /dev/rdsk/c16t5000C5005295F727d0 partition map
*
* Dimensions:
* 512 bytes/sector
* 3907029168 sectors
* 3907029101 accessible sectors
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 34 222 255
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 4 00 256 3907012495 3907012750
8 11 00 3907012751 16384 3907029134
Post by Timothy Coalson
[...]
LIC mesh
2012-09-24 13:02:50 UTC
As does fdisk -G:
***@nas:~# fdisk -G /dev/rdsk/c16t5000C5002AA08E4Dd0
* Physical geometry for device /dev/rdsk/c16t5000C5002AA08E4Dd0
* PCYL NCYL ACYL BCYL NHEAD NSECT SECSIZ
60800 60800 0 0 255 252 512
You have new mail in /var/mail/root
***@nas:~# fdisk -G /dev/rdsk/c16t5000C5005295F727d0
* Physical geometry for device /dev/rdsk/c16t5000C5005295F727d0
* PCYL NCYL ACYL BCYL NHEAD NSECT SECSIZ
60800 60800 0 0 255 252 512
Post by LIC mesh
[...]
Gregg Wonderly
2012-09-24 13:23:39 UTC
What is the error message you are seeing on the "replace"? This sounds like a slice size/placement problem, but clearly, prtvtoc seems to think that everything is the same. Are you certain that you did prtvtoc on the correct drive, and not one of the active disks by mistake?

Gregg Wonderly
Post by LIC mesh
[...]
LIC mesh
2012-09-24 14:46:36 UTC
That's what I thought also, but since both prtvtoc and fdisk -G see the two
disks as the same (and I have not overridden sector size), I am confused.
iostat -xnE:
c16t5000C5002AA08E4Dd0 Soft Errors: 0 Hard Errors: 323 Transport Errors: 489
Vendor: ATA Product: ST32000542AS Revision: CC34 Serial No: %FAKESERIAL%
Size: 2000.40GB <2000398934016 bytes>
Media Error: 207 Device Not Ready: 0 No Device: 116 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c16t5000C5005295F727d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST2000VX000-9YW1 Revision: CV13 Serial No: %FAKESERIAL%
Size: 2000.40GB <2000398934016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

zpool status:
pool: rspool
state: ONLINE
scan: resilvered 719G in 65h28m with 0 errors on Fri Aug 24 04:21:44 2012
config:

NAME STATE READ WRITE CKSUM
rspool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
c16t5000C5002AA08E4Dd0 ONLINE 0 0 0
c16t5000C5002ABE78F5d0 ONLINE 0 0 0
c16t5000C5002AC49840d0 ONLINE 0 0 0
c16t50014EE057B72DD3d0 ONLINE 0 0 0
c16t50014EE057B69208d0 ONLINE 0 0 0
cache
c4t2d0 ONLINE 0 0 0
spares
c16t5000C5005295F727d0 AVAIL

errors: No known data errors

***@nas:~# zpool replace rspool c16t5000C5002AA08E4Dd0 c16t5000C5005295F727d0
cannot replace c16t5000C5002AA08E4Dd0 with c16t5000C5005295F727d0: devices
have different sector alignment
Post by Gregg Wonderly
[...]
LIC mesh
2012-09-24 19:37:25 UTC
Any ideas?
Post by LIC mesh
[...]
Timothy Coalson
2012-09-24 20:32:44 UTC
I'm not sure how to definitively check physical sector size on
Solaris/illumos, but on Linux, hdparm -I (capital i) or smartctl -i will do
it. OpenIndiana's smartctl doesn't output this information yet (and its
smartctl doesn't work on SATA disks unless attached via a SAS chip). The
issue is complicated by disks having both a logical and a physical sector
size, and as far as I am aware, on current disks, logical is always 512,
which may be what is being reported by the tools you ran. Some quick
googling suggests that no existing utility could report the physical
sector size on Solaris, so someone wrote their own:

http://solaris.kuehnke.de/archives/18-Checking-physical-sector-size-of-disks-on-Solaris.html

So, if you want to make sure of the physical sector size, you could give
that program a whirl (it compiled fine for me on oi_151a6, and runs, but it
is not easy for me to attach a 4k sector disk to one of my OI machines, so
I haven't confirmed its correctness), or temporarily transplant the spare
in question to a Linux machine (or live system) and use hdparm -I.
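
For illustration, the sort of output those tools give on a Linux box
(device path hypothetical; exact wording varies by tool version):

# hdparm -I /dev/sdX | grep -i 'sector size'
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
# smartctl -i /dev/sdX | grep -i 'sector size'
Sector Sizes:     512 bytes logical, 4096 bytes physical

A drive that only advertises its emulated geometry will show 512 bytes for
both, which is what an ashift=9 pool expects.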

Tim
Post by LIC mesh
[...]
LIC mesh
2012-09-25 16:42:27 UTC
Thank you for the link!

Turns out that, even though I bought the WD20EARS and ST32000542AS
expecting a 4096 physical blocksize, they report 512.

The new drive I bought correctly identifies as 4096 byte blocksize!

So...OI doesn't like it merging with the existing pool.

Note: the ST2000VX000-9YW1 reports a physical blocksize of 4096B. The
other drives, which actually have 4096B blocks, report a 512B physical
blocksize. That is misleading, but they do it anyway.
Post by Timothy Coalson
[...]
Trond Michelsen
2012-11-10 15:14:30 UTC
Post by LIC mesh
The new drive I bought correctly identifies as 4096 byte blocksize!
So...OI doesn't like it merging with the existing pool.
So... Any solution to this yet?

I've got a 42-drive zpool (21 mirror vdevs) with 12 2TB drives that have
512-byte blocksize. The remaining drives are 3TB with 4k blocksize,
and the pool uses ashift=12. Recently this happened to one of the 2TB
drives:

mirror-13 DEGRADED 0 0 0
c4t5000C5002AA2F8D6d0 UNAVAIL 0 0 0 cannot open
c4t5000C5002AB4FF17d0 ONLINE 0 0 0

and even though it came back after a reboot, I'd like to swap it for a
new drive. Obviously, all new drives have 4k blocksize, so I decided
to replace both drives in the vdev with 3TB drives. The new drives are
Seagate ST3000DM001-1CH1, and there are already 12 of these in the
pool.

# iostat -En
...
c4t5000C5004DE1EFF2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0TKXV
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c4t5000C5004DE863F2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0VHTG
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c4t5000C5004DD3F76Bd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0T1QX
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0


When I try to replace the old drive, I get this error:

# zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
cannot replace c4t5000C5002AA2F8D6d0 with c4t5000C5004DE863F2d0:
devices have different sector alignment


How can I replace the drive without migrating all the data to a
different pool? It is possible, I hope?
--
Trond Michelsen
Jan Owoc
2012-11-10 15:48:33 UTC
Post by Trond Michelsen
# zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
devices have different sector alignment
How can I replace the drive without migrating all the data to a
different pool? It is possible, I hope?
I had the same problem. I tried copying the partition layout and some
other stuff but without success. I ended up having to recreate the
pool and now have a non-mirrored root fs.

If anyone has figured out how to mirror drives after getting the
message about sector alignment, please let the list know :-).

Jan
Jan Owoc
2012-11-10 15:59:00 UTC
Post by Jan Owoc
[...]
Sorry... my question was partly answered by Jim Klimov on this list:
http://openindiana.org/pipermail/openindiana-discuss/2012-June/008546.html

Apparently the currently-suggested way (at least in OpenIndiana) is to:
1) create a zpool on the 4k-native drive
2) zfs send | zfs receive the data
3) mirror back onto the non-4k drive

I can't test it at the moment on my setup - has anyone tested this to work?
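
For illustration, a minimal sketch of those three steps, with hypothetical
pool and disk names (c1t0d0 the 4k-native drive, c1t1d0 the old 512b drive):

# zpool create newpool c1t0d0
# zfs snapshot -r oldpool@migrate
# zfs send -R oldpool@migrate | zfs receive -Fd newpool
# zpool destroy oldpool
# zpool attach newpool c1t0d0 c1t1d0

The single-disk pool picks up ashift=12 from the 4k drive, and the final
attach turns it back into a mirror; the old 512b drive can join an
ashift=12 vdev, since 4k-aligned I/O is always valid on 512b sectors.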

Jan
Tim Cook
2012-11-10 16:04:09 UTC
Post by Jan Owoc
[...]
That would absolutely work, but it's not really a fix for this situation.
For OP to do this he'd need 42 new drives (or at least enough drives to
provide the same capacity as what he's using) to mirror to and then mirror
back. The only way this is happening for most people is if they only have
a very small pool, and have the ability to add an equal amount of storage
to dump to. Probably not a big deal if you've only got a handful of
drives, or if the drives you have are small and you can take downtime.
Likely impossible for OP with 42 large drives.

--Tim
Jan Owoc
2012-11-10 16:16:48 UTC
Post by Tim Cook
[...]
Yes, you are right. I missed the fact that this mirror is part of a
very large pool, so zfs send | zfs receive isn't exactly an option.

Any other ideas short of block pointer rewrite?

Jan
Jim Klimov
2012-11-10 17:19:11 UTC
Post by Jan Owoc
Any other ideas short of block pointer rewrite?
A few... one is an idea of what could be the cause: AFAIK the
ashift value is not so much per-pool as per-toplevel-vdev.
If the pool started as a set of 512b drives and was then
expanded with sets of 4K drives, this mixed ashift could
happen...

It might be possible to override the ashift value with sd.conf
and fool the OS into using 512b sectors over a 4KB native disk
(this is mostly used the other way around, though - to enforce
4KB sectors on 4KB native drives that emulate 512b sectors).
This might work, and earlier posters on the list saw no evidence
to say that 512b emulation is inherently evil and unreliable
(modulo firmware/hardware errors that can be anywhere anyway),
but this would likely make the disk slower on random writes.

Also, I am not sure how the 4KB-native HDD would process partial
overwrites of a 4KB sector with 512b pieces of data - would other
bytes remain intact or not?..

Before trying to fool a production system this way, if at all,
I believe some stress-tests with small blocks are due on some
other system.
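
For anyone inclined to run such a test, a crude sketch on a scratch disk
(device name hypothetical, and this destroys data on it): write a full 4K
block, overwrite one 512-byte piece inside it, then check that the
surrounding bytes survived:

# dd if=/dev/urandom of=/dev/rdsk/c9t9d0s0 bs=4096 count=1 seek=100 conv=notrunc
# dd if=/dev/rdsk/c9t9d0s0 of=/tmp/before bs=4096 skip=100 count=1
# dd if=/dev/urandom of=/dev/rdsk/c9t9d0s0 bs=512 count=1 seek=801 conv=notrunc
# dd if=/dev/rdsk/c9t9d0s0 of=/tmp/after bs=4096 skip=100 count=1
# cmp -l /tmp/before /tmp/after | head

Every difference cmp reports should fall within bytes 513-1024 of the block
(the one overwritten 512b piece); anything outside that range means the
partial overwrite damaged its neighbours.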

My 2c,
//Jim Klimov
Trond Michelsen
2012-11-12 23:21:43 UTC
Post by Jim Klimov
[...]
Now I'm really confused. Turns out, my system is the opposite:

# zdb -C tank | grep ashift
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12

I had an old pool with ashift=9, and when I tried to add new disks,
zpool wouldn't let me add the new drives. So, I ended up creating a
new pool with ashift=12, and after migrating, destroyed the old pool,
and added the drives to the new. I was told at the time that as long
as the pool is created with ashift=12, new vdevs would have ashift=12
as well. Obviously, that's not the case. I did verify that ashift was
12 after creating the pool, but I apparently did not check after
adding the old drives, because this is the first time I've noticed
that there's any ashift=9 in the pool.
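
For what it's worth, a hedged way to see in advance what ashift a given
drive will get (disk name hypothetical) is a throwaway single-disk pool:

# zpool create -f testpool c9t9d0
# zdb -C testpool | grep ashift
        ashift: 12
# zpool destroy testpool

Checking the old drives this way before adding them would have exposed the
ashift=9 vdevs immediately.
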
--
Trond Michelsen
Trond Michelsen
2012-11-12 17:45:40 UTC
Post by Tim Cook
[...]
Well, if I have to migrate, I basically have three alternatives:

1. safe way:
a) buy 24 4TB drives,
b) migrate everything

2. scary way
a) buy 6 4TB drives,
b) migrate about 12TB data to new pool
c) split all mirror vdevs on old pool, add 4k discs to new pool
d) migrate remaining data to new pool while holding my breath
e) destroy old pool and reattach discs to vdevs in new pool

3. slightly less scary way
a) buy 23 3TB drives
b) set up new pool with 4x mirrored vdevs and 15x non-redundant vdevs
c) migrate everything from old pool
d) detach 3TB discs from mirrors in old pool and attach to vdevs in new pool

I've got room for the first method, but it'll be prohibitively
expensive, even if I sell the old drives. Until 4TB drives drop below
$100 this won't be a realistic option. I don't think I've got the
nerves to do it the scary way :) The third option is a lot cheaper
than the first, but it'll still be a solid chunk of money, so I'll
probably have to think about that for a bit.


That said, I've already migrated far too many times. I really, really
don't want to migrate the pool again if it can be avoided. I've already
migrated from raidz1 to raidz2 and then from raidz2 to mirror vdevs. Then,
even though I already had a mix of 512b and 4k discs in the pool, when I
bought new 3TB discs, I couldn't add them to the pool, and I had to set up
a new pool with ashift=12. In retrospect, I should have built the new pool
without the 2TB drives, and had I known what I know now, I would definitely
have done that.
--
Trond Michelsen
Tim Cook
2012-11-10 16:00:55 UTC
Post by Jan Owoc
[...]
Not happening with anything that exists today. The only way this would be
possible is with bp_rewrite which would allow you to evacuate a vdev
(whether it be for a situation like this, or just to shrink a pool). What
you're trying to do is write a block for block copy to a disk that's made
up of a different block structure. Not happening.

*insert everyone saying they want bp_rewrite and the guys who have the
skills to do so saying their enterprise customers have other needs*


--Tim
Trond Michelsen
2012-11-12 16:39:07 UTC
Post by Tim Cook
[...]
That is disappointing. I'll probably manage to find a used 2TB drive
with 512b blocksize, so I'm sure I'll be able to keep the pool alive,
but I had planned to swap all 2TB drives for 4TB drives within a year
or so. This is apparently not an option anymore. I'm also a bit
annoyed, because I cannot remember seeing any warnings (other than
performance wise) about mixing 512b and 4kB blocksize discs in a pool,
or any warnings that you'll be severely restricted if you use 512b
blocksize discs at all.
Post by Tim Cook
*insert everyone saying they want bp_rewrite and the guys who have the
skills to do so saying their enterprise customers have other needs*
bp_rewrite is what's needed to remove vdevs, right? If so, yes, being
able to remove (or replace) a vdev, would've solved my problem.
However, I don't see how this could not be desirable for enterprise
customers. 512b blocksize discs are rapidly disappearing from the
market. Enterprise discs fail occasionally too, and if 512b blocksize
discs can't be replaced by 4kB blocksize discs, then that effectively
means that you can't replace failed drives on ZFS. I would think that
this is a desirable feature of an enterprise storage solution.
--
Trond Michelsen
Tim Cook
2012-11-13 01:28:45 UTC
Post by Trond Michelsen
[...]
Enterprise customers are guaranteed equivalent replacement drives for the
life of the system. Generally 3-5 years. At the end of that cycle, they
buy all new hardware and simply migrate the data. It's generally a
non-issue due to the way gear is written off.

--Tim

Marion Hakanson
2012-11-12 20:54:23 UTC
Post by Trond Michelsen
[...]
Are you sure you can't find 3TB/4TB drives with 512b sectors? If you can
believe the "User Sectors Per Drive" specifications, these WD disks do:
WD4000FYYZ, WD3000FYYZ
Those are the SATA part numbers; there are SAS equivalents.

I also found the Hitachi UltraStar 7K3000 and 7K4000 drives claim to
support 512-byte sector sizes.

Sure, they're expensive, but what enterprise-grade drives aren't? And,
they might solve your problem.

Regards,

Marion