Discussion:
ZFS recovery tools
Sigbjorn Lie
2010-06-02 06:20:10 UTC
Hi,

I have just recovered from a ZFS crash. During the agonizing time this took, I was surprised to
learn how undocumented the tools and options for ZFS recovery were. I managed to recover thanks
to some great forum posts from Victor Latushkin; without his posts I would still be crying
at night...

I think the worst example is the zdb man page, which does nothing but ask you to "contact a Sun
Engineer", as this command is for experts only. What the hell? I don't have a support contract for
my home machines... I don't feel like this is the right way to go for an open source project...

A penny for anyone else's thoughts or facts about why it's like this... :)



regards,
Sigbjorn Lie

's/windows/unix/g'
- "Ubuntu" - an African word, meaning "Slackware is too hard for me"
David Magda
2010-06-02 11:40:24 UTC
What the hell? I don't have a support contract for my home
machines... I don't feel like this is the right way to go for an
open source project...
Write a letter demanding a refund. Join the OpenSolaris Governing Board.

I'm not sure Oracle's focus is now as much toward the community as it is
toward recouping the billions of dollars it paid to buy Sun. There
are many helpful folks on this list (both in and out of Oracle) who
will try to help you, but if you absolutely need support, you need to
pay for it.

And as with any open source software (even things like Samba,
OpenLDAP, etc.) there is no guarantee of support, just mailing lists
and volunteers who do it for their own reasons. OpenSolaris without a
contract is exactly the same way: there are no guarantees. If you want
support with your distribution, pay Red Hat or SuSE; if you want it
with OpenLDAP, pay someone like Symas; with OpenSolaris, pay Oracle.

Personally I've run into a lot of crappy documentation on Linux and
GNU, and had to turn to online searches and trial and error. OpenSolaris
has comparatively much better documentation.
David Magda
2010-06-02 13:45:47 UTC
Post by Sigbjorn Lie
I have just recovered from a ZFS crash. During the agonizing time
this took, I was surprised to learn how undocumented the tools and
options for ZFS recovery were. I managed to recover thanks to some great
forum posts from Victor Latushkin; without his posts I would
still be crying at night...
For the archives, from a private exchange:

zdb(1M) is complicated and in flux, so asking on zfs-discuss or calling
Oracle isn't a very onerous request IMHO.
Post by Sigbjorn Lie
zpool import [-o mntopts] [-o property=value] ... [-d dir | -c
cachefile] [-D] [-f] [-R root] [-F [-n]] pool | id [newpool]
[...]
Post by Sigbjorn Lie
-F
Recovery mode for a non-importable pool. Attempt to return
the pool to an importable state by discarding the last few
transactions. Not all damaged pools can be recovered by
using this option. If successful, the data from the
discarded transactions is irretrievably lost. This option
is ignored if the pool is importable or already imported.
http://docs.sun.com/app/docs/doc/819-2240/zpool-1m

This is available as of snv_128, but not in Solaris 10 as of Update 8 (10/09):

http://bugs.opensolaris.org/view_bug.do?bug_id=6667683

This was part of PSARC 2009/479:

http://arc.opensolaris.org/caselog/PSARC/2009/479/
http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html
http://sparcv9.blogspot.com/2009/09/zpool-recovery-support-psarc2009479.html

Personally I'm waiting for Solaris 10u9 for a lot of these fixes and
updates [...].
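For readers landing here later: the -F description above suggests a dry-run-first workflow. The sketch below assumes snv_128 or later (where -F exists for zpool import) and a POSIX shell such as ksh93 or bash; zpool_rewind and its --commit switch are hypothetical names, not part of ZFS. It relies on -n, shown in the synopsis as [-F [-n]], which per zpool(1M) only reports whether -F could make the pool importable without actually performing the recovery.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command) around the -F recovery mode of
# 'zpool import'. The default is a dry run; the destructive step is opt-in.
#   zpool_rewind POOL           -> 'zpool import -F -n POOL': reports whether
#                                  discarding the last few transactions would
#                                  make POOL importable; changes nothing
#   zpool_rewind POOL --commit  -> 'zpool import -F POOL': real rewind; the
#                                  discarded transactions are irretrievably lost
zpool_rewind() {
    pool=${1:?usage: zpool_rewind POOL [--commit]}
    if [ "${2:-}" = "--commit" ]; then
        zpool import -F "$pool"
    else
        zpool import -F -n "$pool"
    fi
}
```

Run `zpool_rewind tank1` first and read the report; only if it is acceptable, run `zpool_rewind tank1 --commit`. Note that -F is ignored if the pool is importable or already imported.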
Sigbjørn Lie
2010-06-04 12:58:21 UTC
Post by David Magda
Post by Sigbjorn Lie
I have just recovered from a ZFS crash. During the agonizing time
this took, I was surprised to learn how undocumented the tools and
options for ZFS recovery were. I managed to recover thanks to some great
forum posts from Victor Latushkin; without his posts I would
still be crying at night...
zdb(1M) is complicated and in flux, so asking on zfs-discuss or calling
Oracle isn't a very onerous request IMHO.
Post by Sigbjorn Lie
zpool import [-o mntopts] [-o property=value] ... [-d dir | -c
cachefile] [-D] [-f] [-R root] [-F [-n]] pool | id [newpool]
[...]
Post by Sigbjorn Lie
-F
Recovery mode for a non-importable pool. Attempt to return
the pool to an importable state by discarding the last few
transactions. Not all damaged pools can be recovered by
using this option. If successful, the data from the
discarded transactions is irretrievably lost. This option
is ignored if the pool is importable or already imported.
http://docs.sun.com/app/docs/doc/819-2240/zpool-1m
http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
http://arc.opensolaris.org/caselog/PSARC/2009/479/
http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html
http://sparcv9.blogspot.com/2009/09/zpool-recovery-support-psarc2009479.html
Personally I'm waiting for Solaris 10u9 for a lot of these fixes and
updates [...].
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Excellent! I wish I had known about these features when I was attempting to recover my pool using 2009.06/snv_111.

I still believe some documentation updates are needed. While I was attempting to recover my pool and googling for information, I found none of these documents. What I did find was a lot of forum posts from people who did not manage to recover their pools and assumed their data was lost.

"ZFS Troubleshooting and Data Recovery" from the "Solaris ZFS Administration Guide" and the ZFS Troubleshooting Guide at SolarisInternals would greatly benefit from being updated with the information you provided, not least because they appear at the top of Google's rankings for the search term "zfs recovery". :)

Thank you for the links. :)
Miles Nordin
2010-06-04 18:18:13 UTC
sl> Excellent! I wish I would have known about these features when
sl> I was attempting to recover my pool using 2009.06/snv111.

the OP tried the -F feature. It doesn't work after you've lost zpool.cache:

op> I was setting up a new system (osol 2009.06 and updating to
op> the latest version of osol/dev - snv_134 - with
op> deduplication) and then I tried to import my backup zpool, but
op> it does not work.

op> # zpool import -f tank1
op> cannot import 'tank1': one or more devices is currently unavailable
op> Destroy and re-create the pool from a backup source

op> Any other option (-F, -X, -V, -D) and any combination of them
op> doesn't help either.

I have been in here repeatedly warning about this incompleteness of
the feature while fanbois keep saying ``we have slog recovery so don't
worry.''

R., please let us know if the 'zdb -e -bcsvL <zpool-name>' incantation
Sigbjorn suggested ends up working for you or not.
Victor Latushkin
2010-06-04 20:16:59 UTC
Post by Miles Nordin
sl> Excellent! I wish I would have known about these features when
sl> I was attempting to recover my pool using 2009.06/snv111.
Starting from build 128, -F is a documented option for 'zpool import' and 'zpool clear', and it has nothing to do with zpool.cache. The old -F has been renamed to -V.

In some cases it may be possible to extract configuration details from the in-pool copy of the configuration by running:

zdb -eC <poolname>

regards
victor
R. Eulenberg
2010-06-12 14:22:57 UTC
Post by Miles Nordin
op> I was setting up a new system (osol 2009.06 and updating to
op> the latest version of osol/dev - snv_134 - with
op> deduplication) and then I tried to import my backup zpool, but
op> it does not work.
op> # zpool import -f tank1
op> cannot import 'tank1': one or more devices is currently unavailable
op> Destroy and re-create the pool from a backup source
op> Any other option (-F, -X, -V, -D) and any combination of them
op> doesn't help either.
R., please let us know if the 'zdb -e -bcsvL <zpool-name>' incantation
Sigbjorn suggested ends up working for you or not.
Hi,

It seems that you are talking about my case. I answered in this thread:

http://www.opensolaris.org/jive/thread.jspa?messageID=485342&tstart=0#485342

I hope someone has ideas that can help me.

Regards
Ron
--
This message posted from opensolaris.org
R. Eulenberg
2010-07-03 21:33:01 UTC
Hi,

Several days ago I added this:

set zfs:zfs_recover=1
set aok=1

to the /etc/system file and ran:

zdb -e -bcsvL tank1

There was no output and no prompt came back (the process hangs), and the result of running:

zdb -eC tank1

was the same.
I hope you can help me, because the problem isn't solved yet.

Regards
Ron
Victor Latushkin
2010-07-06 14:47:57 UTC
Post by R. Eulenberg
Hi,
set zfs:zfs_recover=1
set aok=1
zdb -e -bcsvL tank1
Settings in the /etc/system file do not affect zdb in any way. In recent builds (135+) zdb has an -A option that simulates setting one or both of these parameters, but it is not useful in your case anyway.
Post by R. Eulenberg
There was no output and no prompt came back (the process hangs)
How did you arrive at this conclusion? Did you check "pstack `pgrep zdb`/1" output a few times?
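Checking pstack "a few times" amounts to sampling the stack repeatedly and comparing the results. A minimal sketch, assuming a POSIX shell (ksh93 or bash) and exactly one zdb process; sample_stack is a hypothetical helper, and the 10-second default interval is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: print the stack of thread (LWP) 1 of a process
# several times, pausing between samples, so successive outputs can be
# compared.
#   sample_stack PID [COUNT] [INTERVAL_SECONDS]
sample_stack() {
    ss_pid=${1:?usage: sample_stack PID [COUNT] [INTERVAL]}
    ss_count=${2:-3}
    ss_interval=${3:-10}
    ss_i=1
    while [ "$ss_i" -le "$ss_count" ]; do
        echo "=== sample $ss_i of $ss_count ==="
        pstack "$ss_pid/1"          # pid/lwp form: show LWP 1 only
        ss_i=$((ss_i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$ss_i" -le "$ss_count" ]; then
            sleep "$ss_interval"
        fi
    done
}
```

Usage: `sample_stack "$(pgrep -x zdb)"`. Stacks that change between samples suggest zdb is still working (a full block traversal with -b/-c may take a long time on a large pool); stacks that are identical every time are worth posting to the list.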
Post by R. Eulenberg
zdb -eC tank1
was the same.
I hope you could help me because the problem isn't solved.
Regards
Ron
R. Eulenberg
2010-07-08 07:15:39 UTC
Hi,

Today I ran

zdb -e -bcsvL tank1

and

zdb -eC tank1

again, and neither command returned any output or a prompt. Then I opened a new console and ran

pstack 'pgrep zdb'/1

and the system answered:

pstack: cannot examine pgrep zdb/1: no such process or core file

What does that mean? Why don't I get a prompt or any output back from the first two commands?

Regards
ron
Victor Latushkin
2010-07-08 16:27:26 UTC
Post by R. Eulenberg
pstack 'pgrep zdb'/1
pstack: cannot examine pgrep zdb/1: no such process or core file
Use ` (backtick) instead of ' (single quote) in the above command.
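The failure above is a shell quoting issue rather than a pstack problem. A short demonstration, runnable in any POSIX shell, with `echo 1234` standing in for `pgrep zdb` so that no zdb process is needed:

```shell
#!/bin/sh
# Single quotes pass their contents literally: pstack received the one
# argument "pgrep zdb/1" instead of a process ID, hence the error.
printf '%s\n' 'pgrep zdb'/1        # prints: pgrep zdb/1

# Backticks (or the equivalent $(...) form) run the enclosed command and
# substitute its output; 'echo 1234' stands in for 'pgrep zdb' here.
printf '%s\n' `echo 1234`/1        # prints: 1234/1
printf '%s\n' "$(echo 1234)/1"     # prints: 1234/1

# So the intended command is either of:
#   pstack `pgrep zdb`/1
#   pstack "$(pgrep zdb)/1"
```

If more than one zdb process is running, pgrep prints several PIDs and only the last one gets the /1 suffix, so it is worth running `pgrep zdb` on its own first.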
R. Eulenberg
2010-07-09 10:47:22 UTC
So I ran the corrected command. The output is very cryptic to me and I cannot find any information in it that helps. I uploaded the output to a file hoster: http://ifile.it/vzwn50s/Output.txt
I hope you can tell me what it means.

Regards
ron