Discussion:
ZFS send | verify | receive
Edward Ned Harvey
2009-12-04 16:50:15 UTC
If there were a "zfs send" datastream saved someplace, is there a way to
verify the integrity of that datastream without doing a "zfs receive" and
occupying all that disk space?

I am aware that "zfs send" is not a backup solution, due to its vulnerability
to even a single bit error, its lack of granularity, and other reasons.
However ... there is an attraction to "zfs send" as an augmentation to the
commercial backup tools we use, because "zfs receive" doesn't require any
special software packages or license keys to do a restore in the event of a
complete filesystem restore. Hate that catch-22 when you can't restore
because the backup tool is inside the backup file.

If we ever need to restore the complete dataset ... most likely there will
be no error on the tapes, so if we have an error-free saved "zfs send"
stream available, then "zfs receive" would be the best possible tool to
recover the whole filesystem.

So the question is: I've read the "zfs manual" and I don't see any "zfs
verify" command. The closest I see is the "zfs receive -n" command, but I
am not sure this command would actually checksum and verify the datastream.
Is there some way for me to verify a datastream without actually doing the
"zfs receive"?

Thanks...
Julien Gabel
2009-12-04 17:08:03 UTC
Post by Edward Ned Harvey
If there were a "zfs send" datastream saved someplace, is there a way to
verify the integrity of that datastream without doing a "zfs receive" and
occupying all that disk space?
Depending on your version of the OS, I think the following post from Richard
Elling will be of great interest to you:
- http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html
--
julien.
http://blog.thilelli.net/
Edward Ned Harvey
2009-12-05 00:11:50 UTC
Post by Julien Gabel
Depending on your version of the OS, I think the following post from Richard
Elling will be of great interest to you:
- http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html
Thanks! :-)
No, wait! ....

According to that page, if you "zfs receive -n" then you should get a 0 exit
status for success, and 1 for error.

Unfortunately, I've been sitting here and testing just now ... I created a
"zfs send" datastream, then I made a copy of it and toggled a bit in the
middle to make it corrupt ...

I found that the "zfs receive -n" always returns 0 exit status, even if the
data stream is corrupt. In order to get the "1" exit status, you have to
get rid of the "-n" which unfortunately means writing the completely
restored filesystem to disk.

I've sent a message to Richard to notify him of the error on his page. But
it would seem zstreamdump must be the only way to verify the integrity
of a stored data stream. I haven't tried it yet, and I'm out of time for
today...
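
For the record, the test amounted to something like this sketch (dataset
and path names hypothetical; the overwritten byte value is arbitrary):

zfs snapshot tank/fs@test
zfs send tank/fs@test > /var/tmp/stream
cp /var/tmp/stream /var/tmp/stream.bad
# overwrite one byte in the middle of the copy, leaving the length intact
size=`wc -c < /var/tmp/stream.bad`
printf '\377' | dd of=/var/tmp/stream.bad bs=1 seek=`expr $size / 2` conv=notrunc
zfs receive -n tank/restore < /var/tmp/stream.bad; echo $?   # prints 0 despite the corruption
zfs receive tank/restore < /var/tmp/stream.bad; echo $?      # prints 1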
Sriram Narayanan
2009-12-05 04:23:00 UTC
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
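
Something along these lines, perhaps (file names hypothetical; Solaris 10
ships /usr/bin/digest, and md5sum can be used the same way elsewhere):

zfs send tank/fs@snap > /backup/snap.zfs
digest -a md5 /backup/snap.zfs > /backup/snap.zfs.md5
# later, before trusting the stored stream:
[ "`digest -a md5 /backup/snap.zfs`" = "`cat /backup/snap.zfs.md5`" ] \
    && echo stream intact || echo stream CORRUPT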

-- Sriram
Seth Heeren
2009-12-05 10:33:20 UTC
Well what does _that_ verify?

It will verify that no bits provably broke during transport.

It will still leave the chance of getting an incompatible stream, an
incomplete stream (a killed dump), or plain corrupted data. Of course, the
chance of the latter should be extremely small on server-grade hardware.

$0.02
Post by Sriram Narayanan
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
-- Sriram
Bob Friesenhahn
2009-12-05 15:22:12 UTC
Post by Sriram Narayanan
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
You can also stream into a gzip or lzop wrapper in order to obtain the
benefit of incremental CRCs and some compression as well. As long as
the wrapper is generated on the sending side (and not subject to
problems like truncation) it should be quite useful for verifying that
the stream has not been corrupted.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
dick hoogendijk
2009-12-05 16:42:06 UTC
Post by Bob Friesenhahn
You can also stream into a gzip or lzop wrapper in order to obtain the
benefit of incremental CRCs and some compression as well.
Can you give an example command line for this option please?
Bob Friesenhahn
2009-12-05 17:32:32 UTC
Post by dick hoogendijk
Post by Bob Friesenhahn
You can also stream into a gzip or lzop wrapper in order to obtain the
benefit of incremental CRCs and some compression as well.
Can you give an example command line for this option please?
Something like

zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz

should work nicely. Zfs send sends to its standard output so it is
just a matter of adding another filter program on its output. This
could be streamed over ssh or some other streaming network transfer
protocol.

Later, you can do 'gzip -t mysnap.gz' on the machine where the
snapshot file is stored to verify that it has not been corrupted in
storage or transfer.
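
When the stream is eventually needed, the wrapper unwraps the same way
(target dataset hypothetical):

gunzip -c /somestorage/mysnap.gz | zfs receive tank/restored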

lzop (not part of Solaris) is much faster than gzip but can be used in
a similar way since it is patterned after gzip.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Mike Gerdts
2009-12-05 19:03:39 UTC
On Sat, Dec 5, 2009 at 11:32 AM, Bob Friesenhahn
Post by Bob Friesenhahn
Something like
zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz
should work nicely.
Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot file
is stored to verify that it has not been corrupted in storage or transfer.
It seems as though a similar filter could be created to inject an
error-correcting code into the stream. That is:

zfs send $snap | ecc -i > /somestorage/mysnap.ecc
ecc -o < /somestorage/mysnap.ecc | zfs receive ...

I'm not aware of an existing ecc program, but I can't imagine it
would be hard to create one. There seems to already be an
implementation of Reed-Solomon encoding in ON that could likely be
used as a starting point.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c
--
Mike Gerdts
http://mgerdts.blogspot.com/
Richard Elling
2009-12-07 01:23:41 UTC
Post by Mike Gerdts
It seems as though a similar filter could be created to inject an
error-correcting code into the stream. I'm not aware of an existing
ecc program, but I can't imagine it would be hard to create one.
It all depends on the failure you want to protect against. If you
don't know the failure mode, you won't be very effective. For
example, to protect against an unrecoverable read on a single
disk sector, you need an ECC that can recover 512 bytes. It is
this thought process that led to the original RAID work (and is
one reason why nobody does RAID-2). By contrast, if you are
working at the media level, then it is not uncommon to have
errors that affect a few contiguous bytes, and an ECC code can
be effective (AIUI, 40% of the bits on a modern HDD are not data).
-- richard
Seth Heeren
2009-12-05 16:51:22 UTC
Post by Bob Friesenhahn
Post by Sriram Narayanan
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
You can also stream into a gzip or lzop wrapper in order to obtain the
benefit of incremental CRCs and some compression as well. As long as
the wrapper is generated on the sending side (and not subject to
problems like truncation) it should be quite useful for verifying that
the stream has not been corrupted.
Same deal as with MD5 sums: it doesn't guarantee that the stream is
'receivable' on the receiver.
Now, unless your wrapper is able to retransmit on CRC error, an MD5 would
be vastly superior due to its quality of error detection.
Combining both techniques would be optimal (although I'd suspect the
compression doesn't help; I should think the send/recv streams will be
compressed as it is).
Edward Ned Harvey
2009-12-06 15:28:06 UTC
Post by Sriram Narayanan
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
That's actually not a bad idea. It should be kinda obvious, but I hadn't
thought of it because it's sort-of duplicating existing functionality.

I do have a "multipipe" script that behaves similarly to "tee" but "tee" can
only output to stdout and a file. "multipipe" launches any number of
processes, and pipes stdin to all of the child processes. I normally use
this when creating a large datastream ... I generate the datastream, and I
want to md5 the uncompressed datastream, and I also want to gzip the
uncompressed datastream. I don't want to generate the filestream twice.
Then I will gunzip | md5 to check the sum.
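
Concretely, that check is a one-liner: since both md5sum runs read the
stream from stdin, both sides of the comparison name it "-" (file names
hypothetical):

gunzip -c somefile.gz | md5sum | cmp -s - somefile.md5sum && echo sum matches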

I also have a "threadzip" script, because gzip is invariably the bottleneck
in the data stream. Utilize those extra cores!!! ;-)

I plan to release these things open source soon, so if anyone has interest,
please let me know.
Bob Friesenhahn
2009-12-06 15:54:11 UTC
Post by Edward Ned Harvey
I also have a "threadzip" script, because gzip is invariably the bottleneck
in the data stream. Utilize those extra cores!!! ;-)
Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
more CPU efficient on i386 and AMD64, and even on SPARC. If the
compressor is able to keep up with the network and disk, then it is
fast enough. See "http://www.lzop.org/".

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Seth Heeren
2009-12-06 16:01:58 UTC
Post by Bob Friesenhahn
Post by Edward Ned Harvey
I also have a "threadzip" script, because gzip is invariably the bottleneck
in the data stream. Utilize those extra cores!!! ;-)
Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
more CPU efficient on i386 and AMD64, and even on SPARC. If the
compressor is able to keep up with the network and disk, then it is
fast enough. See "http://www.lzop.org/".
I use the excellent pbzip2

zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...

Utilizes those 8 cores quite well :)
Edward Ned Harvey
2009-12-06 17:15:15 UTC
Post by Seth Heeren
I use the excellent pbzip2
zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...
Utilizes those 8 cores quite well :)
This (pbzip2) sounds promising, and it must be better than what I wrote.
;-) But I don't understand the syntax you've got above, using tee,
redirecting to something in parens. I haven't been able to do this yet on
my own system. Can you please give me an example to simultaneously generate
md5sum and gzip?

This is how I currently do it:
cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"
End result is:
somefile
somefile.md5sum
somefile.gz
Seth Heeren
2009-12-06 18:12:18 UTC
Post by Edward Ned Harvey
Post by Seth Heeren
I use the excellent pbzip2
zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...
Utilizes those 8 cores quite well :)
This (pbzip2) sounds promising, and it must be better than what I wrote.
;-) But I don't understand the syntax you've got above, using tee,
redirecting to something in parens. I haven't been able to do this yet on
my own system. Can you please give me an example to simultaneously generate
md5sum and gzip?
cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"
somefile
somefile.md5sum
somefile.gz
Well, the theory is simple. "tee" is quite sufficient, because it will
not just operate on files. It will operate on _file descriptors_, which is a
big difference. A file descriptor can point to a whole slew of things, among
which are files and pipes, socket files, fifos, or whatever the heck
your brand of UNIX wants to call those.

Now, the shell usually gives you a lot of useful syntax for that:

ls > /dev/stderr
is usually a synonym for
ls > /proc/self/fd/2

On to the topic of pipes...

You could make the 'anonymous' file descriptors that your shell opens up
internally to link the piped processes together explicit, like so:

mkfifo /tmp/myzippipe
mkfifo /tmp/myhashpipe
(zfs send ... | tee /tmp/myzippipe /tmp/myhashpipe)&
(cat /tmp/myzippipe | gzip > zipped_stream)&
(cat /tmp/myhashpipe | md5sum > MD5SUMs)&
wait
rm /tmp/my*pipe

All that is painfully verbose, leaves dangling fifos on errors, has
security issues (fifos on /tmp?) and looks like a kludge. It appears
that a number of shells (I think I remember using this on bash, sh, ksh)
support the nifty and obvious shorthand

cat >(subshell command line)

which will be replaced (as in command line, environment, glob and
other expansion) by the proper file descriptor, like

cat /dev/fd/23

Of course, the actual number would be 'random', depending on the shell,
the processes running, etc.

This makes your needed multi-tee a snap:

cat my_log_file | tee >(gzip > my_log_file.gz) >(wc -l) >(md5sum) | sort | uniq -c

This will do all your heart's desires at once :) Note how the >(subshell)
notation allows you to do most anything your shell supports, including
using aliases, functions, and redirection, exactly like you would in
$(subshell) [1].

Well, I'll stop here, because I'm sure 'man $0' in your favourite shell
will tell you more pertinent info without requiring quite so many
keystrokes on my part.


Cheers,
Seth

[1] Beware that it _is_ a subshell, so you cannot update shell
variables, and certain things will not be inherited from the parent shell
(especially in security-restricted environments).
Edward Ned Harvey
2009-12-07 02:17:25 UTC
Post by Seth Heeren
cat my_log_file | tee >(gzip > my_log_file.gz) >(wc -l) >(md5sum) | sort | uniq -c
That is great. ;-) Thank you very much.
sgheeren
2009-12-06 18:14:16 UTC
Post by Edward Ned Harvey
But I don't understand the syntax you've got above, using tee,
redirecting to something in parens. Can you please give me an example
to simultaneously generate md5sum and gzip?
cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"
So that would be

cat somefile | tee >(md5sum > somefile.md5sum) | gzip > somefile.gz
Edward Ned Harvey
2009-12-06 16:58:28 UTC
Post by Bob Friesenhahn
Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
more CPU efficient on i386 and AMD64, and even on SPARC. If the
compressor is able to keep up with the network and disk, then it is
fast enough. See "http://www.lzop.org/".
In my development/testing this week, I did "time zfs send | gzip --fast >
somefile.gz" and also "time zfs send | threadzip --threads=8 > somefile.tz"
...

Threadzip performed 10x faster than gzip (hardly a performance I'd expect
even from lzop) and compressed about 2-3% smaller than gzip (also hardly
something I could expect from lzop).

The key is multiple cores. I'm on an 8-core xeon.

As for "fast enough," the metric I'm using is: Can the compressor keep up
with IO? I do this: "time zfs send > /dev/null" and "time zfs send |
[compressor] > /dev/null" to see if the compressor has an impact on
performance.

I'm only at rev 1.0 of threadzip, and it is *far* from optimized. But it's
still an order of magnitude better than the alternatives. So it'll only get
better from here.
Bob Friesenhahn
2009-12-06 17:47:59 UTC
Post by Edward Ned Harvey
Threadzip performed 10x faster (hardly a performance I expect from lzop) and
compressed about 2-3% smaller than gzip. Also hardly a performance I could
expect from lzop.
The key is multiple cores. I'm on an 8-core xeon.
I am glad to see that you found a use for all those cores.

As a simple test here, on AMD64 and Solaris 10 I see 3.6X less CPU
consumption from 'lzop -3' than from 'gzip -3'. With lots of
background activity (zfs scrub of the pool), this increases to a 4X
advantage.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey
2009-12-07 00:02:13 UTC
Post by Bob Friesenhahn
I see 3.6X less CPU
consumption from 'lzop -3' than from 'gzip -3'.
Where do you get lzop from? I don't see any binaries on their site, nor
blastwave, nor opencsw. And I am having difficulty building it from source.
Bob Friesenhahn
2009-12-07 00:59:52 UTC
Post by Edward Ned Harvey
Post by Bob Friesenhahn
I see 3.6X less CPU
consumption from 'lzop -3' than from 'gzip -3'.
Where do you get lzop from? I don't see any binaries on their site, nor
blastwave, nor opencsw. And I am having difficulty building it from source.
I just built it from source. :-)

First one has to build and install the lzo 2.03 library (from
http://www.oberhumer.com/opensource/lzo/) and then build lzop.

I used GCC, but not the archaic version that Sun provides with Solaris
10.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey
2009-12-07 02:09:13 UTC
Oh well. I built LZO, but can't seem to link it in the lzop build, despite
correctly setting the FLAGS variables as the INSTALL file says. I'd
love to provide an lzop comparison, but can't get it built. I give up ...
I also can't build python-lzo, which would also be sweet, but hey.

For whoever cares, here is the comparison that I do have:

I'm doing a "zfs send" of my rpool, piping it through the named compressor,
and dumping to /dev/null. rpool is on a 2-disk mirror, SATA 7200 rpm.
2 sockets of 4-core Xeons (8 cores total, capable of 16 threads).
The system is idle in all respects, except for this activity.

Threadzip uses zlib (similar or the same as gzip), breaking the stream into
5M chunks and compressing those chunks in parallel threads.

------------------------------------------- pass 1
9.52GB   2m14.578s   no compression
5.69GB   2m15.963s   threadzip, 32 threads, --fast
5.69GB   2m13.609s   threadzip, 16 threads, --fast
5.69GB   2m21.968s   threadzip, 8 threads, --fast
(Above, "zfs send" is the bottleneck; don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB   3m17.789s   threadzip, 4 threads, --fast
5.56GB   3m29.619s   threadzip, 16 threads, --best
5.56GB   4m24.761s   threadzip, 8 threads, --best
5.44GB   5m13.139s   pbzip2, auto
5.44GB   5m21.030s   pbzip2, 16 processes
5.44GB   6m4.915s    pbzip2, 8 processes
5.70GB   7m41.209s   gzip --fast
------------------------------------------- pass 2
9.52GB   2m17.858s   no compression
5.69GB   2m13.446s   threadzip, 32 threads, --fast
5.69GB   2m9.842s    threadzip, 16 threads, --fast
5.69GB   2m22.388s   threadzip, 8 threads, --fast
(Above, "zfs send" is the bottleneck; don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB   3m10.701s   threadzip, 4 threads, --fast
5.56GB   3m27.772s   threadzip, 16 threads, --best
5.56GB   4m22.409s   threadzip, 8 threads, --best
5.44GB   5m15.247s   pbzip2, auto
5.44GB   5m21.089s   pbzip2, 16 processes
5.44GB   6m5.412s    pbzip2, 8 processes
5.70GB   7m22.505s   gzip --fast
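
For the curious, threadzip's chunk-parallel approach can be crudely
approximated with standard tools, at the cost of staging the chunks on disk
first (a rough sketch; split's -b suffix syntax varies between systems, and
real code would throttle the number of concurrent gzips):

zfs send tank/fs@snap | split -b 5m - /var/tmp/chunk.
for f in /var/tmp/chunk.*; do gzip "$f" & done; wait
# concatenated gzip members still form one valid gzip stream
cat /var/tmp/chunk.*.gz > /backup/snap.gz && rm /var/tmp/chunk.*.gz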
Seth Heeren
2009-12-06 16:01:39 UTC
Post by Edward Ned Harvey
Post by Sriram Narayanan
If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.
That's actually not a bad idea. It should be kinda obvious, but I hadn't
thought of it because it's sort-of duplicating existing functionality.
I do have a "multipipe" script that behaves similarly to "tee" but "tee" can
only output to stdout and a file.
In my POSIX universe I can just do

zfs send ... | pv | tee >(md5sum) >(sha256sum) | gzip | tee >(md5sum > .md5.zipped) | ssh remote etc. etc.
Post by Edward Ned Harvey
"multipipe" launches any number of
processes, and pipes stdin to all of the child processes. I normally use
this when creating a large datastream ... I generate the datastream, and I
want to md5 the uncompressed datastream, and I also want to gzip the
uncompressed datastream. I don't want to generate the filestream twice.
Then I will gunzip | md5 to check the sum.
I also have a "threadzip" script, because gzip is invariably the bottleneck
in the data stream. Utilize those extra cores!!! ;-)
I plan to release these things open source soon, so if anyone has interest,
please let me know.
Richard Elling
2009-12-05 16:17:48 UTC
Post by Edward Ned Harvey
I found that the "zfs receive -n" always returns 0 exit status, even if the
data stream is corrupt. In order to get the "1" exit status, you have to
get rid of the "-n" which unfortunately means writing the completely
restored filesystem to disk.
I believe it will depend on the nature of the corruption. Regardless,
the answer is to use zstreamdump.
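
In its simplest form (a sketch; zstreamdump reads the stream on standard
input, and option support varies by build):

zstreamdump -v < /backup/snap.zfs
# or check a stream as it is generated:
zfs send tank/fs@snap | zstreamdump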
-- richard
Colin Raven
2009-12-06 15:46:31 UTC
Post by Richard Elling
I believe it will depend on the nature of the corruption. Regardless,
the answer is to use zstreamdump.
Richard, do you know of any usage examples of zstreamdump? I've been
searching for examples since you posted this, and don't see anything that
shows how to use it in practice. argh.
-C
Edward Ned Harvey
2009-12-06 15:43:56 UTC
Where exactly do you get zstreamdump?
I found a link to zstreamdump.c ... but is that it? Shouldn't it be part of
a source tarball or something?

Does it matter what OS? Every reference I see for zstreamdump is about
OpenSolaris. But I'm running Solaris.
Julien Gabel
2009-12-06 20:53:51 UTC
Post by Edward Ned Harvey
Where exactly do you get zstreamdump?
I found a link to zstreamdump.c ... but is that it? Shouldn't it be part of
a source tarball or something?
Does it matter what OS? Every reference I see for zstreamdump is about
OpenSolaris. But I'm running Solaris.
OS can mean Operating System, or OpenSolaris. It was in the second
sense that I wrote OS in my answer. It was not obvious you were using
Solaris 10, though. Sorry about that.

(FYI, zstreamdump seems to be an addition to build 125.)
--
julien.
http://blog.thilelli.net/
Edward Ned Harvey
2009-12-06 23:35:02 UTC
Post by Julien Gabel
OS can mean Operating System, or OpenSolaris. It was in the second
sense that I wrote OS in my answer. It was not obvious you were using
Solaris 10, though. Sorry about that.
(FYI, zstreamdump seems to be an addition to build 125.)
Oh - I never connected OS to OpenSolaris. ;-)

So I gather it's not a downloadable item. If zstreamdump is in your
operating system then great, and if not, it's not available until you
upgrade your operating system. Right?
Richard Elling
2009-12-07 01:33:56 UTC
Post by Edward Ned Harvey
So I gather it's not a downloadable item. If zstreamdump is in your
operating system then great, and if not, it's not available until you
upgrade your operating system. Right?
... or use a virtual machine.
-- richard