Discussion:
Introducing zilstat
Richard Elling
2009-01-31 02:35:15 UTC
Permalink
For those who didn't follow down the thread this afternoon,
I have posted a tool called zilstat which will help you to answer
the question of whether a separate log might help your
workload. Details start here:
http://richardelling.blogspot.com/2009/01/zilstat.html
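
If you just want a rough feel for what the script measures before grabbing
it, here is a minimal sketch along the same lines (not the actual zilstat
internals; it assumes the OpenSolaris-era lwb_t members lwb_nused and
lwb_sz, so check your build before relying on it):

  dtrace -qn '
  fbt::zil_lwb_write_start:entry
  {
          /* data bytes queued into the log block, and the allocated block size */
          @n = sum(args[1]->lwb_nused);
          @b = sum(args[1]->lwb_sz);
  }
  tick-1sec
  {
          printa("N-Bytes %@d  B-Bytes %@d\n", @n, @b);
          clear(@n);
          clear(@b);
  }'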

Enjoy!
-- richard
Blake
2009-01-31 18:51:07 UTC
Permalink
I'm already using it. This could be really useful for my Windows
roaming-profile application of ZFS/NFS/SMB.

On Fri, Jan 30, 2009 at 9:35 PM, Richard Elling
Post by Richard Elling
For those who didn't follow down the thread this afternoon,
I have posted a tool called zilstat which will help you to answer
the question of whether a separate log might help your
workload. Details start here:
http://richardelling.blogspot.com/2009/01/zilstat.html
Enjoy!
-- richard
Marion Hakanson
2009-02-05 02:24:44 UTC
Permalink
The zilstat tool is very helpful, thanks!

I tried it on an X4500 NFS server, while extracting a 14MB tar archive,
both via an NFS client, and locally on the X4500 itself. Over NFS,
said extract took ~2 minutes, and showed peaks of 4MB/sec buffer-bytes
going through the ZIL.

When run locally on the X4500, the extract took about 1 second, with
zilstat showing all zeroes. I wonder if this is a case where that
ZIL bypass kicks in for >32K writes, in the local tar extraction.
Does zilstat's underlying dtrace include these bypass-writes in the
totals it displays?
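
As an aside, one way to check where that cutoff sits on a given build,
assuming the OpenSolaris-era tunable name zfs_immediate_write_sz (the value
is in bytes):

  echo zfs_immediate_write_sz/D | mdb -k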

I think if it's possible to get stats on this bypassed data, I'd like
to see it as another column (or set of columns) in the zilstat output.

Regards,

Marion
Jorgen Lundman
2009-02-05 02:52:39 UTC
Permalink
Interesting, but what does it mean? :)


The X4500 for mail (NFS vers=3, UFS on zvols in the zpool, with quotas):

# ./zilstat.ksh
       N-Bytes     N-Bytes/s  N-Max-Bytes/s        B-Bytes     B-Bytes/s  B-Max-Bytes/s
        376720        376720         376720        1286144       1286144        1286144
        419608        419608         419608        1368064       1368064        1368064
        555256        555256         555256        1732608       1732608        1732608
        538808        538808         538808        1679360       1679360        1679360
        626048        626048         626048        1773568       1773568        1773568
        753824        753824         753824        2105344       2105344        2105344
        652632        652632         652632        1716224       1716224        1716224

Fairly constant, between 1-2 MB/s. That doesn't sound too bad, though.
It's only got 400 nfsd threads at the moment, but it peaks at 1024.
Incidentally, what is the highest recommended number of nfsd threads for an
X4500 anyway?

Lund
Post by Marion Hakanson
The zilstat tool is very helpful, thanks!
I tried it on an X4500 NFS server, while extracting a 14MB tar archive,
both via an NFS client, and locally on the X4500 itself. Over NFS,
said extract took ~2 minutes, and showed peaks of 4MB/sec buffer-bytes
going through the ZIL.
When run locally on the X4500, the extract took about 1 second, with
zilstat showing all zeroes. I wonder if this is a case where that
ZIL bypass kicks in for >32K writes, in the local tar extraction.
Does zilstat's underlying dtrace include these bypass-writes in the
totals it displays?
I think if it's possible to get stats on this bypassed data, I'd like
to see it as another column (or set of columns) in the zilstat output.
Regards,
Marion
--
Jorgen Lundman | <***@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
Richard Elling
2009-02-05 05:17:30 UTC
Permalink
Post by Jorgen Lundman
Interesting, but what does it mean? :)
# ./zilstat.ksh
       N-Bytes     N-Bytes/s  N-Max-Bytes/s        B-Bytes     B-Bytes/s  B-Max-Bytes/s
        376720        376720         376720        1286144       1286144        1286144
        419608        419608         419608        1368064       1368064        1368064
        555256        555256         555256        1732608       1732608        1732608
        538808        538808         538808        1679360       1679360        1679360
        626048        626048         626048        1773568       1773568        1773568
        753824        753824         753824        2105344       2105344        2105344
        652632        652632         652632        1716224       1716224        1716224
Fairly constant, between 1-2 MB/s. That doesn't sound too bad, though.
I think your workload would benefit from a fast, separate log device.
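Something along these lines, where the pool and device names are just
placeholders for your pool and a fast SSD or NVRAM device:

  zpool add tank log c4t0d0
  zpool status tank    # the log device shows up under a separate "logs" section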
Post by Jorgen Lundman
It's only got 400 nfsd threads at the moment, but it peaks at 1024.
Incidentally, what is the highest recommended number of nfsd threads for an
X4500 anyway?
Highest recommended is what you need to get the job done.
For the most part, the defaults work well. But you can experiment
with them and see if you can get better results.
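On Solaris 10-era systems the ceiling lives in /etc/default/nfs, and later
builds also expose it through sharectl(1M). The value below is only an
illustration, not a recommendation:

  grep NFSD_SERVERS /etc/default/nfs
  NFSD_SERVERS=1024

or, where sharectl is available:

  sharectl set -p servers=1024 nfs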

I've got some ideas about how to implement some more features
for zilstat, but I might not be able to get to them over the next few
days. So there's still time to accept recommendations :-)
-- richard
Jorgen Lundman
2009-02-05 05:57:00 UTC
Permalink
Post by Richard Elling
Post by Jorgen Lundman
# ./zilstat.ksh
       N-Bytes     N-Bytes/s  N-Max-Bytes/s        B-Bytes     B-Bytes/s  B-Max-Bytes/s
        376720        376720         376720        1286144       1286144        1286144
        419608        419608         419608        1368064       1368064        1368064
        555256        555256         555256        1732608       1732608        1732608
        538808        538808         538808        1679360       1679360        1679360
        626048        626048         626048        1773568       1773568        1773568
        753824        753824         753824        2105344       2105344        2105344
        652632        652632         652632        1716224       1716224        1716224
Fairly constant, between 1-2 MB/s. That doesn't sound too bad, though.
I think your workload would benefit from a fast, separate log device.
Interesting. Today is the first I've heard about it. One of the X4500s is
really, really slow, something like 15 seconds to do an unlink, but I assumed
that was because the UFS inside the zvol is _really_ bloated. Maybe we need
to experiment with a separate log device on the test X4500.
Post by Richard Elling
Highest recommended is what you need to get the job done.
For the most part, the defaults work well. But you can experiment
with them and see if you can get better results.
It came shipped with 16, and I'm sorry, but 16 didn't cut it at all :) We
set it to 1024, as that was the highest number I found via Google.

Lund
--
Jorgen Lundman | <***@lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)
Richard Elling
2009-02-05 05:20:13 UTC
Permalink
Post by Marion Hakanson
The zilstat tool is very helpful, thanks!
I tried it on an X4500 NFS server, while extracting a 14MB tar archive,
both via an NFS client, and locally on the X4500 itself. Over NFS,
said extract took ~2 minutes, and showed peaks of 4MB/sec buffer-bytes
going through the ZIL.
When run locally on the X4500, the extract took about 1 second, with
zilstat showing all zeroes. I wonder if this is a case where that
ZIL bypass kicks in for >32K writes, in the local tar extraction.
Does zilstat's underlying dtrace include these bypass-writes in the
totals it displays?
This is what I would expect. What you are seeing is the effect of the
NFS protocol and how the server commits data to disk on behalf of
the client -- by using sync writes.
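One quick way to see this for yourself is to count who forces ZIL commits
while you run both tests; the local tar should barely register, while the
NFS case will (the caller typically shows up as nfsd or a kernel thread):

  dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'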
Post by Marion Hakanson
I think if it's possible to get stats on this bypassed data, I'd like
to see it as another column (or set of columns) in the zilstat output.
Yes. I've got a few more columns in mind, too. Does anyone still use
a VT100? :-)
-- richard
Carsten Aulbert
2009-02-05 06:25:50 UTC
Permalink
Hi Richard,
Post by Richard Elling
Yes. I've got a few more columns in mind, too. Does anyone still use
a VT100? :-)
Only when using ILOM ;)

(apologies to anyone using a 72-characters-per-line MUA; the following lines are longer):

Thanks for the great tool; it showed something very interesting yesterday:

s06: TIME N-MBytes N-MBytes/s N-Max-Rate B-MBytes B-MBytes/s B-Max-Rate
s06: 2009 Feb 4 14:37:11 5 0 0 10 0 1
s06: 2009 Feb 4 14:37:26 6 0 1 12 0 1
s06: 2009 Feb 4 14:37:41 4 0 0 10 0 1
s06: 2009 Feb 4 14:37:56 5 0 1 11 0 1
s06: 2009 Feb 4 14:38:11 6 0 1 11 0 2
s06: 2009 Feb 4 14:38:26 7 0 1 13 0 2
s06: 2009 Feb 4 14:38:41 10 0 2 17 1 3
s06: 2009 Feb 4 14:38:56 4 0 0 9 0 1
s06: 2009 Feb 4 14:39:11 5 0 1 11 0 1
s06: 2009 Feb 4 14:39:26 7 0 0 13 0 1
s06: 2009 Feb 4 14:39:41 7 0 2 13 0 3
s06: 2009 Feb 4 14:39:56 6 0 1 11 0 2
s06: 2009 Feb 4 14:40:11 6 0 1 12 0 1
s06: 2009 Feb 4 14:40:26 6 0 0 13 0 1
s06: 2009 Feb 4 14:40:41 5 0 0 10 0 1
s06: 2009 Feb 4 14:40:56 6 0 1 12 0 1
s06: 2009 Feb 4 14:41:11 4 0 0 9 0 1
[..]
So far the box was almost idle; a little bit later:
s06: 2009 Feb 4 14:53:41 2 0 0 5 0 0
s06: 2009 Feb 4 14:53:56 1 0 0 3 0 0
s06: 2009 Feb 4 14:54:11 1 0 0 4 0 0
s06: 2009 Feb 4 14:54:26 1 0 0 3 0 0
s06: 2009 Feb 4 14:54:41 2 0 0 5 0 0
s06: 2009 Feb 4 14:54:56 604 40 171 702 46 198
s06: 2009 Feb 4 14:55:11 816 54 130 939 62 154
s06: 2009 Feb 4 14:55:26 2 0 0 4 0 0
s06: 2009 Feb 4 14:55:41 2 0 0 4 0 0
s06: 2009 Feb 4 14:55:56 1 0 0 3 0 0
s06: 2009 Feb 4 14:56:11 3 0 0 6 0 1
s06: 2009 Feb 4 14:56:26 1 0 0 3 0 0
[...]
s06: 2009 Feb 4 16:13:11 1 0 0 3 0 0
s06: 2009 Feb 4 16:13:26 2 0 0 5 0 0
s06: 2009 Feb 4 16:13:41 389 25 97 477 31 119
s06: 2009 Feb 4 16:13:56 505 33 193 599 39 218
s06: 2009 Feb 4 16:14:11 2 0 0 4 0 0
s06: 2009 Feb 4 16:14:26 3 0 0 5 0 1
s06: 2009 Feb 4 16:14:41 1 0 0 3 0 0
s06: 2009 Feb 4 16:14:56 2 0 0 6 0 1
s06: 2009 Feb 4 16:15:11 4 0 2 10 0 4
s06: 2009 Feb 4 16:15:26 0 0 0 1 0 0
s06: 2009 Feb 4 16:15:41 128 8 94 168 11 123
s06: 2009 Feb 4 16:15:56 1081 72 212 1305 87 279
s06: 2009 Feb 4 16:16:11 262 17 99 317 21 122
s06: 2009 Feb 4 16:16:26 0 0 0 0 0 0

Just showing a few bursts...

Given that this is the output of 'zilstat.ksh -M -t 15', I guess we should
really look into a fast log device for it, right?

Do you have any hints as to which numbers are reasonable on an X4500 and
which are approaching serious problems?

Cheers

Carsten
Marion Hakanson
2013-01-08 00:33:17 UTC
Permalink
Greetings,
We're trying out a new JBOD here.  Multipath (mpxio) is not working, and we
could use some feedback and/or troubleshooting advice.
. . .
Post by Richard Elling
Sometimes the mpxio detection doesn't work properly. You can try to whitelist
them; see https://www.illumos.org/issues/644
Thanks Richard, I was hoping I hadn't just made up my vague memory
of such functionality. We'll give it a try.
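
For the archives, the whitelist takes the form of an entry in
/kernel/drv/scsi_vhci.conf on builds that have the feature from the issue
above. The vendor/product strings here are placeholders (vendor padded as it
appears in the inquiry data), and f_sym is the symmetric failover module:

  scsi-vhci-failover-override =
          "VENDOR  PRODUCT", "f_sym";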

Regards,

Marion
Marion Hakanson
2013-03-16 07:50:26 UTC
Permalink
I get a little nervous at the thought of hooking all that up to a single
server, and am a little vague on how much RAM would be advisable, other than
"as much as will fit" (:-).  Then again, I've been waiting for something
like
pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a
single global namespace, without any sign of that happening anytime soon.
NFSv4 or DFS (or even a clever sysadmin + automount) offers a single namespace
without needing the complexity of NFSv4.1, Lustre, GlusterFS, etc.
I've been using NFSv4 since it showed up in Solaris 10 FCS, and it is true
that I've been clever enough (without automount -- I like my computers
to be as deterministic as possible, thank you very much :-) for our
NFS clients to see a single directory-tree namespace that abstracts
away the actual server/location of a particular piece of data.

However, we find it starts getting hard to manage when a single project
(think "directory node") needs more space than its current NFS server
will hold. Or perhaps what you're getting at above is even more clever
than I have been to date, and is eluding me at the moment. I did see
someone mention "NFSv4 referrals" recently; maybe that would help.
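
For comparison, the automount variant suggested above is roughly this kind
of indirect map, with made-up server and path names:

  # /etc/auto_master
  /projects     auto_projects

  # /etc/auto_projects
  projA         mynfs1:/export/projA
  projB         mynfs2:/export/projB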

Plus, believe it or not, some of our customers still insist on having the
server name in their path hierarchy for some reason, like /home/mynfs1/,
/home/mynfs2/, and so on. Perhaps I've just not been persuasive enough
yet (:-).
Don't forget about backups :-)
I was hoping I could get by with telling them to buy two of everything.

Thanks and regards,

Marion
