Jamie Krier
2012-12-12 18:21:55 UTC
I've hit this bug on four of my Solaris 11 servers. Looking for anyone else
who has seen it, as well as comments/speculation on cause.
This bug is pretty bad. If you are lucky you can import the pool read-only
and migrate it elsewhere.
I've also tried setting zfs:zfs_recover=1,aok=1 with varying results.
http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc
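For anyone wanting to try the same thing, here is a rough sketch of how those two tunables are usually set, assuming they go in /etc/system and the host is rebooted afterwards. This is only an illustration of the standard /etc/system syntax, not a recommendation; both settings relax safety checks and the results above were mixed.

```
* /etc/system fragment (assumed placement; takes effect after a reboot)
* Ask ZFS to attempt to continue past certain otherwise fatal errors
set zfs:zfs_recover=1
* Turn failed kernel assertions into warnings instead of panics
set aok=1
```

The read-only import mentioned above is normally done with `zpool import -o readonly=on <poolname>`, where `<poolname>` is a placeholder for the affected pool.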
Hardware platform:
Supermicro X8DAH
144GB RAM
Supermicro SAS2 JBODs
LSI 9200-8e controllers (Phase 13 fw)
ZeusRAM log
ZeusIOPS SAS L2ARC
Seagate ST33000650SS SAS drives
All four servers are running the same hardware, so at first I suspected a
problem there. I opened a ticket with Oracle which ended with this email:
---------------------------------------------------------------------------------------------------------------------------------
We strongly expect that this is a software issue because this problem does
not happen on Solaris 10. On Solaris 11, it happens with both the SPARC and
the X64 versions of Solaris.
We have quite a few customers who have seen this issue and we are in the
process of working on a fix. Because we do not know the source of the
problem yet, I cannot speculate on the time to fix. This particular portion
of Solaris 11 (the virtual memory sub-system) is quite different than in
Solaris 10. We re-wrote the memory management in order to get ready for
systems with much more memory than Solaris 10 was designed to handle.
Because this is the memory management system, there is not expected to be
any work-around.
Depending on your company's requirements, one possibility is to use Solaris
10 until this issue is resolved.
I apologize for any inconvenience that this bug may cause. We are working
on it as a Sev 1 Priority1 in sustaining engineering.
---------------------------------------------------------------------------------------------------------------------------------
I am thinking about switching to an Illumos distro, but I am wondering
whether this problem is present there as well.
Thanks
- Jamie