Discussion:
Drives going offline in Zpool
Ram Chander
2013-03-23 07:59:36 UTC
Hi,

I have a Dell MD1200 connected to two heads (Dell R710). Each head has a
PERC H800 card, and the drives are configured as RAID 0 virtual disks on
the RAID controller.

One of the drives crashed and was replaced by a spare. Resilvering was
triggered but fails to complete because drives go offline. I have to
reboot the head (R710) to bring the drives back online. This has happened
repeatedly: the resilver hung at 4% done, and after a reboot it hung again
at 27% done, and so on.

The issue happens with both Solaris 11.1 and OmniOS.
It is a 100 TB pool with 69 TB used. The data is critical and I can't
afford to lose it.
Can I recover the data in any way (at least partially)?

I have verified that there is no hardware issue with the H800 and have
also upgraded the H800 firmware. The issue happens with both heads.

Current OS: Solaris 11.1

Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@12,0 (sd26):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@c,0 (sd20):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@18,0 (sd32):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@1c,0 (sd36):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@1b,0 (sd35):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@1e,0 (sd38):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@19,0 (sd33):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@1d,0 (sd37):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@27,0 (sd47):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@26,0 (sd46):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
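
To make the pattern in these warnings easier to see, here is a minimal Python sketch that groups the affected sd instances by timestamp. It is only an illustration: the function name `gone_devices` and the two-entry sample are hypothetical, the regular expression assumes each WARNING line ends with the instance name in parentheses as in the excerpt above, and the obfuscated `***` path components are kept exactly as they appear in the archive.

```python
import re
from collections import defaultdict

# Two entries copied from the messages above; the device paths are
# obfuscated ("***") in the archive and are left untouched here.
LOG = """\
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@12,0 (sd26):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /***@0,0/pci8086,***@7/pci1028,***@0/***@c,0 (sd20):
Mar 22 21:47:55 solaris Command failed to complete...Device is gone
"""

# Assumption: the "(sdNN):" instance name closes the WARNING line and the
# "Device is gone" message follows on the very next line.
PATTERN = re.compile(
    r"\((sd\d+)\):\s*\n"                    # instance name, e.g. (sd26):
    r"(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ "     # timestamp and hostname
    r"Command failed to complete\.\.\.Device is gone"
)


def gone_devices(log_text):
    """Map each timestamp to the sd instances reported gone at that time."""
    by_time = defaultdict(list)
    for instance, stamp in PATTERN.findall(log_text):
        by_time[stamp].append(instance)
    return dict(by_time)


if __name__ == "__main__":
    for stamp, instances in gone_devices(LOG).items():
        print(stamp, len(instances), "device(s):", ", ".join(instances))
```

Because the pattern keys only on the `(sdNN):` suffix and the following line, it does not depend on how the long device path is wrapped. Run over the full excerpt above, it should place all ten instances under the single timestamp Mar 22 21:47:55.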

# zpool status -v

  pool: test
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 20 19:13:40 2013
    27.4T scanned out of 69.6T at 183M/s, 67h11m to go
    2.43T resilvered, 39.32% done
config:

NAME                        STATE     READ WRITE CKSUM
test                        DEGRADED     0     0     0
  raidz1-0                  DEGRADED     0     0     0
    c8t0d0                  ONLINE       0     0     0
    c8t1d0                  DEGRADED     0     0     0
    c8t2d0                  DEGRADED     0     0     0
    c8t3d0                  ONLINE       0     0     0
    spare-4                 DEGRADED     0     0     0
      12459181442598970150  UNAVAIL      0     0     0
      c8t45d0               DEGRADED     0     0     0  (resilvering)
  raidz1-1                  ONLINE       0     0     0
    c8t5d0                  ONLINE       0     0     0
    c8t6d0                  ONLINE       0     0     0
    c8t7d0                  ONLINE       0     0     0
    c8t8d0                  ONLINE       0     0     0
    c8t9d0                  ONLINE       0     0     0
  raidz1-3                  DEGRADED     0     0     0
    c8t12d0                 ONLINE       0     0     0
    c8t13d0                 ONLINE       0     0     0
    c8t14d0                 ONLINE       0     0     0
    c8t15d0                 DEGRADED     0     0     0
    c8t16d0                 ONLINE       0     0     0
    c8t17d0                 ONLINE       0     0     0
    c8t18d0                 ONLINE       0     0     0
    c8t19d0                 ONLINE       0     0     0
    c8t20d0                 DEGRADED     0     0     0
    c8t21d0                 DEGRADED     0     0     0
    spare-10                DEGRADED     0     0     0
      c8t22d0               DEGRADED     0     0     0
      c8t47d0               DEGRADED     0     0     0  (resilvering)
    c8t23d0                 ONLINE       0     0     0
  raidz1-4                  DEGRADED     0     0     0
    c8t24d0                 DEGRADED     0     0     0
    c8t25d0                 ONLINE       0     0     0
    c8t26d0                 ONLINE       0     0     0
    c8t27d0                 ONLINE       0     0     0
    c8t28d0                 ONLINE       0     0     0
    c8t29d0                 DEGRADED     0     0     0
    c8t30d0                 ONLINE       0     0     0
  raidz1-5                  DEGRADED     0     0     0
    spare-0                 DEGRADED     0     0     5
      c8t31d0               DEGRADED     0     0     0
      c8t46d0               DEGRADED     0     0     0  (resilvering)
    c8t32d0                 ONLINE       0     0     0
    c8t33d0                 ONLINE       0     0     0
    c8t34d0                 ONLINE       0     0     0
    c8t35d0                 DEGRADED     0     0     0
    c8t36d0                 DEGRADED     0     0     0
    c8t37d0                 ONLINE       0     0     0
  raidz1-6                  DEGRADED     0     0     0
    c8t38d0                 DEGRADED     0     0     0
    c8t39d0                 ONLINE       0     0     0
    c8t40d0                 DEGRADED     0     0     0
    c8t41d0                 DEGRADED     0     0     0
    c8t42d0                 ONLINE       0     0     0
    c8t43d0                 ONLINE       0     0     0
    c8t44d0                 ONLINE       0     0     0
spares
  c8t45d0                   INUSE
  c8t46d0                   INUSE
  c8t47d0                   INUSE

device details:

c8t1d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c8t2d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

12459181442598970150 UNAVAIL was /dev/dsk/c2t4d0s0
status: ZFS detected errors on this device.
The device was missing.

c8t45d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c8t15d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c8t20d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c8t21d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.

c8t22d0 DEGRADED scrub/resilver needed
status: ZFS detected errors on this device.
The device is missing some data that is recoverable.
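
As a rough cross-check of the scan figures in the status output (27.4T scanned out of 69.6T at 183M/s), the following Python sketch recomputes the progress and the time remaining. The function name `resilver_estimate` is made up for illustration, and the sketch assumes that T and M in the zpool output are binary units (TiB and MiB) and that the scan rate stays constant, which it clearly has not so far.

```python
def resilver_estimate(scanned_tib, total_tib, rate_mib_s):
    """Return (percent_scanned, hours_left, minutes_left).

    Assumes binary units (1 TiB = 1024 * 1024 MiB) and a constant scan rate.
    """
    percent = 100.0 * scanned_tib / total_tib
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    seconds_left = int(remaining_mib / rate_mib_s)
    hours, remainder = divmod(seconds_left, 3600)
    return percent, hours, remainder // 60


if __name__ == "__main__":
    # Figures taken from the "scan:" section of the zpool status output.
    pct, hours, minutes = resilver_estimate(27.4, 69.6, 183)
    print(f"{pct:.1f}% scanned, about {hours}h{minutes:02d}m to go")
```

With the rounded figures from the status output this gives about 39.4% scanned and roughly 67 hours remaining, in line with the reported 39.32% and 67h11m; the small differences come from the rounding in the displayed values.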

Regards,
Ram
