Date:	Fri, 25 Jan 2008 12:37:05 -0500
From:	"Brett Dikeman" <brett.dikeman@...il.com>
To:	linux-poweredge@...ts.us.dell.com, linux-kernel@...r.kernel.org
Subject: Re: 2850 drive bays not hot-swap?

On Jan 24, 2008 3:35 PM, William Warren
<hescominsoon@...anuelcomputerconsulting.com> wrote:

> http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/008898.html

An update: I don't know whether it is a kernel problem or a hardware
problem, but the instructions for removing and adding SCSI devices
linked above don't work on a PE2850 (which uses an "LSI Logic /
Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)"
controller, aka Dell's PERC 4e/Di, driven by the mpt* drivers).

  It *appears* to work, but the bus seems to grind to a near halt.
That may be an inaccurate conclusion, but I'm leaning toward a kernel
issue, since iowait didn't skyrocket.
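
For reference, here is the generic way to double-check the controller
and the Host/Channel/Id/Lun tuples before removing anything (standard
tools, nothing PE2850-specific):

  lspci | grep -i scsi     # should show the 53c1030 Fusion-MPT controller
  cat /proc/scsi/scsi      # lists Host/Channel/Id/Lun of each attached device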

IDs 2-5 on a U320 bus off the PERC controller are in a RAID10 array;
IDs 0 and 1 are in a separate mirror.
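
The array membership can be confirmed with mdadm before touching
anything (md0 is the RAID10 here; whether the system mirror is md1 is
my guess):

  mdadm --detail /dev/md0     # members, state, and layout of the RAID10
  cat /proc/mdstat            # quick overview of all md arrays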

1) Issued echo "scsi remove-single-device 0 0 5 0" > /proc/scsi/scsi

2) Pulled the drive and issued mdadm /dev/md0 --remove detached

3) Re-inserted the same drive and issued echo "scsi add-single-device 0
0 5 0" > /proc/scsi/scsi
(as a side note, the kernel gave it /dev/sdg instead of /dev/sdf; see
the note after this list)

4) Issued mdadm /dev/md0 --re-add /dev/sdg1
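
A note on step 3: since the kernel handed back a different device
name, it's worth confirming the name before the re-add instead of
assuming it. Standard tools suffice, nothing controller-specific:

  dmesg | tail -20        # shows which sdX the newly added disk was given
  cat /proc/partitions    # confirms the device and its partitions exist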

There were no messages indicating anything amiss.

I then fired up watch -n 5 cat /proc/mdstat and noticed that the
rebuild rate was a fraction of what it should have been, hovering at a
very constant number (I believe 2339KB/sec).  It should be around
74,000KB/sec (dropping to as low as 45,000KB/sec at the tail end of
the drives).  Load average jumped to 2+, yet nothing showed up in top
as taking any CPU time, even with a 10-second average, and no iowait
was indicated.  I increased sync_speed_min to 20000 with no change.
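
For reference, the sync speed can be watched and tuned through the
per-array sysfs knobs (the global equivalents live under
/proc/sys/dev/raid/); this is the interface I mean, assuming md0:

  cat /sys/block/md0/md/sync_speed                 # current rebuild rate
  echo 20000 > /sys/block/md0/md/sync_speed_min    # raise the per-array floor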

Then I noticed that the system-drive mirror (the first two drives)
appeared to have quite a bit of disk activity, but I couldn't find
what was causing it.  In hindsight it wasn't really "quite a bit" of
activity; I think it was disk I/O grinding to a near halt.
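
For anyone who wants to chase the stalled-I/O angle, a couple of
standard tools (I haven't verified them against this exact failure;
they're just the usual suspects):

  iostat -x 5                         # per-device utilization and queue sizes
  echo 1 > /proc/sys/vm/block_dump    # log every block I/O to dmesg (noisy)
  echo 0 > /proc/sys/vm/block_dump    # turn the logging back off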

I shut down the system, and after power-up and booting, the RAID10
array is rebuilding at full speed...

I'm happy to provide additional info and try things for interested
parties to troubleshoot or fix this problem.  Being able to hot-swap
drives is pretty critical here.

Brett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
