linux-kernel - Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to lose this disk controller?


Message-ID: <506A0BFB.60606@pierre-beck.de>
Date:	Mon, 01 Oct 2012 23:33:15 +0200
From:	Pierre Beck <mail@...rre-beck.de>
To:	Nix <nix@...eri.org.uk>
CC:	Chris Murphy <lists@...orremedies.com>,
	Linux RAID <linux-raid@...r.kernel.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Areca hardware RAID / first-ever SCSI bus reset: am I about to
 lose this disk controller?

Check the SMART values of the disks if possible. Watch for command
timeouts and the usual bad sector stuff. I've had similar issues with
Adaptec controllers: bad disks seem to cause havoc. When an outstanding
operation isn't answered within the SCSI timeout (default 30 seconds,
see /sys/block/sdX/device/timeout), Linux performs a loop reset,
eventually resetting the controller. That means between 60 and 120
seconds of zero I/O, varying with the controller and the size of the
disk array. It's particularly annoying in a RAID setup, where the disk
could simply have been kicked within a few seconds. Something that needs
improvement IMHO.
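
To see what timeout each disk is actually using, something along these lines should do. It is only a rough sketch: the function name read_scsi_timeouts and the sysfs_block parameter are made up for illustration, and it assumes the per-device timeout file sits at the sysfs path mentioned above.

```python
from pathlib import Path


def read_scsi_timeouts(sysfs_block="/sys/block"):
    """Return {device_name: timeout_seconds} for block devices that
    expose a device/timeout file (hypothetical helper, not a kernel API)."""
    timeouts = {}
    for dev in sorted(Path(sysfs_block).glob("*")):
        timeout_file = dev / "device" / "timeout"
        try:
            # The file holds a single integer: the command timeout in seconds.
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # No timeout file (e.g. loop devices) or unreadable content: skip.
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in read_scsi_timeouts().items():
        print(f"{name}: {seconds}s")
```

Run on the affected box, this prints one line per disk that has a timeout file. Reading the values is harmless; changing one means writing to the same file as root, and whether a shorter value is safe behind a hardware RAID controller is something I'd test carefully first.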

On 23.09.2012 17:42, Nix wrote:
> On 19 Sep 2012, Chris Murphy outgrape:
>
>> On Sep 19, 2012, at 12:52 PM, Nix wrote:
>>
>>> So I have this x86-64 server running Linux 3.5.1 with a SATA-on-PCIe
>>> Areca 1210 hardware RAID-5 controller
>> Did you find this? Same controller family. Weird that this just shows
>> up now, but perhaps instead of it being "bad hardware" out the gate,
>> something's happened to it and now it's failing as you suspect.
> Hm, it's possible I suppose. Just as possible that a disk is dying.
>
>
> It looks to have been a one-off transient -- no recurrence yet, touch
> wood :)
>
