Message-ID: <51F708A4.9090207@interlog.com>
Date:	Mon, 29 Jul 2013 20:28:20 -0400
From:	Douglas Gilbert <dgilbert@...erlog.com>
To:	Nix <nix@...eri.org.uk>
CC:	Bernd Schubert <bernd.schubert@...tmail.fm>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-scsi@...r.kernel.org,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	nick.cheng@...ca.com.tw, stable@...r.kernel.org
Subject: Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup
 / early userspace transition

On 13-07-29 05:09 PM, Nix wrote:
> On 29 Jul 2013, Bernd Schubert uttered the following:
>
>> On 07/29/2013 03:05 PM, Nix wrote:
>>> On 29 Jul 2013, Bernd Schubert said:
>>>
>>>> Hi Nick,
>>>>
>>>> On 07/29/2013 12:10 PM, Nick Alcock wrote:
>>>>> arcmsr0: abort device command of scsi id = 0 lun = 1
>>>>> arcmsr0: abort device command of scsi id = 0 lun = 0
>>>>> arcmsr: executing bus reset eh.....num_resets=0, num_[...]
>>>>>
>>>>> arcmsr0: wait 'abort all outstanding command' timeout
>>>>> arcmsr0: executing hw bus reset ....
>>>>> arcmsr0: waiting for hw bus reset return, retry=0
>>>>> arcmsr0: waiting for hw bus reset return, retry=1
>>>>> Areca RAID Controller0: F/W V1.46 2009-01-06 & Model ARC-1210
>>>>> arcmsr: scsi  bus reset eh returns with success
>>>>> [and back to the top of the error messages again, apparently forever,
>>>>>     not that the machine would be much use without its RAID array even
>>>>>     if this loop terminated at some point, so I only gave it a couple
>>>>>     of minutes]
>>>>>
>>>>> The failure happens precisely at the moment we transition to early
>>>>> userspace, so presumably userspace I/O is failing (or something related
>>>>> to raw device access, perhaps, since the first thing it does is a
>>>>> vgscan).
>>>>>
>>>>> I haven't bisected yet (sorry, I have work to do which means this
>>>>> machine must be running right now), but nothing has changed in the
>>>>> arcmsr controller, nor in SCSI-land excepting
>>>>>
>>>>> commit 98dcc2946adbe4349ef1ef9b99873b912831edd4
>>>>> Author: Martin K. Petersen <martin.petersen@...cle.com>
>>>>> Date:   Thu Jun 6 22:15:55 2013 -0400
>
> I can now confirm that reverting this commit causes this problem to go
> away, and my machine boots fine again.
>
> Please revert (and figure out what is wrong so that 3.11 doesn't
> implode in the same way? I'm happy to assist...)

Hi,
Please supply the information that Martin Petersen asked
for.

I just examined a more recent Areca SAS RAID controller
and would describe it as the SCSI device from hell. One solution
to this problem is to modify the arcmsr driver so it returns
a more consistent set of lies to the management SCSI commands that
Martin is asking about.

Doug Gilbert

