Date:   Wed, 3 Mar 2021 15:42:39 +0000
From:   <Don.Brace@...rochip.com>
To:     <slyich@...il.com>, <glaubitz@...sik.fu-berlin.de>
CC:     <linux-ia64@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <jszczype@...hat.com>, <Scott.Benesh@...rochip.com>,
        <Scott.Teel@...rochip.com>, <thenzl@...hat.com>,
        <martin.petersen@...cle.com>
Subject: RE: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev
 cmds outstanding for retried cmds" breaks hpsa P600

-----Original Message-----
From: Sergei Trofimovich [mailto:slyich@...il.com] 
Sent: Tuesday, March 2, 2021 6:23 PM
To: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>; Don Brace - C33706 <Don.Brace@...rochip.com>
Cc: linux-ia64@...r.kernel.org; linux-kernel@...r.kernel.org; Joe Szczypek <jszczype@...hat.com>; Scott Benesh - C33703 <Scott.Benesh@...rochip.com>; Scott Teel - C33730 <Scott.Teel@...rochip.com>; Tomas Henzl <thenzl@...hat.com>; Martin K. Petersen <martin.petersen@...cle.com>
Subject: [bisected] 5.12-rc1 hpsa regression: "scsi: hpsa: Correct dev cmds outstanding for retried cmds" breaks hpsa P600

On Tue, 2 Mar 2021 23:31:32 +0100
John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de> wrote:

> Hi Sergei!
>
> On 3/2/21 11:26 PM, Sergei Trofimovich wrote:
> > Gave v5.12-rc1 a try today and got a similar boot failure around 
> > hpsa queue initialization, but my failure is later:
> >     https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
> > Maybe I get a different error because I flipped on most debugging 
> > kernel options :)
> >
> > Looks like the 'ERROR: Invalid distance value range' messages, while very 
> > scary, are harmless. It's just a new spammy way for the kernel to report 
> > the lack of NUMA config on the machine (no SRAT and SLIT ACPI tables).
> >
> > At least I get hpsa detected on the PCI bus. But I guess its discovered 
> > configuration is very wrong as I get unaligned accesses:
> >     [   19.811570] kernel unaligned access to 0xe000000105dd8295, ip=0xa000000100b874d1
> >
> > Bisecting now.
>
> Sounds good. I guess we should get Jens' fix for the signal regression 
> merged as well as your two fixes for strace.

"bisected" (cheated halfway through) and verified that reverting
f749d8b7a9896bc6e5ffe104cc64345037e0b152 makes rx3600 boot again.

CCing authors who might be able to help us here.

commit f749d8b7a9896bc6e5ffe104cc64345037e0b152
Author: Don Brace <don.brace@...rochip.com>
Date:   Mon Feb 15 16:26:57 2021 -0600

    scsi: hpsa: Correct dev cmds outstanding for retried cmds

    Prevent incrementing device->commands_outstanding for ioaccel command
    retries that are driver initiated.  If the command goes through the retry
    path, the device->commands_outstanding counter has already accounted for
    the number of commands outstanding to the device.  Only commands going
    through function hpsa_cmd_resolve_events decrement this counter.

     - ioaccel commands go to either HBA disks or to logical volumes comprised
       of SSDs.

    The extra increment is causing device resets to hang.

     - Resets wait for all device outstanding commands to complete before
       returning.

    Replace unused field abort_pending with retry_pending. This is a
    maintenance driver so these changes have the least impact/risk.

    Link: https://lore.kernel.org/r/161342801747.29388.13045495968308188518.stgit@brunhilda
    Tested-by: Joe Szczypek <jszczype@...hat.com>
    Reviewed-by: Scott Benesh <scott.benesh@...rochip.com>
    Reviewed-by: Scott Teel <scott.teel@...rochip.com>
    Reviewed-by: Tomas Henzl <thenzl@...hat.com>
    Signed-off-by: Don Brace <don.brace@...rochip.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@...cle.com>
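
For readers following the commit message above, here is a minimal C model of the accounting it describes: a command is counted once on first submission, a driver-initiated retry does not count it again, and only completion decrements the counter. All names here (model_dev, model_cmd, model_submit and so on) are hypothetical stand-ins invented for illustration. This is not the hpsa code, and where exactly the real driver sets and clears retry_pending is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the real hpsa structures. */
struct model_dev {
    int commands_outstanding;   /* commands currently owned by the device */
};

struct model_cmd {
    struct model_dev *dev;
    bool retry_pending;         /* set when the driver itself resubmits */
};

/* Submit path: count the command only on its first submission. */
static void model_submit(struct model_cmd *c)
{
    if (!c->retry_pending)
        c->dev->commands_outstanding++;
    c->retry_pending = false;   /* assumption: flag is consumed on submit */
}

/* Driver-initiated retry: mark the command, then resubmit it. */
static void model_retry(struct model_cmd *c)
{
    c->retry_pending = true;
    model_submit(c);
}

/* Completion path: the only place the counter goes down. */
static void model_complete(struct model_cmd *c)
{
    assert(c->dev->commands_outstanding > 0);
    c->dev->commands_outstanding--;
}

/* A reset waits until nothing is outstanding on the device. */
static bool model_reset_can_finish(const struct model_dev *d)
{
    return d->commands_outstanding == 0;
}
```

In this model, submit, retry, complete leaves the counter at zero, so a reset can finish. Without the retry_pending check the retry would bump the counter a second time, the single completion would leave it at one, and a reset waiting for zero would hang, which is the symptom the commit message names.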

Don, do you happen to know why this patch caused some controller init failure for device
    14:01.0 RAID bus controller: Hewlett-Packard Company Smart Array P600 ?

Boot failure: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1
Boot success: https://dev.gentoo.org/~slyfox/configs/guppy-dmesg-5.12-rc1-good

The difference between the two boots is
f749d8b7a9896bc6e5ffe104cc64345037e0b152 reverted on top of 5.12-rc1 in -good case.

Looks like hpsa controller fails to initialize in bad case (could be a race?).

--

  Sergei

Don:
I see aligned access. Let me run pahole to see if anything jumps out.
What controller are you using?
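
As background to the pahole remark: the address in the quoted report, 0xe000000105dd8295, is odd, so any access wider than one byte at that address is unaligned. The sketch below shows the power-of-two alignment test and how a packed struct can place a multi-byte field at an odd offset, which is the kind of layout detail pahole prints. The struct example_packed is hypothetical and is not taken from the hpsa driver; the packed attribute is the GCC/Clang spelling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Hypothetical packed layout: 'value' follows a 1-byte field with no
 * padding, so it sits at offset 1 and is unaligned whenever the struct
 * itself starts on a 4-byte boundary. */
struct example_packed {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

/* Offset of the 4-byte field inside the packed struct. */
static size_t example_value_offset(void)
{
    return offsetof(struct example_packed, value);
}
```

Here is_aligned(0xe000000105dd8295ULL, 2) is false, and example_value_offset() is 1 rather than the 4 an unpacked layout would give.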
