Message-ID: <a3b0e765-24e5-781b-218d-abcb16030e99@gmail.com>
Date:   Thu, 23 Aug 2018 11:22:58 -0500
From:   Corey Minyard <tcminyard@...il.com>
To:     Andrew Banman <abanman@....com>, Corey Minyard <minyard@....org>
Cc:     Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        justin.ernst@....com, rja@....com, frank.ramsay@....com,
        openipmi-developer@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC] IPMI state machine regression

On 08/22/2018 11:23 AM, Andrew Banman wrote:
> On Wed, Aug 22, 2018 at 11:14:52AM -0500, Corey Minyard wrote:
>> On 08/21/2018 05:14 PM, Andrew Banman wrote:
>>> Dear IPMI supporters,
>>>
>>> We observe a window in IPMI BT's opportunistic get capabilities request,
>>> wherein GET_DEVICE_GUID and GET_DEVICE_ID requests may start while the BT state
>>> machine is in WR_CONSUME. Following this, the 0xD5 error code is forced in
>>> bt_start_transaction, IPMI fails to initialize, and the interface is torn down.
>>> There is no mechanism to retry bringing up the interface in open() /dev/ipmi.
>>> This leaves IPMI hosed until you reload modules. Looks to happen after we call
>>> schedule().
>> When was the latest kernel where this worked properly?  Also, what hardware
>> is this?
> This is UV4.
>
> First known bad commit, but I am not sure if the timing issue predates
> it:
>
> commit aa9c9ab2443e3b9562c6c7cfc245a9e43b557d14
> Author: Jeremy Kerr <jk@...abs.org>
> Date:   Fri Aug 25 15:47:24 2017 +0800
>
>      ipmi: allow dynamic BMC version information
>
> Hits less frequently with older kernels so I didn't see it until
> recently when it became more frequent.

Ok, that's for the crash, which makes sense.  But that's an easy problem
to fix.  I would like a "Tested-by" on that, if you get to test it,
though I was able to simulate various failures there to test it out.

So reading between the lines ("more frequent") I'm guessing that this still
happened with older kernels, but is becoming annoying with newer kernels.
I would guess recent changes cause it to happen more often because of the
way the upper layer now interacts with the lower layers: you will have
more messages at startup, and the timing is somewhat different.

The BT code itself hasn't changed much in over 10 years.  Nothing that
looks like it would cause an issue like this.  So I would guess this is an
issue that has been around for a while.

I don't have any real hardware with a BT interface, just the one in qemu,
but I've never seen it there.

It actually looks like the state machine is working ok.  But the BMC is
responding to a "Get Device ID" command with:

    Recv::  1c 08 d5


That's an error response with completion code D5, which is "Cannot
execute command. Command, or request parameter(s), not supported in
present state."  That particular command shouldn't ever respond with
that error, so I think the bug here is with your BMC.
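
For anyone following along, here is a minimal sketch of how the three
bytes in that trace break down.  It assumes the usual layout for a
response as seen by the system interface layer: byte 0 is
(netfn << 2) | lun, byte 1 is the command, byte 2 is the completion code.
The helper name decode_response and the lookup tables are hypothetical,
written for illustration only; the numeric values are taken from the IPMI
specification numbering as mirrored in include/uapi/linux/ipmi_msgdefs.h.
In that numbering command 0x08 is Get Device GUID (Get Device ID is 0x01),
so this trace line may belong to the GUID request mentioned earlier in the
thread.

```python
# Assumption: numeric values follow the IPMI specification as mirrored in
# the Linux header include/uapi/linux/ipmi_msgdefs.h. Only the entries
# needed for this trace are listed.
NETFN_NAMES = {0x06: "App request", 0x07: "App response"}
APP_COMMANDS = {0x01: "Get Device ID", 0x08: "Get Device GUID"}
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xD5: "Cannot execute command. Command, or request parameter(s), "
          "not supported in present state.",
}


def decode_response(raw):
    """Split a raw response into netfn, LUN, command and completion code.

    Hypothetical helper, not a kernel API. Assumed layout:
    byte 0 = (netfn << 2) | lun, byte 1 = command, byte 2 = completion code.
    """
    if len(raw) < 3:
        raise ValueError("response needs at least 3 bytes")
    netfn = raw[0] >> 2      # upper six bits
    lun = raw[0] & 0x03      # lower two bits
    cmd = raw[1]
    cc = raw[2]
    # Command names are only looked up for the App network function.
    is_app = netfn in NETFN_NAMES
    return {
        "netfn": netfn,
        "netfn_name": NETFN_NAMES.get(netfn, "unknown"),
        "lun": lun,
        "cmd": cmd,
        "cmd_name": APP_COMMANDS.get(cmd, "unknown") if is_app else "unknown",
        "completion_code": cc,
        "completion_text": COMPLETION_CODES.get(cc, "unknown"),
    }


if __name__ == "__main__":
    # The bytes from the trace above.
    for key, value in decode_response(bytes.fromhex("1c08d5")).items():
        print(f"{key}: {value}")
```

The decode only tells you what the bytes say, not which side produced
them, so it is worth checking whether the D5 came over the wire or was
filled in locally.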

-corey

