lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 21 Aug 2018 17:14:43 -0500
From:   Andrew Banman <abanman@....com>
To:     Corey Minyard <minyard@....org>, Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     abanman@....com, justin.ernst@....com, rja@....com,
        frank.ramsay@....com, openipmi-developer@...ts.sourceforge.net,
        linux-kernel@...r.kernel.org
Subject: [RFC] IPMI state machine regression

Dear IPMI supporters,

We observe a window in IPMI BT's opportunistic get capabilities request,
wherein GET_DEVICE_GUID and GET_DEVICE_ID requests may start while the BT state
machine is in WR_CONSUME. Following this, the 0xD5 error code is forced in
bt_start_transaction, IPMI fails to initialize, and the interface is torn down.
There is no mechanism to retry bringing up the interface in open() /dev/ipmi.
This leaves IPMI hosed until you reload modules. Looks to happen after we call
schedule().

The problem is intermittent, but we observe ~50% fail rate on 4.18
kernels on our UV4 systems.

Here's the relevant snippet (the action starts at 120.367379):

[  118.316764] IPMI System Interface driver.
[  118.354246] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[  118.354249] ipmi_si: SMBIOS: mem 0xce4 regsize 1 spacing 1 irq 6
[  118.354251] ipmi_si: Adding SMBIOS-specified bt state machine
[  118.354311] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[  118.381164] ipmi_si IPI0001:00: [io  0x0ce4-0x0ce6] regsize 1 spacing 1 irq 6
[  118.381167] ipmi_si: Adding ACPI-specified bt state machine
[  118.382811] ipmi_si: Trying SMBIOS-specified bt state machine at mem address 0xce4, slave address 0x20, irq 6
[  118.382815] ipmi_si dmi-ipmi-si.0: Could not set up I/O space
[  118.382816] ipmi_si: Trying ACPI-specified bt state machine at i/o address 0xce4, slave address 0x0, irq 6
[  118.382911] IPMI BT: flag reset [ ]
[  118.383008] BT: XACTION [ ] TO=5000000 - 0
[  118.383059] BT: WR_BYTES [ ] TO=5000000 - 0
[  118.383272] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  118.388034] BT: WR_CONSUME [ ] TO=5000000 - 4000
[  118.388059] BT: RD_WAIT [ ] TO=4996000 - 0
[  118.396017] BT: RD_WAIT [ ] TO=4996000 - 4000
[  118.492015] BT: RD_WAIT [ ] TO=4992000 - 4000
[  118.520073] BT: RD_WAIT [ ] TO=4988000 - 4000
[  118.528079] BT: RD_WAIT [ ] TO=4984000 - 4000
[  118.540049] BT: RD_WAIT [ ] TO=4980000 - 4000
[  118.548011] BT: RD_WAIT [ ] TO=4976000 - 4000
[  118.564033] BT: RD_WAIT [ ] TO=4972000 - 4000
[  118.572109] BT: RD_WAIT [ ] TO=4968000 - 4000
[  118.580013] BT: RD_WAIT [ ] TO=4964000 - 4000
[  118.588017] BT: RD_WAIT [ ] TO=4960000 - 4000
[  118.608014] BT: RD_WAIT [ ] TO=4956000 - 4000
[  118.628058] BT: RD_WAIT [ ] TO=4952000 - 4000
[  118.636053] BT: RD_WAIT [ ] TO=4948000 - 4000
[  118.648016] BT: RD_WAIT [ ] TO=4944000 - 4000
[  118.672028] BT: RD_WAIT [ ] TO=4940000 - 4000
[  118.684056] BT: RD_WAIT [ ] TO=4936000 - 4000
[  118.696021] BT: RD_WAIT [ ] TO=4932000 - 4000
[  118.708015] BT: RD_WAIT [ ] TO=4928000 - 4000
[  118.720018] BT: RD_WAIT [ ] TO=4924000 - 4000
[  118.732063] BT: RD_WAIT [ ] TO=4920000 - 4000
[  118.744021] BT: RD_WAIT [ ] TO=4916000 - 4000
[  118.752089] BT: RD_WAIT [ ] TO=4912000 - 4000
[  118.764011] BT: RD_WAIT [ ] TO=4908000 - 4000
[  118.784009] BT: RD_WAIT [ ] TO=4904000 - 4000
[  118.808011] BT: RD_WAIT [ ] TO=4900000 - 4000
[  118.832010] BT: RD_WAIT [ ] TO=4896000 - 4000
[  118.844011] BT: RD_WAIT [ ] TO=4892000 - 4000
[  118.860036] BT: RD_WAIT [ ] TO=4888000 - 4000
[  118.868102] BT: RD_WAIT [ ] TO=4884000 - 4000
[  118.876012] BT: RD_WAIT [ ] TO=4880000 - 4000
[  118.904011] BT: RD_WAIT [ ] TO=4876000 - 4000
[  118.924013] BT: RD_WAIT [ ] TO=4872000 - 4000
[  118.948016] BT: RD_WAIT [ ] TO=4868000 - 4000
[  118.984043] BT: RD_WAIT [ ] TO=4864000 - 4000
[  118.992059] BT: RD_WAIT [ ] TO=4860000 - 4000
[  118.995422]  for 2000000
[  118.995650]
[  119.000026] BT: RD_WAIT [ ] TO=4856000 - 4000
[  119.012018] BT: RD_WAIT [ ] TO=4852000 - 4000
[  119.032010] BT: RD_WAIT [ ] TO=4848000 - 4000
[  119.048013] BT: RD_WAIT [ ] TO=4844000 - 4000
[  119.060013] BT: RD_WAIT [ ] TO=4840000 - 4000
[  119.080014] BT: RD_WAIT [ ] TO=4836000 - 4000
[  119.096021] BT: RD_WAIT [ ] TO=4832000 - 4000
[  119.120009] BT: RD_WAIT [ ] TO=4828000 - 4000
[  119.128017] BT: RD_WAIT [ ] TO=4824000 - 4000
[  119.136011] BT: RD_WAIT [ ] TO=4820000 - 4000
[  119.144013] BT: RD_WAIT [ ] TO=4816000 - 4000
[  119.164008] BT: RD_WAIT [ ] TO=4812000 - 4000
[  119.184010] BT: RD_WAIT [ ] TO=4808000 - 4000
[  119.204015] BT: RD_WAIT [ ] TO=4804000 - 4000
[  119.264034] BT: RD_WAIT [ ] TO=4800000 - 4000
[  119.272013] BT: RD_WAIT [ ] TO=4796000 - 4000
[  119.288013] BT: RD_WAIT [ ] TO=4792000 - 4000
[  119.308011] BT: RD_WAIT [ ] TO=4788000 - 4000
[  119.324011] BT: RD_WAIT [ ] TO=4784000 - 4000
[  119.344010] BT: RD_WAIT [ ] TO=4780000 - 4000
[  119.364027] BT: RD_WAIT [ ] TO=4776000 - 4000
[  119.376011] BT: RD_WAIT [ ] TO=4772000 - 4000
[  119.396010] BT: RD_WAIT [ ] TO=4768000 - 4000
[  119.408013] BT: RD_WAIT [ ] TO=4764000 - 4000
[  119.424012] BT: RD_WAIT [ B2H ] TO=4760000 - 4000
[  119.434845] BT: CLEAR_B2H [ H_BUSY ] TO=4756000 - 0
[  119.592204] BT: RD_BYTES [ H_BUSY ] TO=4756000 - 0
[  119.650481] BT: XACTION [ ] TO=5000000 - 0
[  119.676062] BT: WR_BYTES [ ] TO=5000000 - 0
[  119.734596] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  119.740019] BT: WR_CONSUME [ ] TO=5000000 - 4000
[  119.740036] BT: RD_WAIT [ ] TO=4996000 - 0
[  119.748032] BT: RD_WAIT [ B2H ] TO=4996000 - 4000
[  119.748101] BT: CLEAR_B2H [ H_BUSY ] TO=4992000 - 0
[  119.748124] BT: RD_BYTES [ H_BUSY ] TO=4992000 - 0
[  119.748348] BT: XACTION [ ] TO=5000000 - 0
[  119.748382] BT: WR_BYTES [ ] TO=5000000 - 0
[  119.748537] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  119.760013] BT: WR_CONSUME [ B2H ] TO=5000000 - 4000
[  119.806271] BT: RD_WAIT [ B2H ] TO=4996000 - 0
[  119.806282] BT: CLEAR_B2H [ H_BUSY ] TO=4996000 - 0
[  119.816909] BT: RD_BYTES [ H_BUSY ] TO=4996000 - 0
[  119.816944] BT: XACTION [ ] TO=5000000 - 0
[  119.827061] BT: WR_BYTES [ ] TO=5000000 - 0
[  119.827088] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  119.852024] BT: WR_CONSUME [ B2H ] TO=5000000 - 4000
[  119.875752] BT: RD_WAIT [ B2H ] TO=4996000 - 0
[  119.895099] BT: CLEAR_B2H [ H_BUSY ] TO=4996000 - 0
[  119.991768] BT: RD_BYTES [ H_BUSY ] TO=4996000 - 0
[  119.991805] BT: XACTION [ ] TO=5000000 - 0
[  120.004993] BT: WR_BYTES [ ] TO=5000000 - 0
[  120.027171] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  120.048024] BT: WR_CONSUME [ ] TO=5000000 - 4000
[  120.072949] BT: RD_WAIT [ B2H ] TO=4996000 - 0
[  120.085892] BT: CLEAR_B2H [ H_BUSY ] TO=4996000 - 0
[  120.097374] BT: RD_BYTES [ H_BUSY ] TO=4996000 - 0
[  120.097406] BT: XACTION [ ] TO=5000000 - 0
[  120.125568] BT: WR_BYTES [ ] TO=5000000 - 0
[  120.141827] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  120.264026] BT: WR_CONSUME [ B2H ] TO=5000000 - 4000
[  120.287267] BT: RD_WAIT [ B2H ] TO=4996000 - 0
[  120.287276] BT: CLEAR_B2H [ H_BUSY ] TO=4996000 - 0
[  120.287280] BT: RD_BYTES [ H_BUSY ] TO=4996000 - 0
[  120.287429] ipmi_si IPI0001:00: Using irq 6
[  120.287447] **Interrupt: 1534434704.287445059
[  120.287457] BT: XACTION [ ] TO=5000000 - 0
[  120.287466] BT: WR_BYTES [ ] TO=5000000 - 0
[  120.287500] BT: WR_CONSUME [ H2B ] TO=5000000 - 0
[  120.297445] **Interrupt: 1534434704.297445691
[  120.332534] BT: WR_CONSUME [ B2H ] TO=5000000 - 0
[  120.332538] BT: RD_WAIT [ B2H ] TO=5000000 - 0
[  120.342956] BT: CLEAR_B2H [ H_BUSY ] TO=5000000 - 0
[  120.342969] BT: RD_BYTES [ H_BUSY ] TO=5000000 - 0
[  120.343244] **Done: 1534434704.343244675
[  120.367379] BT: IDLE [ ] TO=5000000 - 0
[  120.367384] BT: CAP_BEGIN [ ] TO=5000000 - 0
[  120.376634] **Timer: 1534434704.376633453
[  120.376646] BT: XACTION [ ] TO=5000000 - 92000
[  120.376683] BT: WR_BYTES [ ] TO=4908000 - 0
[  120.376995] BT: WR_CONSUME [ H2B ] TO=4908000 - 0
[  120.377025] Send:  18 08
[  120.377027] **Enqueue: 1534434704.377028726
[  120.377028] **Start2: 1534434704.377029859
[  120.377061] BT: WR_CONSUME [ ] TO=4908000 - 0
[  120.377137] BT: RD_WAIT [ ] TO=4908000 - 0
[  120.377140] Recv::  1c 08 d5
[  120.377156] Send:  18 01
[  120.377161] **Enqueue: 1534434704.377161047
[  120.377164] **Start2: 1534434704.377164751
[  120.377226] BT: RD_WAIT [ ] TO=4908000 - 0
[  120.377237] Recv::  1c 01 d5
[  120.377242] ipmi_si IPI0001:00: IPMI message handler: device id demangle failed: -22
[  120.377252] ipmi_si IPI0001:00: Unable to get the device id: -5
[  120.377252] ipmi_si IPI0001:00: Unable to register device: error -5
[  120.569865] IPMI SSIF Interface driver

Also FYI,

Bad cleanup code causes a null pointer and adds a race condition; this
is addressed by "[PATCH] Remove redundant cleanup in ipmi_register_smi"
from Justin Ernst <justin.ernst@....com>, which I paste here:

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index 51832b8a2c62..3b0b50c4f064 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -3395,12 +3395,12 @@ int ipmi_register_smi(const struct ipmi_smi_handlers *handlers,
 
  out:
 	if (rv) {
-		ipmi_bmc_unregister(intf);
-		list_del_rcu(&intf->link);
+		/*
+		 * ipmi_unregister_smi must be called to clean up after
+		 * failure. We unlock the mutex to allow ipmi_unregister_smi
+		 * to lock it and perform cleanup.
+		 */
 		mutex_unlock(&ipmi_interfaces_mutex);
-		synchronize_srcu(&ipmi_interfaces_srcu);
-		cleanup_srcu_struct(&intf->users_srcu);
-		kref_put(&intf->refcount, intf_free);
 	} else {
 		/*
 		 * Keep memory order straight for RCU readers.  Make
--

Powered by blists - more mailing lists