Date:   Thu, 7 Sep 2017 14:16:19 +0200
From:   Maxime Ripard <maxime.ripard@...e-electrons.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
        Quentin Schulz <quentin.schulz@...e-electrons.com>,
        Mylene Josserand <mylene.josserand@...e-electrons.com>
Subject: mutex_lock issues during poweroff

Hi,

We've been investigating a bug on our kernel for the last couple of
months.

The scenario is this: we have an ARM board that embeds an Allwinner A33
SoC. That board is using a PMIC connected to the SoC through a
proprietary bus, whose driver is in drivers/bus/sunxi-rsb.c. The
poweroff is implemented by sending a shutdown command to that PMIC.

http://elixir.free-electrons.com/linux/v4.9.47/source/drivers/mfd/axp20x.c#L743

That PMIC also serves other purposes, such as controlling the
regulators, but we also use it to get the various power supplies
state, and report them through our power supplies driver.

http://elixir.free-electrons.com/linux/v4.9.47/source/drivers/power/supply/axp20x_usb_power.c
http://elixir.free-electrons.com/linux/v4.12.11/source/drivers/power/supply/axp20x_ac_power.c
http://elixir.free-electrons.com/linux/v4.12.11/source/drivers/power/supply/axp20x_battery.c

The bug arises when we have those drivers enabled on a 4.9.47 kernel
(or any 4.9 kernel; 4.8 also happens to show this). In some cases (1
out of 200-300 poweroffs), the board will not power off. After digging
through this, it turns out that in such a scenario, the mutex_lock we
have in the bus driver never returns.

Here: http://elixir.free-electrons.com/linux/v4.9.47/source/drivers/bus/sunxi-rsb.c#L379

Which means that we will never actually send the command, which also
explains why the board stays powered on.

This gets weirder, since if we dump the return code of mutex_is_locked
right before a failing case, the mutex isn't already locked, so we
should not block or sleep at all.
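
For illustration, the kind of check described above can be sketched in
userspace C with pthreads. This is only an analogue, not the driver
code: bus_lock, bus_lock_is_locked() and bus_transfer() are
hypothetical names, and pthread_mutex_trylock() stands in for the
kernel's mutex_is_locked(). As in the kernel, the answer is a snapshot
that can be stale by the time the lock is actually taken.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the bus lock; not the real driver's mutex. */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns 1 if the mutex was held at the time of the check, 0 otherwise.
 * The result is only a snapshot: another thread may take or release the
 * lock right after we look.
 */
static int bus_lock_is_locked(void)
{
	if (pthread_mutex_trylock(&bus_lock) == 0) {
		/* We got it, so nobody held it; give it back. */
		pthread_mutex_unlock(&bus_lock);
		return 0;
	}
	return 1;
}

/*
 * Dumps the lock state right before locking, then does the (omitted)
 * transfer under the lock. Returns the state that was observed.
 */
static int bus_transfer(void)
{
	int was_locked = bus_lock_is_locked();

	fprintf(stderr, "bus_lock held before lock: %d\n", was_locked);

	pthread_mutex_lock(&bus_lock);
	/* The bus transfer would happen here. */
	pthread_mutex_unlock(&bus_lock);

	return was_locked;
}
```

Calling bus_transfer() with no other user of the lock logs 0 and
returns immediately, which is the behaviour we would expect from the
bus driver in the failing case as well.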

If we disable the power supply drivers that poll the PMIC status on a
regular basis, it works. However, we've never actually seen a
concurrent usage of that bus. In our practical cases, the mutex is
always unlocked.

If we remove the mutex_lock / _unlock entirely, we don't stall anymore
either, which seems to confirm something weird going on here.

One thing worth noting is that we couldn't reproduce the issue with a
4.13. We can't bisect really easily due to the amount of patches that
we still have on 4.9 and have all been merged since, but it seems like
the bug was fixed (either on purpose or as a side effect), and was
never sent to stable. Looking at the history of kernel/locking/mutex.c
during that window didn't really show anything obvious though.

If you have any ideas or spot something very wrong, I'd be happy to
hear about it. Thanks!

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
