Date:   Fri, 11 Nov 2022 10:21:28 +0100
From:   "Arnd Bergmann" <arnd@...db.de>
To:     "Dmitry Torokhov" <dmitry.torokhov@...il.com>
Cc:     "Naresh Kamboju" <naresh.kamboju@...aro.org>,
        linux-stable <stable@...r.kernel.org>,
        "open list" <linux-kernel@...r.kernel.org>,
        "Linux ARM" <linux-arm-kernel@...ts.infradead.org>,
        lkft-triage@...ts.linaro.org,
        "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>,
        "Sasha Levin" <sashal@...nel.org>,
        "Linus Walleij" <linus.walleij@...aro.org>,
        "Mark Brown" <broonie@...nel.org>,
        "Liam Girdwood" <lgirdwood@...il.com>
Subject: Re: arm: TI BeagleBoard X15 : Unable to handle kernel NULL pointer dereference
 at virtual address 00000369 - Internal error: Oops: 5 [#1] SMP ARM

On Fri, Nov 11, 2022, at 01:48, Dmitry Torokhov wrote:
> On Wed, Nov 9, 2022 at 2:20 PM Arnd Bergmann <arnd@...db.de> wrote:
>>
>> On Wed, Nov 9, 2022, at 13:57, Arnd Bergmann wrote:
>> >
>> > One thing that sticks out is the print_constraints_debug() function
>> > in the regulator framework, which uses a larger-than-average stack
>> > to hold a string buffer, and then calls into the low-level
>> > driver to get the actual data (regulator_get_voltage_rdev,
>> > _regulator_is_enabled). Splitting the device access out into a
>> > different function from the string handling might reduce the
>> > stack usage enough to stay just under the 8KB limit, though it's
>> > probably not a complete fix. I added the regulator maintainers
>> > to Cc for thoughts on this.
>>
>> I checked the stack usage of each of the 147 functions in the
>> backtrace, and as I guessed, print_constraints_debug() is
>> the largest, but it's still only 168 bytes and everything else
>> is smaller, so there is no point in hacking on this.
>
> You mentioned that we are doing probing of a device 6 levels deep.
> Could one of the parent devices be marked for an asynchronous probe
> thus breaking the chain?

Ah right, I forgot that we already have a per-driver flag for this.
Thanks a lot for the suggestion!

This means it might be as easy as this one-liner, picking
one of the drivers in the middle of the call chain that is
not shared across too many other systems:

diff --git a/drivers/mfd/palmas.c b/drivers/mfd/palmas.c
index 8b7429bd2e3e..f4a96eb98eea 100644
--- a/drivers/mfd/palmas.c
+++ b/drivers/mfd/palmas.c
@@ -731,6 +731,7 @@ static struct i2c_driver palmas_i2c_driver = {
        .driver = {
                   .name = "palmas",
                   .of_match_table = of_palmas_match_tbl,
+                  .probe_type = PROBE_PREFER_ASYNCHRONOUS,
        },
        .probe = palmas_i2c_probe,
        .remove = palmas_i2c_remove,

There is still a small regression risk for other OMAP platforms
that may rely on probe ordering, but it should reliably fix
the issue.
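A toy userspace model, not kernel code, of why the flag breaks the chain: probes that run synchronously nest on one stack, while a driver marked PROBE_PREFER_ASYNCHRONOUS has its probe run later from a worker with a fresh stack. The per-level stack costs, peak_stack() and example_chain are hypothetical, and the model ignores whatever the worker thread itself already has on its stack.

```c
/*
 * Userspace model (not kernel code) of nested probing. frame_bytes[i] is
 * the stack used by the probe at nesting level i; the numbers used below
 * are made up. async_at is the level whose driver defers its probe, or -1
 * if every level probes synchronously.
 */
static int peak_stack(const int *frame_bytes, int n, int async_at)
{
	int peak = 0, depth = 0;

	for (int i = 0; i < n; i++) {
		/* A deferred probe later runs from a worker on a fresh stack. */
		if (i == async_at)
			depth = 0;
		depth += frame_bytes[i];
		if (depth > peak)
			peak = depth;
	}
	return peak;
}

/* Hypothetical six-level chain ("6 levels deep" in the thread). */
static const int example_chain[6] = { 1200, 1400, 1600, 1500, 1300, 1100 };
```

With these made-up numbers, probing all six levels synchronously peaks at 8100 bytes, while deferring level 2 splits the chain and the peak drops to 5500 bytes. Deferring level 0 saves nothing in this model, which is why a driver in the middle of the chain is the better candidate.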

There is a related idea that I'll try to take another look
at. Since the bug only happens intermittently, and not at all
on mainline kernels with IRQ_STACK, I had the idea of making
the usable kernel stack size configurable at runtime on
mainline kernels by reserving a fixed amount of the 8KB total.
This should make it possible to narrow down how much stack is
actually used before a crash becomes guaranteed, and then to
validate that a fix correctly addresses it.
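The runtime-reserve idea amounts to a search: hold back some of the stack, check whether the workload still survives, and the largest reserve that survives bounds the real peak usage. The sketch below models only that search in userspace. The callback stands in for a test boot with a given reserve, and MODEL_THREAD_SIZE, survives_fn, max_safe_reserve() and model_survives() are hypothetical names, not kernel interfaces. Because the real crash is intermittent, each setting would in practice need several runs before it counts as survivable.

```c
#include <stdbool.h>

/* 8KB total kernel stack, as discussed in the thread. */
#define MODEL_THREAD_SIZE 8192

/* Hypothetical stand-in for "boot with `reserve` bytes held back and run
 * the test"; returns true if the run survives. */
typedef bool (*survives_fn)(int reserve, void *ctx);

/*
 * Binary-search the largest reserve that still survives. Assumes the run
 * survives with no reserve, crashes when the whole stack is reserved, and
 * gives a repeatable result for a given reserve.
 */
static int max_safe_reserve(survives_fn survives, void *ctx)
{
	int lo = 0;                  /* known to survive */
	int hi = MODEL_THREAD_SIZE;  /* known to crash */

	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		if (survives(mid, ctx))
			lo = mid;
		else
			hi = mid;
	}
	return lo;  /* peak usage is then MODEL_THREAD_SIZE - lo */
}

/* Model predicate: a workload with a fixed peak usage survives as long
 * as it fits in the stack left over after the reserve. */
static bool model_survives(int reserve, void *ctx)
{
	int peak = *(const int *)ctx;

	return peak <= MODEL_THREAD_SIZE - reserve;
}
```

Comparing MODEL_THREAD_SIZE minus the result before and after a candidate fix, such as the palmas change, would show how much headroom the fix actually buys.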

    Arnd
