Message-ID: <1b409e15-7f9a-4e07-bacb-14f71a4bb671@gmail.com>
Date: Fri, 26 Apr 2024 17:53:13 +0200
From: Dirk Behme <dirk.behme@...il.com>
To: Linux kernel mailing list <linux-kernel@...r.kernel.org>
Cc: Dirk Behme <dirk.behme@...bosch.com>
Subject: data-race in dev_uevent / really_probe?
Hi,
While debugging a NULL pointer crash on a rather old embedded system
kernel (4.14.x), we might have found the root cause of
https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a
https://groups.google.com/g/syzkaller-upstream-moderation/c/xTpwi0C6eSY/m/FqJAQtinAQAJ
Looking at a recent kernel, the relevant code does not seem to have
changed much since then. So even in recent kernel code there appears
to be a synchronization issue between dev_uevent() and
really_probe():
Thread #1:
========
really_probe() {
    ...
probe_failed:
    ...
    device_unbind_cleanup(dev) {
        ...
        dev->driver = NULL; // <= Failed probe sets dev->driver to NULL
        ...
    }
    ...
}
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/dd.c#n552
Thread #2:
========
dev_uevent() {
    ...
    if (dev->driver)
        // If dev->driver is set to NULL by really_probe() from here on,
        // i.e. after the above check, the system crashes
        add_uevent_var(env, "DRIVER=%s", dev->driver->name);
    ...
}
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/core.c#n2670
The setup is a device driver probe that fails, in our case the probe
of an I2C driver. That failing probe issues some dev_info() and
dev_err() output, which in our case seems to trigger systemd-journal
(as given in the groups.google.com link above), which in turn calls
dev_uevent() via the given call stack.
In the end, dev_uevent() has checked dev->driver successfully. But
if, depending on timing, the failing really_probe() sets dev->driver
to NULL right after that check, dev_uevent() reads dev->driver a
second time, gets NULL, and the system crashes on the dereference.
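To illustrate the check-then-use pattern described above, here is a minimal userspace sketch in C11. It is not kernel code and not a proposed patch: struct fake_driver, struct fake_device, fake_unbind_cleanup() and fake_uevent_driver() are hypothetical stand-ins invented for this example. The only point it shows is that loading the pointer once into a local makes the NULL check and the dereference operate on the same value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct device_driver and struct device. */
struct fake_driver {
    const char *name;
};

struct fake_device {
    /* May be set to NULL by another thread at any time. */
    _Atomic(struct fake_driver *) driver;
};

/* Models the failed-probe cleanup path: clears the driver pointer. */
void fake_unbind_cleanup(struct fake_device *dev)
{
    atomic_store(&dev->driver, (struct fake_driver *)NULL);
}

/*
 * Models the uevent path. The pointer is loaded exactly once, so a
 * concurrent fake_unbind_cleanup() cannot turn it into NULL between
 * the check and the dereference.
 * Returns the number of characters written, or 0 if no driver is bound.
 */
int fake_uevent_driver(struct fake_device *dev, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    struct fake_driver *drv = atomic_load(&dev->driver);
    if (!drv)
        return 0;

    return snprintf(buf, len, "DRIVER=%s", drv->name);
}
```

Note that such a snapshot only avoids the NULL dereference caused by re-reading the pointer. It does not keep the driver object itself alive, so it would not be sufficient if the driver structure could be freed concurrently. In kernel code the analogous single load would presumably be done with READ_ONCE(); that is an assumption about one possible mitigation, not something verified here.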
Does that make sense? Or have we missed anything?
Best regards
Dirk