lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jAxJH=w+rAmQ2VMj5o3rKcS65DNGumLLerFJz+eKniyg@mail.gmail.com>
Date: Tue, 13 Jan 2026 14:43:11 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Danilo Krummrich <dakr@...nel.org>, Gui-Dong Han <hanguidong02@...il.com>
Cc: gregkh@...uxfoundation.org, rafael@...nel.org, 
	linux-kernel@...r.kernel.org, baijiaju1990@...il.com, 
	Qiu-ji Chen <chenqiuji666@...il.com>, 2045gemini@...il.com
Subject: Re: [PATCH v3] driver core: fix use-after-free of driver_override via driver_match_device()

On Tue, Jan 13, 2026 at 2:34 PM Danilo Krummrich <dakr@...nel.org> wrote:
>
> On Tue Jan 13, 2026 at 1:42 PM CET, Gui-Dong Han wrote:
> > On Tue, Jan 13, 2026 at 5:55 PM Danilo Krummrich <dakr@...nel.org> wrote:
> >>
> >> On Thu Nov 27, 2025 at 3:57 PM CET, Gui-Dong Han wrote:
> >> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> >> > index 86fa7fbb3548..72791125de91 100644
> >> > --- a/drivers/base/base.h
> >> > +++ b/drivers/base/base.h
> >> > @@ -166,6 +166,9 @@ void device_set_deferred_probe_reason(const struct device *dev, struct va_format
> >> >  static inline int driver_match_device(const struct device_driver *drv,
> >> >                                     struct device *dev)
> >> >  {
> >> > +     /* Protects against driver_set_override() races */
> >> > +     device_lock_assert(dev);
> >> > +
> >> >       return drv->bus->match ? drv->bus->match(dev, drv) : 1;
> >> >  }
> >>
> >> I am not convinced that this is the correct fix, since
> >>
> >>   1. Not all match() callbacks access the driver_override field,
> >>
> >>   2. driver_override is accessed in other places as well,
> >>
> >>   3. driver_override is a bus device specific field (with a common
> >>      helper admittedly).
> >>
> >> I think it would be better to make driver_override a field in the base
> >> struct device. This way we can not only provide driver_set_override(), but also
> >> driver_get_override(), which should contain the device_lock_assert() instead.
> >
> > Hi Danilo,
> >
> > Thanks for the feedback. While I agree that moving driver_override to
> > struct device is a good idea for future refactoring, I believe this
> > patch is necessary as the immediate fix due to existing locking
> > constraints.
> >
> > The main constraint is the device_lock. Currently,
> > driver_match_device() is called from three paths. One of them
> > (__device_attach_driver) already holds device_lock to protect the
> > whole binding process. We cannot add locking inside the bus match()
> > callbacks (or after refactoring, acquiring the lock to call a
> > hypothetical getter inside match) because it would deadlock this
> > existing path. Since we cannot remove the lock from
> > __device_attach_driver, the only safe solution is to ensure the lock
> > is held at the other two call sites. Moving the field to struct device
> > would not change this locking asymmetry. The device_lock is a coarse
> > lock intended for correctness here, and contention is low.
>
> Don't get me wrong, I'm not against providing the guarantee that the match()
> callbacks are always called with the device lock held, I think we should do
> that.
>
> But I do not agree that this is *properly* fixing locking around the
> driver_override field by itself.
>
> > I understand you might be concerned about performance. I tested this
> > using ftrace. driver_match_device() is a cold path (called ~600 times
> > during boot in my setup). I observed no measurable boot time
> > regression. After boot, it is rarely called (e.g., module loading). So
> > I think the impact is negligible.
>
> I'm not concerned about performance at all. I'm concerned about locking code
> paths instead of locking data.
>
> > Regarding other access sites: driver_override is specifically designed
> > for matching. Apart from sysfs (which I am fixing separately [1]), I
> > have not found other places that require locking. Other accesses occur
> > mostly during init/cleanup (e.g., pci_device_probe/pci_release_dev)
> > where concurrency is not an issue. If there are other specific race
> > conditions, we can fix them individually as they are not universal. In
> > contrast, the match() path is the critical concurrent path where the
> > feature is actually implemented, affecting over 10 buses.
>
> Again, the problem is not that match() must be called with the device lock held
> because of driver_set_override().
>
> The problem is that currently match() is only *sometimes* called with the device
> lock held, i.e. it would also work if match() is never called with the device
> lock held.
>
> So, there are two separate issues: fix the inconsistent locking guarantee of
> match() callbacks and locking design of driver_override.
>
> Regarding the latter, it is odd that the driver_override field belongs to the
> bus device structure, which ultimately makes the bus responsible to ensure
> mutual exclusion, while the driver-core helper driver_set_override() assumes
> that the field is protected with the device lock.
>
> So, either we leave it to the bus entirely, then the driver_set_override()
> helper should take a lock pointer provided by the bus or we define that
> driver_override is protected with the device lock, but then we should move it to
> struct device and provide an accessor with proper documentation of the locking
> rules.
>
> Having that said, I think this patch should focus on fixing the inconsistent
> locking locking guarantee of the match() callback. But it should not justify
> that driver_match_device() is called with the device lock held with "Protects
> against driver_set_override() races".

I agree here.

The inconsistent locking is a problem and the resultant race
conditions are a symptom of it.

> > Even if we refactor (e.g., adding dev_access_driver_override), we
> > would still need to hold the lock at these call sites for the reasons
> > above.
>
> Sure, but then we have a lockdep assert and the rules are properly documented.

Right.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ