lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <479de772-7f12-46ff-a3ad-cde9e125bc12@de.bosch.com>
Date: Mon, 6 May 2024 08:04:59 +0200
From: Dirk Behme <dirk.behme@...bosch.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC: <linux-kernel@...r.kernel.org>, Rafael J Wysocki <rafael@...nel.org>,
	<syzbot+ffa8143439596313a85a@...kaller.appspotmail.com>, Eugeniu Rosca
	<eugeniu.rosca@...ch.com>
Subject: Re: [PATCH] drivers: core: Make dev->driver usage safe in
 dev_uevent()

On 30.04.2024 10:57, Greg Kroah-Hartman wrote:
> On Tue, Apr 30, 2024 at 10:50:52AM +0200, Dirk Behme wrote:
>> On 30.04.2024 10:41, Greg Kroah-Hartman wrote:
>>> On Tue, Apr 30, 2024 at 10:23:36AM +0200, Dirk Behme wrote:
>>>> Hi Greg,
>>>>
>>>> On 30.04.2024 09:20, Greg Kroah-Hartman wrote:
>>>>> On Tue, Apr 30, 2024 at 06:55:31AM +0200, Dirk Behme wrote:
>>>>>> Inspired by the function dev_driver_string() in the same file make sure
>>>>>> in case of uninitialization dev->driver is used safely in dev_uevent(),
>>>>>> as well.
>>>>>
>>>>> I think you are racing and just getting "lucky" with your change here,
>>>>> just like dev_driver_string() is doing there (that READ_ONCE() really
>>>>> isn't doing much to protect you...)
>>>>>
>>>>>> This change is based on the observation of the following race condition:
>>>>>>
>>>>>> Thread #1:
>>>>>> ==========
>>>>>>
>>>>>> really_probe() {
>>>>>> ...
>>>>>> probe_failed:
>>>>>> ...
>>>>>> device_unbind_cleanup(dev) {
>>>>>>          ...
>>>>>>          dev->driver = NULL;   // <= Failed probe sets dev->driver to NULL
>>>>>>          ...
>>>>>>          }
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> Thread #2:
>>>>>> ==========
>>>>>>
>>>>>> dev_uevent() {
>>>>>
>>>>> Wait, how can dev_uevent() be called if probe fails?  Who is calling
>>>>> that?
>>>>>
>>>>>> ...
>>>>>> if (dev->driver)
>>>>>>          // If dev->driver is NULLed from really_probe() from here on,
>>>>>>          // after above check, the system crashes
>>>>>>          add_uevent_var(env, "DRIVER=%s", dev->driver->name);
>>>>>>
>>>>>> dev_driver_string() can't be used here because we want NULL and not
>>>>>> anything else in the non-init case.
>>>>>>
>>>>>> Similar cases are reported by syzkaller in
>>>>>>
>>>>>> https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a
>>>>>>
>>>>>> But these are regarding the *initialization* of dev->driver
>>>>>>
>>>>>> dev->driver = drv;
>>>>>>
>>>>>> As this switches dev->driver to non-NULL these reports can be considered
>>>>>> to be false-positives (which should be "fixed" by this commit, as well,
>>>>>> though).
>>>>>>
>>>>>> Fixes: 239378f16aa1 ("Driver core: add uevent vars for devices of a class")
>>>>>> Cc: syzbot+ffa8143439596313a85a@...kaller.appspotmail.com
>>>>>> Reviewed-by: Eugeniu Rosca <eugeniu.rosca@...ch.com>
>>>>>> Tested-by: Eugeniu Rosca <eugeniu.rosca@...ch.com>
>>>>>> Signed-off-by: Dirk Behme <dirk.behme@...bosch.com>
>>>>>> ---
>>>>>>     drivers/base/core.c | 9 +++++++--
>>>>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>> index 5f4e03336e68..99ead727c08f 100644
>>>>>> --- a/drivers/base/core.c
>>>>>> +++ b/drivers/base/core.c
>>>>>> @@ -2639,6 +2639,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
>>>>>>     static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>>>>>>     {
>>>>>>     	const struct device *dev = kobj_to_dev(kobj);
>>>>>> +	struct device_driver *drv;
>>>>>>     	int retval = 0;
>>>>>>     	/* add device node properties if present */
>>>>>> @@ -2667,8 +2668,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>>>>>>     	if (dev->type && dev->type->name)
>>>>>>     		add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>>>>>> -	if (dev->driver)
>>>>>> -		add_uevent_var(env, "DRIVER=%s", dev->driver->name);
>>>>>> +	/* dev->driver can change to NULL underneath us because of unbinding
>>>>>> +	 * or failing probe(), so be careful about accessing it.
>>>>>> +	 */
>>>>>> +	drv = READ_ONCE(dev->driver);
>>>>>> +	if (drv)
>>>>>> +		add_uevent_var(env, "DRIVER=%s", drv->name);
>>>>>
>>>>> Again, you are just reducing the window here.  Maybe a bit, but not all
>>>>> that much overall as there is no real lock present.
>>>>>
>>>>> So how is this actually solving anything?
>>>>
>>>>
>>>> Looking at dev_driver_string() I hoped that it just reads *once*. I.e. we
>>>> don't care if we read NULL or any valid pointer, as long as this pointer
>>>> read is done only once ("atomically"?). If READ_ONCE() doesn't do that, I
>>>> agree, it's not the (race) fix we are looking for.
>>>
>>> Yes, what if you read it, and then it is unloaded from the system before
>>> you attempt to access drv->name?  not good.
>>>
>>>> Initially, I was thinking about anything like [1] below. I.e. adding a mutex
>>>> lock.  But not sure if that is better (acceptable?).
>>>
>>> a proper lock is the only way to correctly solve this.
>>
>>
>> Would using device_lock()/unlock() for locking like done below [1]
>> acceptable?
> 
> Why not test it out and see!  :)
We tested

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b93f3c5716ae..e2a1dd015074 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2723,8 +2723,11 @@ static ssize_t uevent_show(struct device *dev, 
struct device_attribute *attr,
         if (!env)
                 return -ENOMEM;

+       /* Synchronize with really_probe() */
+       device_lock(dev);
         /* let the kset specific function add its keys */
         retval = kset->uevent_ops->uevent(&dev->kobj, env);
+       device_unlock(dev);
         if (retval)
                 goto out;


and it seems it fixes the issue at least for us in our tests.

It looks like really_probe() does run with device_lock() taken, already. 
So we don't need to care about that.

And depending on the call stack dev_uevent() itself is used with 
device_lock() taken sometimes, already, too. What means that it should 
be safe to call the whole function under that lock (but can't place the 
lock there).

So this patch goes one level up in the call stack of [1]:

  dev_uevent+0x235/0x380 drivers/base/core.c:2670
  uevent_show+0x10c/0x1f0 drivers/base/core.c:2742  <== lock here
  dev_attr_show+0x3a/0xa0 drivers/base/core.c:2445
  sysfs_kf_seq_show+0x17c/0x250 fs/sysfs/file.c:59
  kernfs_seq_show+0x7c/0x90 fs/kernfs/file.c:205
..

However, we can't prove that uevent_show() is never called with 
device_lock() held, already. And that uevent_ops->uevent() never calls 
anything what might break with device_lock() taken.

Best regards

Dirk

[1] https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a


P.S.: It looks like this issue was discussed back in 2015 already ;)

https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ