linux-kernel - Re: Possible race at user mode helper in request

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACVXFVNT=7qLUq1GdrYuMmtmkKxF+W_MZUB8kjzaMAotp=Wu=Q@mail.gmail.com>
Date:	Wed, 16 Mar 2016 16:42:11 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Chandra Sekhar Lingutla <clingutla@...eaurora.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-arm-msm@...r.kernel.org,
	Russell King <linux@....linux.org.uk>
Subject: Re: Possible race at user mode helper in request_firmware

On Wed, Mar 16, 2016 at 3:02 PM, Chandra Sekhar Lingutla
<clingutla@...eaurora.org> wrote:
> Hi Ming,
>
>
> On 03/16/2016 07:36 AM, Ming Lei wrote:
>>
>> On Tue, Mar 15, 2016 at 11:09 PM, Ming Lei <ming.lei@...onical.com> wrote:
>>>
>>> Hi,
>>>
>>> On Mon, Mar 14, 2016 at 8:00 PM,  <clingutla@...eaurora.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I see possible race in _request_firmware_load function, on which I
>>>> wanted to
>>>> take your opinion.
>>>>
>>>> When system is going to low power mode, device_cache_fw_images() is
>>>> called
>>>> from pm notifier which schedules asynchronous workers to cache devices
>>>> firmware.
>>>> If more than 2 async requests falls under user helper mode, then there
>>>> is a
>>>> use after free of "firmware" directory kobject.
>>>
>>>
>>> If two requests are for getting same firmware image, only one can be sent
>>> out and be completed with user helper, please see
>>> fw_lookup_and_allocate_buf(),
>>> so the race shouldn't have been triggered in this case.
>>>
>>> So I guess your test must be involved with at least two different
>>> requests.
>>>
>>>>
>>>> The race will hit, when one asynchronous worker calls device_del() while
>>>> other asynchronous worker calls get_device_parent() in
>>>> _request_firmware_load().
>>>> After loading firmware, first worker calls device_del(), and "firmware"
>>>> directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with
>>>> gdp_mutex lock held.
>>>> Meanwhile, second async worker calls device_add(), and
>>>> get_device_parent()
>>>> is still able to find the firmware directory kobject in ksets after
>>>> getting
>>>> gdp_mutex.
>>>> This kobject gets added as parent to the second device kobject.
>>>> By the time of accessing the parent kobject, its ref count was 0 and
>>>> prints
>>>> below warning stack followed by crash.
>>>
>>>
>>> The trick here is just that we don't use 'struct device' in
>>> device_cache_fw_images(),
>>> then the firmware device's parent is null during calling device_add()
>>> from
>>> _request_firmware_load(), but this should be allowed by driver core.
>>>
>>>  From firmware class view, both device_add()/device_del() is run with
>>> pair,
>>> and driver core still complains during get_device_parent(), so something
>>> inside driver core should be be wrong, IMO.
>>
>>
>> Thinking about the issue further, I believe it is a false positive which
>> is
>> caused by CONFIG_DEBUG_KOBJECT_RELEASE, follows the story:
>>
>> 1) device_add() path
>>
>> get_device_parent()
>>                 ......
>>                 mutex_lock(&gdp_mutex);
>>
>>                  /* find our class-directory at the parent and reference
>> it */
>>                  spin_lock(&dev->class->p->glue_dirs.list_lock);
>>                  list_for_each_entry(k, &dev->class->p->glue_dirs.list,
>> entry)
>>                          if (k->parent == parent_kobj) {
>>                                  kobj = kobject_get(k);
>>                                  break;
>>                          }
>>                  spin_unlock(&dev->class->p->glue_dirs.list_lock);
>>                  if (kobj) {
>>                          mutex_unlock(&gdp_mutex);
>>                          return kobj;
>>                  }
>>                  k = class_dir_create_and_add(dev->class, parent_kobj);
>>                  mutex_unlock(&gdp_mutex);
>>                  ......
>>
>> 2) device_del() path
>>
>> cleanup_device_parent()
>>         ->cleanup_glue_dir()
>>              ->mutex_lock(&gdp_mutex);
>>              ->kobject_put(glue_dir);
>>                     ->kobject_cleanup()
>>                           ->kobject_del()
>>                                  ->kobj_kset_leave()
>>              ->mutex_unlock(&gdp_mutex);
>>
>> So gdp_mutex is held wrt. both looking up/adding the parent kobject
>> and removing the parent kobject. Actually with this lock, both glur_dir
>> kobject's referece counting and joining/leaving kset are run atomically.
>>
>> The race you found is triggered by CONFIG_DEBUG_KOBJECT_RELEASE,
>> which causes kobject_cleanup() to be run in workqueue context and the
>> lock of gdp_mutex can't be hold in that situation anymore. That is the
>> root
>> cause for your race.
>>
>> That also said DEBUG_KOBJECT_RELEASE might cause trouble in
>> some situations.
>>
>> Thanks,
>> Ming
>>
> I agree that, CONFIG_DEBUG_KOBJECT_RELEASE can cause the race, as it delays
> call to kobject_cleanup() which deletes kobj entry in glue_dir list.
>
> I see the race even after disabling CONFIG_DEBUG_KOBJECT_RELEASE.
>
> The race is between kobject_put() and kobject_get() for the "firmware"
> directory.
>
> In  device_del() path, I see the race after the call of
> cleanup_device_parent().
> cleanup_device_parent() calls kobject_put(glue_dir) with gdp_mutex lock
> holding,
> So ideally, to call kobject_put/kobject_get(glue_dir) we need to acquire
> gdp_mutex.
>
> but kobject_del(kobj) will not hold the gdp_mutex lock while calling
> kobject_put(kobj->parent).
>
> Here is the race:
>
> Consider 2 async works calls request_firmware() for 2 different binaries
> (fw1.bin, fw2.bin)
> Async work1 got scheduled first, and it falls in user helper mode so it
> creates "firmware" dir
> and then creates "fw1.bin" directory to load data from user.
> After load completion or timeout, it starts cleanup dir's, meantime async
> work2 got scheduled and
> it also falls in user helper mode, and story starts here:
>
> PATH1   req fw for "fw1.bin"                                         PATH2
>  req fw for "fw2.bin"
> _request_firmware_load()
> device_del(dev1)
>
>         ->cleanup_device_parent()
>
>                 mutex_lock(&gdp_mutex);
>                                          _request_firmware_load("fw2.bin")
>                 kobject_put(kobj->parent); --> firmware ref ->1
>                  ->device_add(dev2)
>                 mutex_unlock(&gdp_mutex);
> ->get_device_parent()
>         ->kobject_del()
> mutex_lock(&gdp_mutex)
>                 kobj_kset_leave(kobj); -->del "fw1.bin" kobj in kset
> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
>                 kobject_put(kobj->parent); /* No lock :)*/
> if (k->parent == parent_kobj) {
>
>                         ->kref_put()    --> firmware ref ->0
> kobj = kobject_get(k); -> k is "firmware"
>  dir
>                                 ->kboject_release()
> break;
>
>                                         ->kobject_del()
> }
>
>                                 /*here deletes firmware entry in kset*/
> mutex_unlock(&gdp_mutex)
>
> ->dev2->kobj.parent = kobj    -> kobj is released already
>
>
>
> PATH 1 is started removing the kobject of "fw1.bin" and its parent
> "firmware"
> .
> cleanup_device_parent(dev1) makes firmware ref count to 1, by calling
> kobject_put(firmware) with gdp_mutex held.
>
> and kobject_del(dev1->kobj) will again calls kobject_put(kobj->parent)
> without holding gdp_mutex.
>
> In PATH2, glue_dir is able to find "firmware" kobj, as firmware dir kobj is
> deleted from the glue_dir list at kboject_release().
> and it calls kobject_get(firmware), by the time PATH1 calls
> kobject_release(firmware).
>
> So PATH2 got reference of freed firmware kobj and added as parent to
> "fw2.bin" kobj.
>
> I think, here is possible fix for this race:
>
> ------------------------8<-------------------------------------
> Signed-off-by: Lingutla Chandrasekhar <clingutla@...eaurora.org>
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5d8b713..47bc082 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1224,7 +1224,13 @@ void device_del(struct device *dev)
>                                              BUS_NOTIFY_REMOVED_DEVICE,
> dev);
>         kobject_uevent(&dev->kobj, KOBJ_REMOVE);
>         cleanup_device_parent(dev);
> -       kobject_del(&dev->kobj);
> +
> +       if (dev->kobj.parent->kset == &dev->class->p->glue_dirs) {
> +               mutex_lock(&gdp_mutex);
> +               kobject_del(&dev->kobj);
> +               mutex_unlock(&gdp_mutex);
> +       } else
> +               kobject_del(&dev->kobj);
>         put_device(parent);
>  }
>  EXPORT_SYMBOL_GPL(device_del);
>
> --------------------------->8-----------------------------------

OK, one simpler way is to move cleanup_device_parent(dev)
after kobject_del(&dev->kobj), which looks more clean too.

>
>
>>>
>>>>
>>>> [21147.455038] ------------[ cut here ]------------
>>>> [21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47
>>>> kobject_get+0x50/0x68()
>>>> [21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>> qcrypto_module(O)
>>>> [21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>> [21147.481284] Workqueue: events_unbound async_run_entry_fn
>>>> [21147.481296] Call trace:
>>>> [21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
>>>> [21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
>>>> [21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
>>>> [21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
>>>> [21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
>>>> [21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
>>>> [21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
>>>> [21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
>>>> [21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>> [21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>> [21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>> [21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>>>> [21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>> [21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>> [21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>> [21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>> [21147.481563] ---[ end trace dabc98ea48b8ba59 ]---
>>>>
>>>> [21147.486436] Unable to handle kernel paging request at virtual address
>>>> 0x6b6b6b6b6b6b6bcb
>>>> [21147.493816] pgd = ffffffc0a982d000
>>>> [21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
>>>> [21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> [21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O)
>>>> qcrypto_module(O)
>>>> [21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O
>>>> 3.10.84-g0957845-00427-g56a05c2 #1
>>>> [21147.524686] Workqueue: events_unbound async_run_entry_fn
>>>> [21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti:
>>>> ffffffc007880000
>>>> [21147.537438] PC is at sysfs_create_dir+0x34/0xd4
>>>> [21147.541949] LR is at kobject_add_internal+0x120/0x24c
>>>> [21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>]
>>>> pstate:
>>>> 80000145
>>>> [21147.996295] Process kworker/u16:8 (pid: 23935, stack limit =
>>>> 0xffffffc007880058)
>>>> [21148.003673] Call trace:
>>>> [21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
>>>> [21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
>>>> [21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
>>>> [21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
>>>> [21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
>>>> [21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
>>>> [21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
>>>> [21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
>>>> [21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
>>>> [21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
>>>> [21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
>>>> [21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
>>>> [21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
>>>> [21148.080312] ---[ end trace dabc98ea48b8ba5a ]---
>>>>
>>>> Below is the callstack where the parent is freed from the slub_debug
>>>> info.
>>>> 0xFFFFFFC000608D74  \\vmlinux\base/core\class_dir_release+0x0C
>>>> 0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
>>>> 0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
>>>> 0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
>>>> 0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
>>>> 0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
>>>> 0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
>>>> 0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
>>>> 0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
>>>> 0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
>>>> 0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
>>>> 0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
>>>> 0xFFFFFFC00061C65C
>>>> \\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
>>>> 0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
>>>> 0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
>>>> 0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
>>>> 0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC
>>>>
>>>> For similar type of issue, I see there is an existing fix: "sysfs:
>>>> driver
>>>> core: Fix glue dir race condition by gdp_mutex".
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5
>>>>
>>>> Still I am able to reproduce the issue. I have verified this race on
>>>> kernels: 3.10, 3.18 and 4.4.
>>>>
>>>> I followed below procedure to reproduce the issue:
>>>>       1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
>>>>       2. Use test_firmware, modified the test for async calls
>>>>       3. Replace WARN_ON with BUG_ON in kref_get().
>>>> I ran below script from shell:
>>>>      count=0
>>>>      while [ 1 ]
>>>>      do
>>>>              echo 3 >
>>>> /sys/devices/virtual/misc/test_firmware/trigger_ufw
>>>>              if [ $? -ne 0 ]; then
>>>>                      print "Exiting.. "
>>>>                      exit 1
>>>>              fi
>>>>              count=$(($count +1))
>>>>              echo "count $count"
>>>>       done
>>>>
>>>> With below patch, I could reproduce the issue in the 2nd iteration.
>>>
>>>
>>> I will run your test script to see if there is new findings.
>>>
>>>>
>>>>
>>>> -------------------------8<----------------------------------------------
>>>>
>>>> To reproduce the race condition, use firmware test and
>>>> CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
>>>> releases. So that we can catch the bug easily.
>>>>
>>>> Enabled below flags in kernel config file:
>>>>
>>>> +CONFIG_DEBUG_KOBJECT_RELEASE=y
>>>> +CONFIG_TEST_FIRMWARE=y
>>>
>>>
>>> Looks I can't reproduce the issue with your approach.
>>>
>>>>
>>>>
>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h
>>>> index 484604d..727fb24 100644
>>>> --- a/include/linux/kref.h
>>>> +++ b/include/linux/kref.h
>>>> @@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
>>>>        * condition when this kref is freeing by some other thread right
>>>> now.
>>>>        * In this case one should use kref_get_unless_zero()
>>>>        */
>>>> -    WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>>> +    BUG_ON(atomic_inc_return(&kref->refcount) < 2);
>>>>   }
>>>>
>>>>   /**
>>>> diff --git a/lib/test_firmware.c b/lib/test_firmware.c
>>>> old mode 100644
>>>> new mode 100755
>>>> index 86374c1..14c9598
>>>> --- a/lib/test_firmware.c
>>>> +++ b/lib/test_firmware.c
>>>> @@ -18,6 +18,7 @@
>>>>   #include <linux/miscdevice.h>
>>>>   #include <linux/slab.h>
>>>>   #include <linux/uaccess.h>
>>>> +#include <linux/async.h>
>>>>
>>>> +
>>>> +static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
>>>> +
>>>> +static void __async_ufw_work(void *name,
>>>> +                                       async_cookie_t cookie)
>>>> +{
>>>> +    const struct firmware *fw;
>>>> +    const char *fw_name = name;
>>>> +    int ret;
>>>> +
>>>> +    pr_err("requesting %s\n", fw_name);
>>>> +
>>>> +    ret = request_firmware(&fw, fw_name, NULL);
>>>> +    pr_err("loaded: %zu\n", fw ? fw->size : 0);
>>>> +        if (!ret)
>>>> +                kfree(fw);
>>>> +
>>>> +}
>>>> +
>>>> +static char *name[] = { "fw1.bin", "fw2.bin"};
>>>> +
>>>> +static ssize_t trigger_ufw_store(struct device *dev,
>>>> +        struct device_attribute *attr,
>>>> +        const char *buf, size_t count)
>>>> +{
>>>> +    int rc, i, c=0;
>>>> +    u32 start;
>>>> +    char *fw_name;
>>>> +
>>>> +    rc = kstrtou32(buf, 0, &start);
>>>> +    if (rc){
>>>> +        pr_err("Invalid option\n");
>>>> +        return rc;
>>>> +    }
>>>> +    pr_err(" no of iterations %d\n", start);
>>>> +
>>>> +    for (i=0; i< start; i++)
>>>> +    {
>>>> +        if (c >= 2)
>>>> +            c = 0;
>>>> +        fw_name = name[c++];
>>>> +        async_schedule_domain(__async_ufw_work, (void *)fw_name,
>>>> &ufw_domain);
>>>> +    }
>>>> +    async_synchronize_full_domain(&ufw_domain);
>>>> +    return count;
>>>> +}
>>>>
>>>>   static DEVICE_ATTR_WO(trigger_request);
>>>> +static DEVICE_ATTR_WO(trigger_ufw);
>>>>
>>>>   static int __init test_firmware_init(void)
>>>>   {
>>>> @@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
>>>>           goto dereg;
>>>>       }
>>>>
>>>> +    rc = device_create_file(test_fw_misc_device.this_device,
>>>> +                             &dev_attr_trigger_ufw);
>>>> +    if (rc) {
>>>> +        pr_err("could not create sysfs interface: %d\n", rc);
>>>> +        goto dereg;
>>>> +    }
>>>>       pr_warn("interface ready\n");
>>>>
>>>>       return 0;
>>>> @@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
>>>>   {
>>>>       release_firmware(test_firmware);
>>>>       device_remove_file(test_fw_misc_device.this_device,
>>>> -               &dev_attr_trigger_request);
>>>> +            &dev_attr_trigger_request);
>>>> +    device_remove_file(test_fw_misc_device.this_device,
>>>> +            &dev_attr_trigger_ufw);
>>>>       misc_deregister(&test_fw_misc_device);
>>>>       pr_warn("removed interface\n");
>>>>   }
>>>>
>>>>
>>>> ----------------------------8<--------------------------------------------
>>>>
>>>> Thanks and regards,
>>>> Chandrasekhar L.
>>>> --
>>>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>>>> member of
>>>> Code Aurora Forum,
>>>>   a Linux Foundation Collaborative Project.