linux-kernel - Possible race at user mode helper in request

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <69c9d897d4c1703fb5efe6d2d9dd9f61@codeaurora.org>
Date:	Mon, 14 Mar 2016 17:31:08 +0530
From:	clingutla@...eaurora.org
To:	ming.lei@...onical.com, linux-kernel@...r.kernel.org
Cc:	gregkh@...uxfoundation.org, linux-arm-msm@...r.kernel.org
Subject: Possible race at user mode helper in request_firmware

Hello,

I see possible race in _request_firmware_load function, on which I 
wanted to take your opinion.

When system is going to low power mode, device_cache_fw_images() is 
called from pm notifier which schedules async workers to cache devices 
firmware.
If more than 2 async requests falls under user helper mode, then there 
is a use after free of "firmware" directory kobject.
The race will hit, when one asynchronous worker calls device_del() while 
other asynchronous worker calls get_device_parent() in
_request_firmware_load().

After loading firmware, first worker calls device_del(), and "firmware" 
directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with 
gdp_mutex lock held.
Meanwhile, second async worker calls device_add(), and 
get_device_parent() is still able to find the firmware directory kobject 
in ksets after getting  gdp_mutex.
This kobject gets added as parent to the second device kobject.
By the time of accessing the parent kobject, its ref count was 0 and 
prints below warning stack followed by crash.

[21147.455038] ------------[ cut here ]------------
[21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 
kobject_get+0x50/0x68()
[21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) 
qcrypto_module(O)
[21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 
3.10.84-g0957845-00427-g56a05c2 #1
[21147.481284] Workqueue: events_unbound async_run_entry_fn
[21147.481296] Call trace:
[21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
[21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
[21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
[21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
[21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
[21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
[21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
[21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
[21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
[21147.481563] ---[ end trace dabc98ea48b8ba59 ]---

[21147.486436] Unable to handle kernel paging request at virtual address 
0x6b6b6b6b6b6b6bcb
[21147.493816] pgd = ffffffc0a982d000
[21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
[21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) 
qcrypto_module(O)
[21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 
3.10.84-g0957845-00427-g56a05c2 #1
[21147.524686] Workqueue: events_unbound async_run_entry_fn
[21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: 
ffffffc007880000
[21147.537438] PC is at sysfs_create_dir+0x34/0xd4
[21147.541949] LR is at kobject_add_internal+0x120/0x24c
[21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>] 
pstate: 80000145
[21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = 
0xffffffc007880058)
[21148.003673] Call trace:
[21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
[21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
[21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
[21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
[21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
[21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
[21148.080312] ---[ end trace dabc98ea48b8ba5a ]---

Below is the callstack where the parent is freed from the slub_debug 
info.
0xFFFFFFC000608D74  \\vmlinux\base/core\class_dir_release+0x0C
0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
0xFFFFFFC00061C65C 
\\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC

For similar type of issue, I see there is an existing fix: "sysfs: 
driver core: Fix glue dir race condition by gdp_mutex".
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5


Still I am able to reproduce the issue. I have verified this race on 
kernels: 3.10, 3.18 and 4.4.

I followed below procedure to reproduce the issue:
      1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
      2. Use test_firmware, modified the test for async calls
      3. Replace WARN_ON with BUG_ON in kref_get().
I ran below script from shell:
     count=0
     while [ 1 ]
     do
             echo 3 > /sys/devices/virtual/misc/test_firmware/trigger_ufw
             if [ $? -ne 0 ]; then
                     print "Exiting.. "
                     exit 1
             fi
             count=$(($count +1))
             echo "count $count"
      done

With below patch, I could reproduce the issue in the 2nd iteration.

-------------------------8<----------------------------------------------

To reproduce the race condition, use firmware test and
CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
releases. So that we can catch the bug easily.

Enabled below flags in kernel config file:

+CONFIG_DEBUG_KOBJECT_RELEASE=y
+CONFIG_TEST_FIRMWARE=y


diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..727fb24 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
       * condition when this kref is freeing by some other thread right
now.
       * In this case one should use kref_get_unless_zero()
       */
-    WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
+    BUG_ON(atomic_inc_return(&kref->refcount) < 2);
  }

  /**
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
old mode 100644
new mode 100755
index 86374c1..14c9598
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -18,6 +18,7 @@
  #include <linux/miscdevice.h>
  #include <linux/slab.h>
  #include <linux/uaccess.h>
+#include <linux/async.h>

+
+static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
+
+static void __async_ufw_work(void *name,
+                                       async_cookie_t cookie)
+{
+    const struct firmware *fw;
+    const char *fw_name = name;
+    int ret;
+
+    pr_err("requesting %s\n", fw_name);
+
+    ret = request_firmware(&fw, fw_name, NULL);
+    pr_err("loaded: %zu\n", fw ? fw->size : 0);
+        if (!ret)
+                kfree(fw);
+
+}
+
+static char *name[] = { "fw1.bin", "fw2.bin"};
+
+static ssize_t trigger_ufw_store(struct device *dev,
+        struct device_attribute *attr,
+        const char *buf, size_t count)
+{
+    int rc, i, c=0;
+    u32 start;
+    char *fw_name;
+
+    rc = kstrtou32(buf, 0, &start);
+    if (rc){
+        pr_err("Invalid option\n");
+        return rc;
+    }
+    pr_err(" no of iterations %d\n", start);
+
+    for (i=0; i< start; i++)
+    {
+        if (c >= 2)
+            c = 0;
+        fw_name = name[c++];
+        async_schedule_domain(__async_ufw_work, (void *)fw_name, 
&ufw_domain);
+    }
+    async_synchronize_full_domain(&ufw_domain);
+    return count;
+}

  static DEVICE_ATTR_WO(trigger_request);
+static DEVICE_ATTR_WO(trigger_ufw);

  static int __init test_firmware_init(void)
  {
@@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
          goto dereg;
      }

+    rc = device_create_file(test_fw_misc_device.this_device,
+                             &dev_attr_trigger_ufw);
+    if (rc) {
+        pr_err("could not create sysfs interface: %d\n", rc);
+        goto dereg;
+    }
      pr_warn("interface ready\n");

      return 0;
@@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
  {
      release_firmware(test_firmware);
      device_remove_file(test_fw_misc_device.this_device,
-               &dev_attr_trigger_request);
+            &dev_attr_trigger_request);
+    device_remove_file(test_fw_misc_device.this_device,
+            &dev_attr_trigger_ufw);
      misc_deregister(&test_fw_misc_device);
      pr_warn("removed interface\n");
  }

----------------------------8<--------------------------------------------

Thanks and regards,
Chandrasekhar L.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of Code Aurora Forum,
  a Linux Foundation Collaborative Project.