Message-ID: <57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com>
Date: Wed, 13 Apr 2022 16:51:18 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
<sboyd@...nel.org>, <johannes@...solutions.net>,
<rafael@...nel.org>
Subject: Re: Possible race in dev_coredumpm()-del_timer() path
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting a race due to which try_to_grab_pending() is stuck.
>>>
>>> What kernel version are you using?
>>
>> 5.10
>
> 5.10.0 was released a very long time ago. Please use a more modern
> kernel release :)
>
>> Sorry for the formatting mess.
>>
>>>> In the following scenario: while running dev_coredumpm() (p1), the devcd
>>>> device is added to the framework and a uevent notification is sent to
>>>> userspace, which results in a call to devcd_data_write() (p2). That path
>>>> eventually tries to delete the queued timer, but in the racy scenario the
>>>> timer is not queued yet. So debug objects reports a warning, and in the
>>>> meantime the timer is initialized and queued from the p1 path; then from
>>>> the p2 path it gets overridden again (timer->entry.pprev = NULL) and
>>>> try_to_grab_pending() gets stuck.
>>         p1                                      p2(X)
>>
>> dev_coredumpm()
>>   device_add()  ======uevent======>  userspace process X reads the uevent
>>                                      and writes to the devcd fd, which
>>                                      results in a write into:
>>
>>                                        devcd_data_write()
>>                                          mod_delayed_work()
>>                                            try_to_grab_pending()
>>                                              del_timer()
>>                                                debug_assert_init()
>>   INIT_DELAYED_WORK()
>>   schedule_delayed_work()
>>                                                debug_object_fixup()
>
> Why do you have object debugging enabled? That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?

There is no issue if we disable debug objects.

Here, a client module collects a dump via dev_coredumpm(), which creates
a devcdX device and expects userspace to read the dump data from it. This
looks like a synchronization issue between dev_coredumpm() and
devcd_data_write(); perhaps a mutex could close the window?
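
To make the window concrete, here is a minimal userspace model of the
ordering problem (just a sketch using pthreads; the struct and function
names are made up, and the deliberately unsynchronized flag stands in
for the not-yet-initialized timer):

/* race_model.c: userspace model of the dev_coredumpm()/devcd_data_write()
 * window; NOT kernel code. Build: cc -pthread race_model.c -o race_model
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

struct devcd_model {
	sem_t uevent;           /* models the uevent device_add() emits    */
	bool timer_queued;      /* models del_wk being INIT'ed and queued  */
};

static struct devcd_model m;

/* models dev_coredumpm() (p1) */
static void *p1_driver(void *arg)
{
	sem_post(&m.uevent);    /* device_add(): uevent reaches userspace  */
	/* <-- window: p2 can run before the next line -->                 */
	m.timer_queued = true;  /* INIT_DELAYED_WORK + schedule_delayed_work */
	return NULL;
}

/* models the process X -> devcd_data_write() path (p2) */
static void *p2_writer(void *arg)
{
	sem_wait(&m.uevent);    /* userspace saw the uevent, writes back   */
	if (!m.timer_queued)    /* mod_delayed_work() on an uninit. timer  */
		puts("race hit: del_timer() before INIT_DELAYED_WORK()");
	else
		puts("no race this run");
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	sem_init(&m.uevent, 0, 0);
	pthread_create(&t2, NULL, p2_writer, NULL);
	pthread_create(&t1, NULL, p1_driver, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}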
================o<===============================
diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 9243468..a620dcb 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -29,6 +29,7 @@ struct devcd_entry {
 	struct device devcd_dev;
 	void *data;
 	size_t datalen;
+	struct mutex mutex;
 	struct module *owner;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
@@ -88,7 +89,9 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	struct device *dev = kobj_to_dev(kobj);
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
+	mutex_lock(&devcd->mutex);
 	mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	mutex_unlock(&devcd->mutex);
 
 	return count;
 }
@@ -282,13 +285,14 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	devcd->read = read;
 	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
-
+	mutex_init(&devcd->mutex);
 	device_initialize(&devcd->devcd_dev);
 
 	dev_set_name(&devcd->devcd_dev, "devcd%d",
 		     atomic_inc_return(&devcd_count));
 	devcd->devcd_dev.class = &devcd_class;
 
+	mutex_lock(&devcd->mutex);
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
@@ -302,10 +306,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 
 	INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
 	schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
-
+	mutex_unlock(&devcd->mutex);
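
With the patch, device_add() and the INIT_DELAYED_WORK()/
schedule_delayed_work() pair both run with devcd->mutex held, so a
writer racing in from the uevent blocks in devcd_data_write() until the
timer is initialized and queued. The same userspace model as above with
the mutex added (again only a sketch, not the kernel code):

/* fixed_model.c: the race model with a mutex mirroring the patch.
 * Build: cc -pthread fixed_model.c -o fixed_model
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

struct devcd_model {
	pthread_mutex_t mutex;  /* models devcd->mutex from the patch     */
	sem_t uevent;
	bool timer_queued;
};

static struct devcd_model m = { .mutex = PTHREAD_MUTEX_INITIALIZER };

static void *p1_driver(void *arg)
{
	pthread_mutex_lock(&m.mutex);   /* taken before device_add()      */
	sem_post(&m.uevent);            /* device_add(): uevent fires     */
	m.timer_queued = true;          /* INIT_DELAYED_WORK + schedule   */
	pthread_mutex_unlock(&m.mutex); /* released after timer is queued */
	return NULL;
}

static void *p2_writer(void *arg)
{
	sem_wait(&m.uevent);
	pthread_mutex_lock(&m.mutex);   /* devcd_data_write() waits here  */
	if (m.timer_queued)
		puts("writer sees a fully initialized, queued timer");
	pthread_mutex_unlock(&m.mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	sem_init(&m.uevent, 0, 0);
	pthread_create(&t2, NULL, p2_writer, NULL);
	pthread_create(&t1, NULL, p1_driver, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

The key point is that the mutex is taken before device_add(), i.e.
before the uevent can reach userspace, and released only after the
delayed work is queued, so the writer can never observe the
half-initialized devcd entry.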
Thanks,
-Mukesh
> What type of device is this? What bus? What driver?
>
> And if you turn object debugging off, what happens?
>
> thanks,
>
> greg k-h