Message-ID: <57a04278-0a60-cc7d-7ce8-a75c2befd568@quicinc.com>
Date: Wed, 13 Apr 2022 16:51:18 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
<sboyd@...nel.org>, <johannes@...solutions.net>,
<rafael@...nel.org>
Subject: Re: Possible race in dev_coredumpm()-del_timer() path
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting a race due to which try_to_grab_pending() is stuck.
>>>
>>> What kernel version are you using?
>>
>> 5.10
>
> 5.10.0 was released a very long time ago. Please use a more modern
> kernel release :)
>
>> Sorry for the formatting mess.
>>
>>>> In the following scenario: while running dev_coredumpm() (p1), the devcd
>>>> device is added to the framework and a uevent notification is sent to
>>>> userspace, which results in a call to devcd_data_write() (p2). That path
>>>> eventually tries to delete the queued timer, but in the racy scenario the
>>>> timer is not queued yet. So debug objects reports a warning, and in the
>>>> meantime the timer is initialized and queued from the p1 path; then from
>>>> the p2 path it gets overridden again (timer->entry.pprev = NULL) and
>>>> try_to_grab_pending() gets stuck.
>>         p1                                      p2(X)
>>
>> dev_coredumpm()
>>   device_add()  ======uevent======>  userspace process X reads the uevent
>>                                      and writes to the devcd fd, which
>>                                      results in a write into:
>>
>>                                        devcd_data_write()
>>                                          mod_delayed_work()
>>                                            try_to_grab_pending()
>>                                              del_timer()
>>                                                debug_assert_init()
>>   INIT_DELAYED_WORK()
>>   schedule_delayed_work()
>>                                                debug_object_fixup()
>
> Why do you have object debugging enabled? That's going to take a LONG
> time, and will find bugs in your code. Perhaps like this one?

There is no issue if we disable debug objects.

Here, a client module collects a dump via dev_coredumpm(), which creates
a devcdX device and expects userspace to read the dump data from it. This
looks like a synchronization issue between dev_coredumpm() and
devcd_data_write(); perhaps a mutex could close the window?
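
To make the window concrete, here is a minimal userspace model of the
ordering problem (just a sketch using pthreads; the struct and function
names are made up, and the deliberately unsynchronized flag stands in
for the not-yet-initialized timer):

/* race_model.c: userspace model of the dev_coredumpm()/devcd_data_write()
 * window; NOT kernel code. Build: cc -pthread race_model.c -o race_model
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

struct devcd_model {
	sem_t uevent;           /* models the uevent device_add() emits    */
	bool timer_queued;      /* models del_wk being INIT'ed and queued  */
};

static struct devcd_model m;

/* models dev_coredumpm() (p1) */
static void *p1_driver(void *arg)
{
	sem_post(&m.uevent);    /* device_add(): uevent reaches userspace  */
	/* <-- window: p2 can run before the next line -->                 */
	m.timer_queued = true;  /* INIT_DELAYED_WORK + schedule_delayed_work */
	return NULL;
}

/* models the process X -> devcd_data_write() path (p2) */
static void *p2_writer(void *arg)
{
	sem_wait(&m.uevent);    /* userspace saw the uevent, writes back   */
	if (!m.timer_queued)    /* mod_delayed_work() on an uninit. timer  */
		puts("race hit: del_timer() before INIT_DELAYED_WORK()");
	else
		puts("no race this run");
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	sem_init(&m.uevent, 0, 0);
	pthread_create(&t2, NULL, p2_writer, NULL);
	pthread_create(&t1, NULL, p1_driver, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}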
================o<===============================
diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 9243468..a620dcb 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -29,6 +29,7 @@ struct devcd_entry {
 	struct device devcd_dev;
 	void *data;
 	size_t datalen;
+	struct mutex mutex;
 	struct module *owner;
 	ssize_t (*read)(char *buffer, loff_t offset, size_t count,
 			void *data, size_t datalen);
@@ -88,7 +89,9 @@ static ssize_t devcd_data_write(struct file *filp, struct kobject *kobj,
 	struct device *dev = kobj_to_dev(kobj);
 	struct devcd_entry *devcd = dev_to_devcd(dev);
 
+	mutex_lock(&devcd->mutex);
 	mod_delayed_work(system_wq, &devcd->del_wk, 0);
+	mutex_unlock(&devcd->mutex);
 
 	return count;
 }
@@ -282,13 +285,14 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 	devcd->read = read;
 	devcd->free = free;
 	devcd->failing_dev = get_device(dev);
-
+	mutex_init(&devcd->mutex);
 	device_initialize(&devcd->devcd_dev);
 
 	dev_set_name(&devcd->devcd_dev, "devcd%d",
 		     atomic_inc_return(&devcd_count));
 	devcd->devcd_dev.class = &devcd_class;
 
+	mutex_lock(&devcd->mutex);
 	if (device_add(&devcd->devcd_dev))
 		goto put_device;
 
@@ -302,10 +306,11 @@ void dev_coredumpm(struct device *dev, struct module *owner,
 
 	INIT_DELAYED_WORK(&devcd->del_wk, devcd_del);
 	schedule_delayed_work(&devcd->del_wk, DEVCD_TIMEOUT);
-
+	mutex_unlock(&devcd->mutex);
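
With the patch, device_add() and the INIT_DELAYED_WORK()/
schedule_delayed_work() pair both run with devcd->mutex held, so a
writer racing in from the uevent blocks in devcd_data_write() until the
timer is initialized and queued. The same userspace model as above with
the mutex added (again only a sketch, not the kernel code):

/* fixed_model.c: the race model with a mutex mirroring the patch.
 * Build: cc -pthread fixed_model.c -o fixed_model
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

struct devcd_model {
	pthread_mutex_t mutex;  /* models devcd->mutex from the patch     */
	sem_t uevent;
	bool timer_queued;
};

static struct devcd_model m = { .mutex = PTHREAD_MUTEX_INITIALIZER };

static void *p1_driver(void *arg)
{
	pthread_mutex_lock(&m.mutex);   /* taken before device_add()      */
	sem_post(&m.uevent);            /* device_add(): uevent fires     */
	m.timer_queued = true;          /* INIT_DELAYED_WORK + schedule   */
	pthread_mutex_unlock(&m.mutex); /* released after timer is queued */
	return NULL;
}

static void *p2_writer(void *arg)
{
	sem_wait(&m.uevent);
	pthread_mutex_lock(&m.mutex);   /* devcd_data_write() waits here  */
	if (m.timer_queued)
		puts("writer sees a fully initialized, queued timer");
	pthread_mutex_unlock(&m.mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	sem_init(&m.uevent, 0, 0);
	pthread_create(&t2, NULL, p2_writer, NULL);
	pthread_create(&t1, NULL, p1_driver, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

The key point is that the mutex is taken before device_add(), i.e.
before the uevent can reach userspace, and released only after the
delayed work is queued, so the writer can never observe the
half-initialized devcd entry.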
Thanks,
-Mukesh
> What type of device is this? What bus? What driver?
>
> And if you turn object debugging off, what happens?
>
> thanks,
>
> greg k-h