linux-kernel - Re: Possible race in dev_coredumpm()-del


Message-ID: <b9c70a58-6d79-3976-c74e-91210cf162f5@quicinc.com>
Date:   Wed, 13 Apr 2022 19:47:02 +0530
From:   Mukesh Ojha <quic_mojha@...cinc.com>
To:     Greg KH <gregkh@...uxfoundation.org>
CC:     <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
        <sboyd@...nel.org>, <johannes@...solutions.net>,
        <rafael@...nel.org>
Subject: Re: Possible race in dev_coredumpm()-del_timer() path



On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting a race that leaves try_to_grab_pending() stuck.
>>>
>>> What kernel version are you using?
>>
>> 5.10
> 
> 5.10.0 was released a very long time ago.  Please use a more modern
> kernel release :)
> 

It would not be feasible for us to switch to the latest kernel, and I
think this issue could be present in recent kernels as well.

>> Sorry for the formatting mess.
>>
>>>> In the following scenario, while (p1) dev_coredumpm() is running, the
>>>> devcd device is added to the framework and a uevent notification is
>>>> sent to userspace, which results in a call to (p2) devcd_data_write().
>>>> That call eventually tries to delete the queued timer, which in the
>>>> racy scenario is not queued yet.
>>>> So, debug objects reports a warning, and in the meantime the timer is
>>>> initialized and queued from the p1 path.
>>>> Then, from the p2 path, it gets overridden again (timer->entry.pprev=NULL)
>>>> and try_to_grab_pending() gets stuck.
>>     p1                                       p2 (X)
>>
>>     dev_coredumpm()   uevent sent to userspace
>>     device_add()  =========================> userspace process X reads the uevent,
>>                                              writes to the devcd fd, which
>>                                              results in a call to
>>
>>                                              devcd_data_write()
>>                                                mod_delayed_work()
>>                                                  try_to_grab_pending()
>>                                                    del_timer()
>>                                                      debug_assert_init()
>>     INIT_DELAYED_WORK
>>     schedule_delayed_work
>>                                                        debug_object_fixup()
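
To make the ordering easier to follow, here is a minimal userspace sketch
of the interleaving as described in the report. It is only a toy model:
struct toy_timer, toy_init(), toy_enqueue() and simulate_reported_race()
are hypothetical names made up for illustration, not kernel APIs, and the
steps run sequentially in the racy order rather than on two real CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel timer; NOT the real struct timer_list. */
struct toy_timer {
    struct toy_timer **pprev; /* non-NULL means "pending" in this model */
    bool initialized;         /* what the toy debug layer has recorded */
};

/* Outcome of one simulated interleaving. */
struct toy_result {
    bool warned;        /* p2 saw a timer that was not initialized yet */
    bool on_queue;      /* the queue still references the timer */
    bool looks_pending; /* the timer itself still claims to be pending */
};

/* Models initialization: clears the pending link and marks it known. */
static void toy_init(struct toy_timer *t)
{
    t->pprev = NULL;
    t->initialized = true;
}

/* Models queueing: links the timer onto a one-element "queue". */
static void toy_enqueue(struct toy_timer **head, struct toy_timer *t)
{
    *head = t;
    t->pprev = head;
}

struct toy_result simulate_reported_race(void)
{
    struct toy_timer timer = { .pprev = NULL, .initialized = false };
    struct toy_timer *queue_head = NULL;
    struct toy_result r;

    /* p2: the delete path checks the timer before p1 has set it up. */
    r.warned = !timer.initialized;

    /* p1: initialization and queueing happen "in the meantime". */
    toy_init(&timer);
    toy_enqueue(&queue_head, &timer);

    /* p2: the fixup it already decided on re-initializes the timer. */
    if (r.warned)
        toy_init(&timer);

    r.on_queue = (queue_head == &timer);
    r.looks_pending = (timer.pprev != NULL);
    return r;
}
```

In this model the timer ends up referenced by the queue while its own
pprev is NULL again, which is the inconsistent state the report
associates with try_to_grab_pending() getting stuck.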
> 
> Why do you have object debugging enabled?

We enabled object debugging to catch more issues across the kernel.

>  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one? 
> 
> What type of device is this?  What bus?  What driver?

A remoteproc client device driver calls dev_coredumpm(), and the devcd
device gets added as part of that call.

> 
> And if you turn object debugging off, what happens?

We have not observed the issue after turning object debugging off.

Regards,
Mukesh

> 
> thanks,
> 
> greg k-h