Message-ID: <38602241-3d4e-7348-526c-80b44fb4cba7@quicinc.com>
Date:   Wed, 13 Apr 2022 19:48:12 +0530
From:   Mukesh Ojha <quic_mojha@...cinc.com>
To:     Greg KH <gregkh@...uxfoundation.org>,
        <linux-remoteproc@...r.kernel.org>
CC:     <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
        <sboyd@...nel.org>, <johannes@...solutions.net>,
        <rafael@...nel.org>
Subject: Re: Possible race in dev_coredumpm()-del_timer() path
On 4/13/2022 4:28 PM, Greg KH wrote:
> On Wed, Apr 13, 2022 at 03:46:39PM +0530, Mukesh Ojha wrote:
>> On Wed, Apr 13, 2022 at 07:34:24AM +0200, Greg KH wrote:
>>> On Wed, Apr 13, 2022 at 10:59:22AM +0530, Mukesh Ojha wrote:
>>>> Hi All,
>>>>
>>>> We are hitting a race due to which try_to_grab_pending() gets stuck.
>>>
>>> What kernel version are you using?
>>
>> 5.10
> 
> 5.10.0 was released a very long time ago.  Please use a more modern
> kernel release :)
> 
It is not feasible for us to switch to the latest kernel, and I think
this issue could be present in recent kernels as well.

>> Sorry for the formatting mess.
>>
>>>> In the following scenario, while (p1) dev_coredumpm() is running, the
>>>> devcd device is added to the framework and a uevent notification is
>>>> sent to userspace. That results in a call to (p2) devcd_data_write(),
>>>> which eventually tries to delete the queued timer; in the racy scenario
>>>> the timer is not queued yet.
>>>> So debug objects reports a warning, and in the meantime the timer is
>>>> initialized and queued from the p1 path.
>>>> The p2 path then overwrites it again with timer->entry.pprev=NULL, and
>>>> try_to_grab_pending() gets stuck.
>>     p1                                          p2 (X)
>>
>>     dev_coredumpm()   uevent sent to userspace
>>     device_add()  ============================> userspace process X reads the uevents,
>>                                                 writes to the devcd fd, which
>>                                                 results in a call to
>>
>>                                                 devcd_data_write()
>>                                                   mod_delayed_work()
>>                                                     try_to_grab_pending()
>>                                                       del_timer()
>>                                                         debug_assert_init()
>>     INIT_DELAYED_WORK
>>     schedule_delayed_work
>>                                                           debug_object_fixup()
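
To make the ordering problem in the diagram concrete, here is a small
userspace model of the interleaving. It is only a sketch, not kernel code:
every name in it (model_devcd, model_dwork, model_data_write(), replay())
is made up for illustration, and the booleans merely stand in for the
device_add(), INIT_DELAYED_WORK and schedule_delayed_work steps. It replays
the worst case, where p2 runs right after the device becomes visible, and
reports what p2 finds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a delayed work item; NOT the kernel's struct delayed_work. */
struct model_dwork {
	bool initialized;	/* set by the INIT_DELAYED_WORK step */
	bool pending;		/* set by the schedule_delayed_work step */
};

/* Toy stand-in for the devcd device (hypothetical, for illustration only). */
struct model_devcd {
	bool visible;		/* true once the device_add step has run */
	struct model_dwork del_wk;
};

/* What the writer (p2) found when it went to grab the timer. */
enum write_result {
	WRITE_NO_DEVICE,	/* device not visible yet, nothing to write to */
	WRITE_UNINIT_TIMER,	/* timer touched before it was initialised */
	WRITE_OK		/* timer was initialised and queued */
};

/* p1: the INIT_DELAYED_WORK + schedule_delayed_work steps. */
static void model_init_and_schedule(struct model_devcd *d)
{
	d->del_wk.initialized = true;
	d->del_wk.pending = true;
}

/* p2: models the write path reaching the timer of the delayed work. */
static enum write_result model_data_write(const struct model_devcd *d)
{
	if (!d->visible)
		return WRITE_NO_DEVICE;
	if (!d->del_wk.initialized)
		return WRITE_UNINIT_TIMER;
	/* In this model an initialised work item is always queued as well. */
	assert(d->del_wk.pending);
	return WRITE_OK;
}

/*
 * Replay the worst-case interleaving: p2 runs immediately after the
 * device becomes visible. init_before_add selects whether p1 set up the
 * work item before or after that point.
 */
static enum write_result replay(bool init_before_add)
{
	struct model_devcd d = { 0 };
	enum write_result seen;

	if (init_before_add)
		model_init_and_schedule(&d);

	d.visible = true;		/* device_add step: uevent goes out */
	seen = model_data_write(&d);	/* p2 wins the race and runs here */

	if (!init_before_add)
		model_init_and_schedule(&d);	/* too late for p2 */

	return seen;
}
```

With this model, replay(false) follows the ordering in the diagram and
returns WRITE_UNINIT_TIMER, while replay(true) returns WRITE_OK. It only
shows why the writer must not reach the timer before it is initialised;
whether reordering, locking, or something else is the right change in
devcoredump itself is not something the model settles.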
> 
> Why do you have object debugging enabled?
We have enabled object debugging to catch more issues across the kernel.
>  That's going to take a LONG
> time, and will find bugs in your code.  Perhaps like this one? 
> 
> What type of device is this?  What bus?  What driver?
A remoteproc client device driver calls dev_coredumpm(), and the devcd
device gets added as part of that call.
> 
> And if you turn object debugging off, what happens?
We have not observed the issue after disabling object debugging.

Regards,
Mukesh
> 
> thanks,
> 
> greg k-h