Message-ID: <b7ff3781-a944-ae04-91d1-14a7cb8187b2@cumulusnetworks.com>
Date: Sun, 9 Aug 2020 17:18:20 +0300
From: Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To: Hillf Danton <hdanton@...a.com>,
Stephen Hemminger <stephen@...workplumber.org>
Cc: Rasmus Villemoes <rasmus.villemoes@...vas.dk>,
Network Development <netdev@...r.kernel.org>,
Markus Elfring <Markus.Elfring@....de>
Subject: Re: rtnl_trylock() versus SCHED_FIFO lockup

On 09/08/2020 17:12, Nikolay Aleksandrov wrote:
> On 09/08/2020 16:49, Hillf Danton wrote:
>>
>> On Fri, 7 Aug 2020 08:03:32 -0700 Stephen Hemminger wrote:
>>> On Fri, 7 Aug 2020 10:03:59 +0200
>>> Rasmus Villemoes <rasmus.villemoes@...vas.dk> wrote:
>>>
>>>> On 07/08/2020 05.39, Stephen Hemminger wrote:
>>>>> On Thu, 6 Aug 2020 12:46:43 +0300
>>>>> Nikolay Aleksandrov <nikolay@...ulusnetworks.com> wrote:
>>>>>
>>>>>> On 06/08/2020 12:17, Rasmus Villemoes wrote:
>>>>>>> On 06/08/2020 01.34, Stephen Hemminger wrote:
>>>>>>>> On Wed, 5 Aug 2020 16:25:23 +0200
>>>>
>>>>>>
>>>>>> Hi Rasmus,
>>>>>> I haven't tested anything but git history (and some grepping) points to deadlocks when
>>>>>> sysfs entries are being changed under rtnl.
>>>>>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>>>>>> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
>>>>>> commits which talk about deadlocks.
>>>>>> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
>>>>>> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>>>>>>
>>>>>
>>>>> I was referring to AB BA lock inversion problems.
>>>>
>>>> Ah, so lock inversion, not priority inversion.
>>
>> Hi folks,
>>
>> Is it likely that kworker helps work around that deadlock, by
>> acquiring the rtnl lock in the case that the current fails to
>> trylock it?
>>
>> Hillf
>
> You know it's a user writing to a file expecting config change, right?
> There are numerous problems with deferring it (e.g. error handling).
>
> Thanks,
> Nik

OK, admittedly I spoke too soon about the error handling. :)
But I still think it suffers from the same problem if the sysfs files are destroyed
under rtnl while you're writing to one. Their users are "drained", so the removal will
again wait forever, because rtnl will not be released and the writer will not finish.
And it may become even more interesting if we're trying to remove the bridge module at that time.
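
For readers following the thread, the drain problem described above can be sketched
in userspace. This is only an illustration with pthreads, not kernel code: fake_rtnl,
active_writers, fake_store() and fake_drain_done() are made-up names standing in for
rtnl, the sysfs file's active references, a store handler, and the wait done during
file removal. The writer uses a trylock and backs off with a "restart" result, roughly
in the spirit of the rtnl_trylock() pattern this thread is about, so it never sits
inside the handler waiting for the lock.

```c
/* Userspace sketch with pthreads, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER; /* stands in for rtnl */
static int active_writers; /* stands in for the sysfs file's active references */
static int config_value;   /* the setting the writer wants to change */

enum store_result { STORE_OK, STORE_RESTART };

/* Writer side: a store handler that never blocks on the lock. */
static enum store_result fake_store(int new_value)
{
    enum store_result ret = STORE_RESTART;

    active_writers++;                        /* entered the handler */
    if (pthread_mutex_trylock(&fake_rtnl) == 0) {
        config_value = new_value;            /* change applied under the lock */
        pthread_mutex_unlock(&fake_rtnl);
        ret = STORE_OK;
    }
    active_writers--;  /* leave the handler either way; on RESTART the caller retries */
    return ret;
}

/* Remover side: file removal can only finish once all writers have left. */
static bool fake_drain_done(void)
{
    return active_writers == 0;
}

/* Walk through the contended case, then the retry. */
static int demo(void)
{
    /* Some other path (e.g. device delete) takes the lock first. */
    int rc = pthread_mutex_lock(&fake_rtnl);
    assert(rc == 0);
    (void)rc;

    /* A write arriving now must back off instead of waiting. */
    if (fake_store(42) != STORE_RESTART)
        return -1;
    /* Because the writer left the handler, a drain would not be stuck. */
    if (!fake_drain_done())
        return -2;
    pthread_mutex_unlock(&fake_rtnl);

    /* The retried write succeeds once the lock is free. */
    if (fake_store(42) != STORE_OK)
        return -3;
    return config_value;
}
```

Under these assumptions demo() returns 42: the first write backs off while the lock
is held, the drain check passes, and the retry succeeds. A writer that instead stayed
in the handler until a deferred worker managed to take the lock would keep
active_writers at 1 for as long as the remover holds fake_rtnl, which is the
forever-wait described above.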