[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49844fde-9424-4c81-85a0-c5c26a77321d@csgroup.eu>
Date: Tue, 17 Dec 2024 12:56:37 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, "Eric W. Biederman"
<ebiederm@...ssion.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, Maxime Chevallier <maxime.chevallier@...tlin.com>,
TRINH THAI Florent <florent.trinh-thai@...soprasteria.com>,
CASAUBON Jean Michel <jean-michel.casaubon@...soprasteria.com>
Subject: Re: [PATCH net] net: sysfs: Fix deadlock situation in sysfs accesses
Le 17/12/2024 à 10:52, Eric Dumazet a écrit :
> On Tue, Dec 17, 2024 at 10:41 AM Christophe Leroy
> <christophe.leroy@...roup.eu> wrote:
>>
>>
>>
>> Le 17/12/2024 à 10:20, Eric Dumazet a écrit :
>>> On Tue, Dec 17, 2024 at 9:59 AM Christophe Leroy
>>> <christophe.leroy@...roup.eu> wrote:
>>>>
>>>>
>>>>
>>>> Le 17/12/2024 à 09:16, Eric Dumazet a écrit :
>>>>> On Tue, Dec 17, 2024 at 8:18 AM Christophe Leroy
>>>>> <christophe.leroy@...roup.eu> wrote:
>>>>>>
>>>>>> The following problem is encountered on kernel built with
>>>>>> CONFIG_PREEMPT. An snmp daemon running with normal priority is
>>>>>> regularly calling ioctl(SIOCGMIIPHY). Another process running with
>>>>>> SCHED_FIFO policy is regularly reading /sys/class/net/eth0/carrier.
>>>>>>
>>>>>> After some random time, the snmp daemon gets preempted while holding
>>>>>> the RTNL mutex then the high priority process is busy looping into
>>>>>> carrier_show which bails out early due to a non-successfull
>>>>>> rtnl_trylock() which implies restart_syscall(). Because the snmp
>>>>>> daemon has a lower priority, it never gets the chances to release
>>>>>> the RTNL mutex and the high-priority task continues to loop forever.
>>>>>>
>>>>>> Replace the trylock by lock_interruptible. This will increase the
>>>>>> priority of the task holding the lock so that it can release it and
>>>>>> allow the reader of /sys/class/net/eth0/carrier to actually perform
>>>>>> its read.
>>>>>>
>>>>
>>>> ...
>>>>
>>>>>>
>>>>>> Fixes: 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in sysfs methods.")
>>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
>>>>>> ---
>>>>>
>>>>> At a first glance, this might resurface the deadlock issue Eric W. Biederman
>>>>> was trying to fix in 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in
>>>>> sysfs methods.")
>>>>
>>>> Are you talking about the deadlock fixed (incompletely) by 5a5990d3090b
>>>> ("net: Avoid race between network down and sysfs"), or the complement
>>>> provided by 336ca57c3b4e ?
>>>>
>>>> My understanding is that mutex_lock() will return EINTR only if a signal
>>>> is pending so there is no need to set signal_pending like it was when
>>>> using mutex_trylock() which does nothing when the mutex is already locked.
>>>>
>>>> And an EINTR return is expected and documented for a read() or a
>>>> write(), I can't see why we want ERESTARTNOINTR instead of ERSTARTSYS.
>>>> Isn't it the responsibility of the user app to call again read or write
>>>> if it has decided to not install the necessary sigaction for an
>>>> automatic restart ?
>>>>
>>>> Do you think I should instead use rtnl_lock_killable() and return
>>>> ERESTARTNOINTR in case of failure ? In that case, is it still possible
>>>> to interrupt a blocked 'cat /sys/class/net/eth0/carrier' which CTRL+C ?
>>>
>>> Issue is when no signal is pending, we have a typical deadlock situation :
>>>
>>> One process A is :
>>>
>>> Holding sysfs lock, then attempts to grab rtnl.
>>>
>>> Another one (B) is :
>>>
>>> Holding rtnl, then attempts to grab sysfs lock.
>>
>> Ok, I see.
>>
>> But then what can be the solution to avoid busy looping with
>> mutex_trylock , not giving any chance to the task holding the rtnl to
>> run and unlock it ?
>
> One idea would be to add a usleep(500, 1000) if the sysfs read/write handler in
> returns -ERESTARTNOINTR;
>
> Totally untested idea :
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 8bbb1ad46335c3b8f50dd35d552f86767e62ead1..276c6d594129a18a7a4c2b1df447b34993398ab4
> 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -290,6 +290,8 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct
> iov_iter *iter)
> m->read_pos += copied;
> }
> mutex_unlock(&m->lock);
> + if (copied == -ERESTARTNOINTR)
> + usleep_range(500, 1000);
> return copied;
> Enomem:
> err = -ENOMEM;
Ok, that may solve the issue, but it looks more like a hack than a real
solution, doesn't it ?
It doesn't guarantee that the task holding the RTNL lock will be given
the floor to run and free the lock.
The real issue is the nest between sysfs lock and RTNL lock. Can't we
ensure that they are always held in the same order ?
Christophe
Powered by blists - more mailing lists