[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49a43774-bf97-4b20-8382-4fb921f34c66@csgroup.eu>
Date: Tue, 17 Dec 2024 09:59:16 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, "Eric W. Biederman"
<ebiederm@...ssion.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, Maxime Chevallier <maxime.chevallier@...tlin.com>,
TRINH THAI Florent <florent.trinh-thai@...soprasteria.com>,
CASAUBON Jean Michel <jean-michel.casaubon@...soprasteria.com>
Subject: Re: [PATCH net] net: sysfs: Fix deadlock situation in sysfs accesses
Le 17/12/2024 à 09:16, Eric Dumazet a écrit :
> On Tue, Dec 17, 2024 at 8:18 AM Christophe Leroy
> <christophe.leroy@...roup.eu> wrote:
>>
>> The following problem is encountered on kernel built with
>> CONFIG_PREEMPT. An snmp daemon running with normal priority is
>> regularly calling ioctl(SIOCGMIIPHY). Another process running with
>> SCHED_FIFO policy is regularly reading /sys/class/net/eth0/carrier.
>>
>> After some random time, the snmp daemon gets preempted while holding
>> the RTNL mutex then the high priority process is busy looping into
>> carrier_show which bails out early due to a non-successfull
>> rtnl_trylock() which implies restart_syscall(). Because the snmp
>> daemon has a lower priority, it never gets the chances to release
>> the RTNL mutex and the high-priority task continues to loop forever.
>>
>> Replace the trylock by lock_interruptible. This will increase the
>> priority of the task holding the lock so that it can release it and
>> allow the reader of /sys/class/net/eth0/carrier to actually perform
>> its read.
>>
...
>>
>> Fixes: 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in sysfs methods.")
>> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
>> ---
>
> At a first glance, this might resurface the deadlock issue Eric W. Biederman
> was trying to fix in 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in
> sysfs methods.")
Are you talking about the deadlock fixed (incompletely) by 5a5990d3090b
("net: Avoid race between network down and sysfs"), or the complement
provided by 336ca57c3b4e ?
My understanding is that mutex_lock() will return EINTR only if a signal
is pending so there is no need to set signal_pending like it was when
using mutex_trylock() which does nothing when the mutex is already locked.
And an EINTR return is expected and documented for a read() or a
write(), I can't see why we want ERESTARTNOINTR instead of ERSTARTSYS.
Isn't it the responsibility of the user app to call again read or write
if it has decided to not install the necessary sigaction for an
automatic restart ?
Do you think I should instead use rtnl_lock_killable() and return
ERESTARTNOINTR in case of failure ? In that case, is it still possible
to interrupt a blocked 'cat /sys/class/net/eth0/carrier' which CTRL+C ?
>
> I was hoping that at some point, some sysfs write methods could be
> marked as : "We do not need to hold the sysfs lock"
Powered by blists - more mailing lists