Message-ID: <2a6edf25-b12b-c500-ad33-c0ec9e60cde9@cumulusnetworks.com>
Date: Thu, 6 Aug 2020 12:46:43 +0300
From: Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
To: Rasmus Villemoes <rasmus.villemoes@...vas.dk>,
Stephen Hemminger <stephen@...workplumber.org>
Cc: Network Development <netdev@...r.kernel.org>
Subject: Re: rtnl_trylock() versus SCHED_FIFO lockup
On 06/08/2020 12:17, Rasmus Villemoes wrote:
> On 06/08/2020 01.34, Stephen Hemminger wrote:
>> On Wed, 5 Aug 2020 16:25:23 +0200
>> Rasmus Villemoes <rasmus.villemoes@...vas.dk> wrote:
>>
>>> Hi,
>>>
>>> We're seeing occasional lockups on an embedded board (running an -rt
>>> kernel), which I believe I've tracked down to the
>>>
>>>         if (!rtnl_trylock())
>>>                 return restart_syscall();
>>>
>>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
>>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
>>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
>>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
>>> loop, and the lower-priority task never gets runtime and thus cannot
>>> release the lock.
>>>
>>> I've written a script that rather quickly reproduces this both on our
>>> target and my desktop machine (pinning everything on one CPU to emulate
>>> the uni-processor board), see below. Also, with this hacky patch
>>
>> There is a reason for the trylock: it works around a priority inversion.
>
> Can you elaborate? It seems to me that it _causes_ a priority inversion
> since priority inheritance doesn't have a chance to kick in.
>
>> The real problem is expecting a SCHED_FIFO task to be safe with this
>> kind of network operation.
>
> Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit
> odd to do what is essentially a busy-loop - yes, the restart_syscall()
> allows signals to be delivered (including allowing the process to get
> killed), but in the absence of any signals, the pattern essentially
> boils down to
>
>     while (!rtnl_trylock())
>             ;
>
> So even for regular tasks, this seems to needlessly hog the cpu.
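
For reference (not quoted in the thread), restart_syscall() lives in
include/linux/sched/signal.h and in kernels of this vintage is roughly:

        static inline int restart_syscall(void)
        {
                /* mark a "pending signal" so the syscall exit path runs */
                set_tsk_thread_flag(current, TIF_SIGPENDING);
                /* ask that path to re-invoke the same syscall */
                return -ERESTARTNOINTR;
        }

so with no real signal pending the write() is simply re-entered and the
trylock retried straight away, which is the busy loop described above.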
>
> I tried this
>
> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> index 0318a69888d4..e40e264f9b16 100644
> --- a/net/bridge/br_sysfs_br.c
> +++ b/net/bridge/br_sysfs_br.c
> @@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d,
>          if (endp == buf)
>                  return -EINVAL;
> 
> -        if (!rtnl_trylock())
> -                return restart_syscall();
> +        if (rtnl_lock_interruptible())
> +                return -ERESTARTNOINTR;
> 
>          err = (*set)(br, val);
>          if (!err)
>
> with the obvious definition of rtnl_lock_interruptible(), and it makes
> the problem go away. Isn't it better to sleep waiting for the lock (and
> with -rt, giving proper priority boost) or a signal to arrive rather
> than busy-looping back and forth between syscall entry point and the
> trylock()?
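
The "obvious definition" is not spelled out in the mail; a minimal sketch of
what is presumably meant, assuming it simply wraps mutex_lock_interruptible()
on rtnl_mutex and therefore has to live next to rtnl_lock() in
net/core/rtnetlink.c (the helper itself is hypothetical, only
mutex_lock_interruptible() is an existing API):

        /* Sketch only: sleep for the RTNL mutex, with priority inheritance
         * on -rt, but bail out with -EINTR if a signal is delivered while
         * waiting.
         */
        int rtnl_lock_interruptible(void)
        {
                return mutex_lock_interruptible(&rtnl_mutex);
        }
        EXPORT_SYMBOL(rtnl_lock_interruptible);

A non-zero return from it maps naturally onto the -ERESTARTNOINTR in the diff
above: the writer either gets the lock or is restarted after the signal is
handled, instead of bouncing between syscall entry and the trylock.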
>
> I see quite a lot of
>
>     if (mutex_lock_interruptible(...))
>             return -ERESTARTSYS;
>
> but for the rtnl_mutex, I see the trylock...restart_syscall pattern
> being used in a couple of places. So there must be something special
> about the rtnl_mutex?
>
> Thanks,
> Rasmus
>
Hi Rasmus,
I haven't tested anything, but git history (and some grepping) points to deadlocks when
sysfs entries are changed under rtnl.
For example, check commits af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c.
This is a common usage pattern throughout net/; the bridge is not the only case, and there are more
commits which talk about deadlocks.
Again, I haven't verified anything, but it seems that on device delete (with rtnl held) the sysfs delete
would wait for current readers, while those readers might themselves be stuck waiting on rtnl, and we can deadlock.
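
Roughly, the scenario those commits describe (again, my reading, not
verified) looks like:

        task A (device delete)                  task B (sysfs write)
        ----------------------                  --------------------
        rtnl_lock()                             store_bridge_parm()
        unregister the device                     blocks waiting for rtnl
          remove its sysfs entries                (held by task A)
            wait for active users of the
            attribute to finish (task B)   =>   A and B wait on each other

which is why these store paths only ever _try_ to take rtnl and otherwise
back out through restart_syscall().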
Cheers,
Nik