netdev - Re: rtnl_trylock() versus SCHED

Message-ID: <29a82363-411c-6f2b-9f55-97482504e453@prevas.dk>
Date:   Fri, 7 Aug 2020 10:03:59 +0200
From:   Rasmus Villemoes <rasmus.villemoes@...vas.dk>
To:     Stephen Hemminger <stephen@...workplumber.org>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
Cc:     Network Development <netdev@...r.kernel.org>
Subject: Re: rtnl_trylock() versus SCHED_FIFO lockup

On 07/08/2020 05.39, Stephen Hemminger wrote:
> On Thu, 6 Aug 2020 12:46:43 +0300
> Nikolay Aleksandrov <nikolay@...ulusnetworks.com> wrote:
> 
>> On 06/08/2020 12:17, Rasmus Villemoes wrote:
>>> On 06/08/2020 01.34, Stephen Hemminger wrote:  
>>>> On Wed, 5 Aug 2020 16:25:23 +0200

>>
>> Hi Rasmus,
>> I haven't tested anything but git history (and some grepping) points to deadlocks when
>> sysfs entries are being changed under rtnl.
>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
>> commits which talk about deadlocks.
>> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
>> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>>
> 
> I was referring to AB BA lock inversion problems.

Ah, so lock inversion, not priority inversion.

> 
> Yes the trylock goes back to:
> 
> commit af38f2989572704a846a5577b5ab3b1e2885cbfb
> Author: Eric W. Biederman <ebiederm@...ssion.com>
> Date:   Wed May 13 17:00:41 2009 +0000
> 
>     net: Fix bridgeing sysfs handling of rtnl_lock
>     
>     Holding rtnl_lock when we are unregistering the sysfs files can
>     deadlock if we unconditionally take rtnl_lock in a sysfs file.  So fix
>     it with the now familiar patter of: rtnl_trylock and syscall_restart()
>     
>     Signed-off-by: Eric W. Biederman <ebiederm@...stanetworks.com>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
> 
> 
> The problem is that the unregister of netdevice happens under rtnl and
> this unregister path has to remove sysfs and other objects.
> So those objects have to have conditional locking.

I see. And the reason the "trylock, unwind all the way back to syscall
entry and start over" works is that we then go through

kernfs_fop_write()
	mutex_lock(&of->mutex);
	if (!kernfs_get_active(of->kn)) {
		mutex_unlock(&of->mutex);
		len = -ENODEV;
		goto out_free;
	}

which makes the write fail with ENODEV if the sysfs node has already
been marked for removal.
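
To make that concrete, here is a toy userspace sketch of the pattern as I
understand it. It is not kernel code: every toy_* name and TOY_RESTART are
made up for illustration, an atomic flag stands in for rtnl, and a plain
bool stands in for the kernfs active reference. The bounded retry loop
plays the role of "go back to syscall entry and start over".

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical return code meaning "unwind and start over from entry". */
#define TOY_RESTART 512

/* Stand-in for rtnl: set means held, clear means free. */
static atomic_flag toy_rtnl = ATOMIC_FLAG_INIT;
/* Stand-in for "node not yet marked for removal". */
static bool toy_node_active = true;

static bool toy_rtnl_trylock(void)
{
	/* test_and_set returns the old value; false means we got the lock. */
	return !atomic_flag_test_and_set(&toy_rtnl);
}

static void toy_rtnl_unlock(void)
{
	atomic_flag_clear(&toy_rtnl);
}

/* Attribute handler: never blocks on the lock, asks for a restart instead. */
static int toy_store(void)
{
	if (!toy_rtnl_trylock())
		return -TOY_RESTART;
	/* ... the actual device update would go here ... */
	toy_rtnl_unlock();
	return 0;
}

/* One pass through the write entry: the active check comes first. */
static int toy_write_once(void)
{
	if (!toy_node_active)
		return -ENODEV;
	return toy_store();
}

/* Restart loop; max_restarts only exists so the sketch always terminates. */
static int toy_write(int max_restarts)
{
	int ret;

	do {
		ret = toy_write_once();
	} while (ret == -TOY_RESTART && max_restarts-- > 0);
	return ret;
}
```

With the lock free, toy_write() succeeds. With the lock held by a
remover and the node still active, it just keeps restarting, which is the
spinning behaviour this thread is about. Once the remover marks the node
inactive, the next pass fails with -ENODEV without ever touching the lock.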

If I'm reading the code correctly, doing "ip link set dev foobar type
bridge fdb_flush" is equivalent to writing to that sysfs file, except
the former ends up doing an unconditional rtnl_lock() and thus won't
have the livelocking issue.

Thanks,
Rasmus