netdev - Re: [linux-kernel] dead loop for rtnl


Message-ID: <87y4adnkwm.fsf@x220.int.ebiederm.org>
Date:	Mon, 22 Feb 2016 03:39:53 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Xianpeng Zhao <xpzhao@...ohive.com>
Cc:	"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	<netdev@...r.kernel.org>
Subject: Re: [linux-kernel] dead loop for rtnl_trylock


Copied netdev as that is the more appropriate mailing list for
questions like this.

Xianpeng Zhao <xpzhao@...ohive.com> writes:

> Hi Group,
>
>          I have found a problem on my system: there is a chance that the system enters an endless loop when trying to take the rtnl lock in the sysctl handler in net/ipv6/addrconf.c.
>
>          The situation is like this: two processes need the rtnl lock. Call them process A and process B, where A has a higher priority than B.
> B takes the rtnl lock to do some work and is scheduled out without releasing it. At that point A runs "echo 1 > /proc/sys/net/ipv6/conf/<ifname>/disable_ipv6", and the echo process reaches this code:
>
>     if (!rtnl_trylock())
>         return restart_syscall();
>
> Because the rtnl lock is held by process B, the trylock fails and restart_syscall() makes sys_write run again. However many times it retries, B has a much lower priority and rarely gets the chance to release the lock, so the echo process started by A loops restarting the system call.
>
> In my case wireless_nlevent_process, running in a kworker, had taken the rtnl lock, and a higher-priority process that used echo to disable IPv6 hit this problem.
>
> I am not entirely sure, but I think it would be better to let process A sleep for a while instead of retrying again and again without any delay.
>
> Experts, what are your opinions?

That the entire situation is a mess.  From what little I have seen it is
a very rare condition.  Does this reproduce easily in your environment?

If we are going the delay route we probably want to put the delay in
restart_syscall or in a wrapper around restart_syscall that we use
for the rtnl_trylock failure case.
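
To make the effect of such a delay concrete, here is a minimal userspace
sketch, not kernel code. Every name in it (struct sim, sim_run_b,
sim_trylock, sim_write) is hypothetical, and it assumes a single CPU with
strict priorities, so the low-priority lock holder B only runs while the
high-priority writer A is asleep.

```c
#include <stdbool.h>

/*
 * Userspace model only; none of these names exist in the kernel.
 * Assumption: one CPU and strict priorities, so low-priority task B
 * makes progress only while high-priority task A is asleep.
 */
struct sim {
	bool lock_held;   /* the modelled "rtnl" lock, initially held by B */
	int  b_work_left; /* ticks of CPU time B needs before it unlocks */
};

/* Give B the CPU for `ticks` ticks; B drops the lock once its work is done. */
static void sim_run_b(struct sim *s, int ticks)
{
	if (!s->lock_held)
		return;
	s->b_work_left -= ticks;
	if (s->b_work_left <= 0)
		s->lock_held = false;
}

/* Model of rtnl_trylock(): take the lock only if nobody holds it. */
static bool sim_trylock(struct sim *s)
{
	if (s->lock_held)
		return false;
	s->lock_held = true;
	return true;
}

/*
 * Model of A's write(): on trylock failure, sleep for `sleep_ticks`
 * and then restart. sleep_ticks == 0 models an immediate restart.
 * Returns the attempt that succeeded, or -1 after max_attempts.
 */
static int sim_write(struct sim *s, int sleep_ticks, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sim_trylock(s))
			return attempt;
		sim_run_b(s, sleep_ticks); /* A sleeps, so B runs */
	}
	return -1;
}
```

In this model, with B needing 10 ticks, sim_write(&s, 0, 1000) gives up
and returns -1 because B never gets the CPU, while sim_write(&s, 5, 1000)
succeeds on the third attempt. Real schedulers are fairer than this model
unless realtime priorities are in play.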

At first blush I would suggest the logic for sleeping should be:
	if (need_resched())
		schedule();

That will limit the spinning to a single time slice which is definitely
preferable.

Ugh.  But restarting the syscall already crosses the kernel/userspace
boundary, and that path already does that.

If you are encountering a deadlock it is very much because you have been
playing very ugly priority games.  At which point you have my sympathies,
but this feels like a case of "Doctor, it hurts when I do this."  "Then
don't do that."

> @@ -5304,8 +5308,10 @@ static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf)
>         struct net *net;
>         int old;
>  
> -       if (!rtnl_trylock())
> +       if (!rtnl_trylock()){
> +        schedule_timeout_uninterruptible(HZ/4);
>                 return restart_syscall();
> +    }
>  
>         net = (struct net *)table->extra2;
>         old = *p;

Eric