[<prev] [next>] [day] [month] [year] [list]
Message-ID: <87y4adnkwm.fsf@x220.int.ebiederm.org>
Date: Mon, 22 Feb 2016 03:39:53 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Xianpeng Zhao <xpzhao@...ohive.com>
Cc: "linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
<netdev@...r.kernel.org>
Subject: Re: [linux-kernel] dead loop for rtnl_trylock
Copied netdev as that is the more appropriate mailling list for
questions like this.
Xianpeng Zhao <xpzhao@...ohive.com> writes:
> Hi Group,
>
> I have find a problem in my system, I found there have a chance that cause the system enter dead loop when try to get the rtnl lock in the sysctl function in net/ipv6/addrconf.c
>
> The situation should like this, there are 2 processes may need get the rtnl lock, we call them process A and process B, A have high priority than B.
> B need get the rtnl lock to do something, when B schedule out without release the lock; At this time, the A start to run "echo 1 > /proc/sys/net/ipv6/conf/<ifname>/disable_ipv6", the echo process will run to this code:
>
> if (!rtnl_trylock())
>
> return restart_syscall();
>
> Because the rtnl lock was hold by process B, so here the try will be failure, and run the restart_syscall to let the sys_write do again, even try many times, because the B have very lower priority, the lock was hard to be released, so the echo process created by A will enter a loop of restart system call.
>
> In my case it is the wireless_nlevent_process in process kworker taken the rtnl lock, and another higher priority process need use echo to disable IPv6 met this problem.
>
> I am not very sure, but I think it is better to let the process A sleep a while instead of try it again and again without any delay.
>
> Expects, what's your opinions?
That the entire situation is a mess. From what little I have seen it is
a very rare condition. Does this reproduce easily in your environment?
If we are going the delay route we probably want to put the delay in
restart_syscall or in a wrapper around restart_syscall that we use
for the rtnl_trylock failure case.
On first blush I would suggest the logic for sleeping should be:
if (need_reschedule())
schedule();
That will limit the spinning to a single time slice which is definitely
preferrable.
Ugh. But we already cross the kernel/userspace boundary that already
does that.
If you are encountering a deadlock it is very much because you have been
playing very ugly priority games. At which point my sympathies but this
feels like a case of "Docter it hurts when I do this. Then don't do that."
> @@ -5304,8 +5308,10 @@ static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf)
>
> struct net *net;
>
> int old;
>
>
>
> - if (!rtnl_trylock())
>
> + if (!rtnl_trylock()){
>
> + schedule_timeout_uninterruptible(HZ/4);
>
> return restart_syscall();
>
> + }
>
>
>
> net = (struct net *)table->extra2;
>
> old = *p;
Eric
Powered by blists - more mailing lists