Message-ID: <20180727085351.36210a12@xeon-e3>
Date: Fri, 27 Jul 2018 08:53:51 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: André Pribil <a.pribil@...k-ipc.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Deadlock with restart_syscall()
On Mon, 16 Jul 2018 09:31:06 +0200
André Pribil <a.pribil@...k-ipc.com> wrote:
> Hello,
>
> I'm using kernel 4.14.52-rt34 on a single core ARM system and I'm seeing a
> deadlock inside the kernel when two RT processes make calls with the right
> timing. The first process is trying to bring the Ethernet interface
> up, with the SIOCSIFFLAGS ioctl(). The second process is checking the Ethernet
> carrier, speed and duplex status, by reading e.g. "/sys/class/net/eth1/speed".
>
> The first process finally gets to phy_poll_reset() in
> drivers/net/phy/phy_device.c, where it calls msleep(50).
> It never returns from the sleep.
>
> The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
> the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall().
> This happens over and over again.
>
> It seems like the first process is no longer scheduled and cannot release the
> RTNL lock, while the second process is busy restarting the syscall. The first
> process has a higher RT priority than the second process.
>
> Just for testing I've added the TIF_NEED_RESCHED flag to the restart_syscall()
> function and I did not see the deadlock again with this change.
>
> static inline int restart_syscall(void)
> {
> 	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
> 	return -ERESTARTNOINTR;
> }
>
> As a second test I released the RTNL lock while calling msleep() in
> phy_poll_reset(). This also made the problem disappear.
>
> I've found this thread, where a similar issue with restart_syscall() has been
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
>
> Any ideas how to fix this issue?
>
> Andre
Don't do control operations from RT processes!
There can be cases of priority inversion where an RT process is waiting for
something that requires a kthread to complete the operation.
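
One way to follow that advice from userspace is to hand the control-path
work (here, reading a sysfs attribute) to a thread that explicitly runs
under SCHED_OTHER, so that any retry loop in the kernel happens at normal
priority and cannot starve kernel threads. The following is only a minimal
sketch under that assumption: struct sysfs_req, sysfs_worker() and
read_sysfs_non_rt() are hypothetical names made up for illustration, and
the sketch does not change anything about how the RTNL lock is taken.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request block handed to the worker thread. */
struct sysfs_req {
	const char *path;	/* e.g. "/sys/class/net/eth1/speed" */
	char buf[64];		/* first line of the file, newline stripped */
	int err;		/* 0 on success, an errno value otherwise */
	int policy;		/* scheduling policy the worker ran under */
};

/* Runs in the non-RT thread: read the first line of req->path. */
static void *sysfs_worker(void *arg)
{
	struct sysfs_req *req = arg;
	struct sched_param sp;
	FILE *f;

	/* Record the policy so the caller can check it is not an RT one. */
	pthread_getschedparam(pthread_self(), &req->policy, &sp);

	f = fopen(req->path, "r");
	if (!f) {
		req->err = errno;
		return NULL;
	}
	if (!fgets(req->buf, sizeof(req->buf), f))
		req->err = ferror(f) ? EIO : ENODATA;
	else
		req->buf[strcspn(req->buf, "\n")] = '\0';
	fclose(f);
	return NULL;
}

/*
 * Read a sysfs attribute from a SCHED_OTHER thread and wait for the result.
 * The caller sleeps in pthread_join(), so an RT caller does not spin.
 * Returns 0 on success or an errno-style error code.
 */
int read_sysfs_non_rt(struct sysfs_req *req)
{
	struct sched_param sp = { .sched_priority = 0 };
	pthread_attr_t attr;
	pthread_t tid;
	int rc;

	req->err = 0;
	req->buf[0] = '\0';
	req->policy = -1;

	rc = pthread_attr_init(&attr);
	if (rc)
		return rc;
	/* Do not inherit the (possibly RT) policy of the calling thread. */
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_OTHER);
	pthread_attr_setschedparam(&attr, &sp);

	rc = pthread_create(&tid, &attr, sysfs_worker, req);
	pthread_attr_destroy(&attr);
	if (rc)
		return rc;

	pthread_join(tid, NULL);
	return req->err;
}
```

An RT thread would fill in req.path, call read_sysfs_non_rt(&req) and then
look at req.buf. A long-lived worker fed through a queue would avoid
creating a thread per request; the one-shot thread is used here only to
keep the sketch short.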