Message-ID: <681500CE65202E47A192754B01DAB4673BE3D87D8D@SDE12.beckipc.net>
Date:   Mon, 16 Jul 2018 09:31:06 +0200
From:   André Pribil <a.pribil@...k-ipc.com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Deadlock with restart_syscall()

Hello,

I'm using kernel 4.14.52-rt34 on a single-core ARM system and I'm seeing a 
deadlock inside the kernel when two RT processes make calls with the right 
relative timing. The first process is trying to bring the Ethernet interface 
up with the SIOCSIFFLAGS ioctl(). The second process is checking the Ethernet 
carrier, speed and duplex status by reading e.g. "/sys/class/net/eth1/speed".
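
For reference, the two userspace operations could look roughly like the 
sketch below. The helper names set_if_up() and read_sysfs_int() are 
hypothetical and not taken from the actual applications; the sketch only 
assumes the usual get-flags/set-flags ioctl pair and a plain read of the 
sysfs attribute.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set IFF_UP on the named interface.
 * Returns 0 on success, -1 on failure (errno is preserved). */
int set_if_up(const char *ifname)
{
	struct ifreq ifr;
	int ret = -1;
	int saved_errno;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	/* Read the current flags, then write them back with IFF_UP added. */
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {
		ifr.ifr_flags |= IFF_UP;
		ret = ioctl(fd, SIOCSIFFLAGS, &ifr);
	}

	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret;
}

/* Hypothetical helper: read one decimal value from a sysfs attribute,
 * e.g. "/sys/class/net/eth1/speed". Returns 0 on success, -1 on failure. */
int read_sysfs_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

In the failing setup, one RT process would call something like 
set_if_up("eth1") while the other repeatedly calls 
read_sysfs_int("/sys/class/net/eth1/speed", &speed).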

The first process finally gets to phy_poll_reset() in 
drivers/net/phy/phy_device.c, where it calls msleep(50). 
It never returns from the sleep.

The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall(). 
This happens over and over again.
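
To illustrate the retry pattern, here is a minimal userspace model of it. 
This is not the kernel code: a pthread mutex stands in for the RTNL lock, 
MODEL_RESTART stands in for -ERESTARTNOINTR, and all model_* names are 
made up for this sketch.

```c
#include <pthread.h>

/* Stand-in for -ERESTARTNOINTR; the value is arbitrary in this model. */
#define MODEL_RESTART (-1000)

/* Stand-in for the RTNL mutex. */
pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Model of the show handler: it never blocks on the lock. */
int model_speed_show(int link_speed)
{
	int ret;

	if (pthread_mutex_trylock(&model_rtnl) != 0)
		return MODEL_RESTART;	/* ask the caller to start over */

	ret = link_speed;		/* read the state under the lock */
	pthread_mutex_unlock(&model_rtnl);
	return ret;
}

/*
 * Model of the restart loop. The real loop has no upper bound;
 * max_tries exists only so that the model terminates.
 */
int model_read_speed(int link_speed, int max_tries, int *tries)
{
	int ret = MODEL_RESTART;

	*tries = 0;
	while (ret == MODEL_RESTART && *tries < max_tries) {
		ret = model_speed_show(link_speed);
		(*tries)++;
	}
	return ret;
}
```

As long as the lock holder does not get to run, every attempt ends in 
MODEL_RESTART, so the reader keeps the CPU busy without making progress.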

It seems like the first process is no longer scheduled and cannot release the
RTNL lock, while the second process is busy restarting the syscall. The first 
process has a higher RT priority than the second process.
                                                         
Just for testing I've added the TIF_NEED_RESCHED flag to the restart_syscall() 
function and I did not see the deadlock again with this change.

static inline int restart_syscall(void)
{
	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
	return -ERESTARTNOINTR;
}
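
One caveat about this test: set_tsk_thread_flag() takes a flag bit number, 
not a bit mask, so ORing two TIF_* constants selects a single bit rather 
than both flags. The small model below shows the difference. It assumes 
TIF_SIGPENDING is bit 0 and TIF_NEED_RESCHED is bit 1, which should match 
32-bit ARM but is an assumption here; the MODEL_* names and model_* 
functions are made up for the sketch.

```c
/* Assumed bit numbers (believed to match 32-bit ARM). */
#define MODEL_TIF_SIGPENDING   0
#define MODEL_TIF_NEED_RESCHED 1

/* Model of set_tsk_thread_flag(): 'flag' is a bit number, not a mask. */
void model_set_flag(unsigned long *flags, int flag)
{
	*flags |= 1UL << flag;
}

/* What a single call with ORed bit numbers sets: bit (0 | 1) == bit 1. */
unsigned long model_flags_ored(void)
{
	unsigned long flags = 0;

	model_set_flag(&flags,
		       MODEL_TIF_SIGPENDING | MODEL_TIF_NEED_RESCHED);
	return flags;
}

/* What two separate calls set: bit 0 and bit 1. */
unsigned long model_flags_separate(void)
{
	unsigned long flags = 0;

	model_set_flag(&flags, MODEL_TIF_SIGPENDING);
	model_set_flag(&flags, MODEL_TIF_NEED_RESCHED);
	return flags;
}
```

Under these assumptions the ORed call sets only the reschedule bit, so a 
variant that keeps TIF_SIGPENDING set as well would need two separate 
set_tsk_thread_flag() calls.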

As a second test I released the RTNL lock while calling msleep() in 
phy_poll_reset(). This also made the problem disappear.
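
The shape of that second test, again as a userspace model rather than the 
actual change: the caller holds the lock, and the helper drops it only for 
the duration of the sleep. model_sleep_unlocked() is a hypothetical name, 
and a pthread mutex stands in for the RTNL lock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Called with 'lock' held. Releases it while sleeping so that other
 * threads using trylock can get in, then takes it again before returning.
 */
void model_sleep_unlocked(pthread_mutex_t *lock, long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;

	pthread_mutex_unlock(lock);
	nanosleep(&ts, NULL);
	pthread_mutex_lock(lock);
}
```

Whether dropping the lock at that point is safe in the real driver path 
depends on what the lock protects there, which this model does not capture.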

I've found this thread, where a similar issue with restart_syscall() has been 
reported:
https://www.spinics.net/lists/netdev/msg415144.html

Any ideas how to fix this issue?

Andre   
