Date:   Fri, 30 Apr 2021 14:59:30 +0300
From:   Nikolai Zhubr <zhubr.2@...il.com>
To:     Chris Snook <chris.snook@...il.com>, netdev@...r.kernel.org,
        Johannes Berg <johannes@...solutions.net>,
        nic-devel@...lcomm.com
Subject: A problem with "ip=..." ipconfig and Atheros alx driver.

Hello Chris and others,

I'm observing a problem with the Atheros alx Ethernet driver and the
in-kernel IPv4 configuration (using the "ip=192.168....." boot parameter).

The problem first showed up as a huge, unexpected delay during boot
whenever "ip=..." was specified (and a real device was present). I then
noticed a timeout counter, "Waiting up to 110 more seconds for network",
between the "Atheros(R) AR816x/AR817x" message and the "eth0: NIC Up:
1 Gbps Full" message. Meanwhile, the Ethernet device itself is fully
operational and my cable is perfectly reliable.

After debugging it a bit more, I think I've found the root cause. In
net/ipv4/ipconfig.c, ic_open_devs() tries to ensure that a carrier is
physically present. But before opening the device(s) and starting to
wait for the carrier, it calls rtnl_lock(). Meanwhile, in
ethernet/atheros/alx/main.c, the driver first calls netif_carrier_off()
when the device is opened and then schedules alx_link_check() to do the
actual work, so carrier detection is supposed to happen a bit later.
Looking at alx_link_check() carefully, the first thing it does is call
rtnl_lock(). Bingo! Both paths take the same lock: the link check blocks
on the RTNL lock that ic_open_devs() is already holding while it waits.
As a result, the actual carrier check in alx is delayed until
ic_open_devs() gives up waiting for it and calls rtnl_unlock(). Hence
the delay and the timeout.
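
The contention described above can be modeled in user space. The
following is a minimal sketch, not kernel code: a pthread mutex named
rtnl stands in for the RTNL lock, an atomic flag named carrier stands in
for netif_carrier_ok(), a thread stands in for the scheduled
alx_link_check() work, and wait_for_carrier() stands in for the carrier
wait in ic_open_devs(). All of these names are hypothetical. The
hold_lock_while_waiting parameter is only there to contrast the two
behaviors; it is not a proposed patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: rtnl models the RTNL mutex, carrier models
 * netif_carrier_ok() for the device. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool carrier;

/* Models the deferred link check: it takes the lock first and only
 * then reports the link as up ("NIC Up"). */
static void *link_check_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rtnl);
    atomic_store(&carrier, true);
    pthread_mutex_unlock(&rtnl);
    return NULL;
}

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Models the carrier wait: take the lock, "open" the device (carrier
 * off, link check scheduled), then poll for up to timeout_ms.
 * Returns true if the carrier was seen before the timeout. */
static bool wait_for_carrier(long timeout_ms, bool hold_lock_while_waiting)
{
    pthread_t worker;
    struct timespec start;
    struct timespec one_ms = {0, 1000000L};
    bool seen = false;

    pthread_mutex_lock(&rtnl);
    atomic_store(&carrier, false);                 /* carrier off at open */
    pthread_create(&worker, NULL, link_check_worker, NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (elapsed_ms(&start) < timeout_ms) {
        if (atomic_load(&carrier)) {
            seen = true;
            break;
        }
        if (!hold_lock_while_waiting)
            pthread_mutex_unlock(&rtnl);           /* let the worker run */
        nanosleep(&one_ms, NULL);                  /* poll every ~1 ms */
        if (!hold_lock_while_waiting)
            pthread_mutex_lock(&rtnl);
    }

    /* Only now can a worker blocked on the lock set the carrier. */
    pthread_mutex_unlock(&rtnl);
    pthread_join(worker, NULL);
    return seen;
}
```

With the lock held for the whole wait, wait_for_carrier(200, true)
should return false even though the "device" is ready at once, which
mirrors the boot delay; releasing the lock between polls lets the worker
set the carrier almost immediately.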

I have reproduced this with clean 4.9.268 and 5.4.115 kernels on real
hardware. I can't check 5.12 at the moment because my gcc is too old to
compile it, but from browsing the code, nothing seems to have changed
substantially.

Fixing this myself is a bit beyond my capability, I'm afraid, but I'd be
happy to do some testing if anyone asks me to.


Thank you,

Regards,
Nikolai
