Message-ID: <b7ad0acb-1cad-3a45-0b0f-57d576ff1a36@osg.samsung.com>
Date:   Tue, 31 Jan 2017 14:49:52 -0300
From:   Javier Martinez Canillas <javier@....samsung.com>
To:     netdev <netdev@...r.kernel.org>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Sjoerd Simons <sjoerd.simons@...labora.co.uk>,
        Kevin Hilman <khilman@...libre.com>,
        Shuah Khan <shuahkh@....samsung.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: ip_auto_config() prevents network device to be registered

Hello,

The kernelci folks pointed out that a Samsung Exynos based board was failing
to boot when trying to mount the rootfs via NFS, due to a networking issue [0].

I looked at the issue and it turned out to be a race between ip_auto_config()
and register_netdev() when using the ip=dhcp param in the kernel command line.

The problem is that ip_auto_config() calls wait_for_devices() [1], which returns
as soon as it finds a network device registered. Then ic_open_devs() [2] is
called to bring the network devs up and wait for their carrier signals.

But ic_open_devs() grabs the rtnl_mutex lock [3] when doing this, which is the
same lock that register_netdev() [4] grabs before registering a network device.

And so if a network dev is found and wait_for_devices() returns, ic_open_devs()
will be called and no new network dev can be registered in the meantime.

So since ic_open_devs() waits up to CONF_CARRIER_TIMEOUT (120 secs) with this
lock held, if the network dev that's supposed to get its IP over DHCP isn't the
first to be registered, the boot test job may time out and be considered a
failure.
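
To make the blocking concrete, here is a rough userspace model in Python (not
kernel code). The lock only stands in for rtnl_mutex, the timings are scaled
down from 120 secs, and the names simulate(), open_devs() and
register_second_dev() are made up for this illustration.

```python
import threading
import time


def simulate(carrier_timeout, second_dev_delay):
    """Return (seconds the second registration was blocked, registration order)."""
    rtnl = threading.Lock()          # stands in for rtnl_mutex
    lock_taken = threading.Event()
    registered = ["eth0"]            # first device, found by wait_for_devices()
    blocked = []

    def open_devs():
        # Models ic_open_devs(): wait for carrier with the lock held.
        with rtnl:
            lock_taken.set()
            time.sleep(carrier_timeout)   # eth0 never reports carrier

    def register_second_dev():
        # Models register_netdev() for the device that should do DHCP.
        time.sleep(second_dev_delay)
        start = time.monotonic()
        with rtnl:
            registered.append("eth1")
        blocked.append(time.monotonic() - start)

    opener = threading.Thread(target=open_devs)
    opener.start()
    lock_taken.wait()                # make sure the lock is held first
    registrar = threading.Thread(target=register_second_dev)
    registrar.start()
    opener.join()
    registrar.join()
    return blocked[0], registered


if __name__ == "__main__":
    blocked_for, order = simulate(carrier_timeout=0.5, second_dev_delay=0.05)
    print(f"eth1 registration blocked for {blocked_for:.2f}s, order: {order}")
```

In this model the second device only gets registered once the carrier wait is
over, which is the same shape as the boot failure described above.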

A workaround is to use ip=:::::eth0:dhcp instead of ip=dhcp, so that
wait_for_devices() waits for this specific device. Another workaround is to
increase the timeout for the job to be much bigger than CONF_CARRIER_TIMEOUT,
so ip_auto_config() can retry and the network devices can be registered
between tries.
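
For reference, the two forms of the parameter differ only in whether the
device field is filled in. A small Python sketch of how the colon-separated
fields line up, assuming the field order documented for the ip= option in the
kernel's nfsroot documentation (parse_ip_param() is a hypothetical helper, not
the kernel's parser, and it only covers the first seven fields):

```python
# Field order assumed from the kernel's nfsroot documentation for ip=.
FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
          "hostname", "device", "autoconf")


def parse_ip_param(param):
    """Map an ip= kernel parameter onto named fields, dropping empty ones."""
    value = param[len("ip="):] if param.startswith("ip=") else param
    if ":" not in value:
        # Short form such as ip=dhcp: the value is the autoconf method only.
        return {"autoconf": value}
    parts = value.split(":")
    return {name: part for name, part in zip(FIELDS, parts) if part}


if __name__ == "__main__":
    print(parse_ip_param("ip=dhcp"))
    print(parse_ip_param("ip=:::::eth0:dhcp"))
```

With ip=dhcp no device is named, so any registered device satisfies the wait;
with ip=:::::eth0:dhcp the device field is set to eth0.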

But I wonder if someone can suggest a proper way to fix this. Grabbing a mutex
that prevents network devs from being registered for 120 secs doesn't sound
correct.

Thanks a lot for your help and please let me know if I misunderstood something.

[0]: https://storage.kernelci.org/mainline/v4.9/arm-exynos_defconfig/lab-collabora/boot-exynos5422-odroidxu3_rootfs:nfs.html
[1]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L1368
[2]: http://lxr.free-electrons.com/source/net/ipv4/ipconfig.c#L202
[3]: http://lxr.free-electrons.com/source/net/core/rtnetlink.c#L68
[4]: http://lxr.free-electrons.com/source/net/core/dev.c#L7326

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America
