Date:   Fri, 4 May 2018 09:54:27 +0200
From:   Rafał Miłecki <zajec5@...il.com>
To:     Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        WANG Cong <xiyou.wangcong@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        Network Development <netdev@...r.kernel.org>,
        jeffy <jeffy.chen@...k-chips.com>,
        David Ahern <dsahern@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Stable <stable@...r.kernel.org>,
        Dan Streetman <ddstreet@...e.org>,
        Dan Streetman <ddstreet@...onical.com>,
        Mathias Tillman <master.homer@...il.com>
Subject: Re: Repeating "unregister_netdevice: waiting for lo to become free"
 caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify()
 once for NETDEV_UNREGISTER")

On 25 April 2018 at 16:44, Rafał Miłecki <zajec5@...il.com> wrote:
> On 25.04.2018 16:30, Konstantin Khlebnikov wrote:
>>
>> On 25.04.2018 17:16, Rafał Miłecki wrote:
>>>
>>> On 23.04.2018 15:08, Rafał Miłecki wrote:
>>>>
>>>> I've just updated my 4.4.x kernel and noticed a regression. Bisecting
>>>> pointed me to commit 2417da3f4d6bc ("ipv6: only call
>>>> ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0], which is a
>>>> backport of upstream 76da0704507bb. That backported commit
>>>> appeared in 4.4.103.
>>>>
>>>> I use OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping
>>>> a container I start getting these messages:
>>>> [  229.419188] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 1
>>>> [  239.660408] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 1
>>>> [  249.839189] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 1
>>>> (...)
>>>>
>>>> Trying to start LXC nevertheless results in lxc-start command hang
>>>> around network configuration. Trying to query LXC state afterwards
>>>> results in a lxc-info command hang too.
>>>>
>>>> I tried Googling for this issue and found similar reports:
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
>>>> https://github.com/fnproject/fn/issues/686
>>>> https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-waiting-for-lo-to-become-free-usage-count-1/
>>>> All of them are related to Docker, which is probably a similar use
>>>> case to LXC.
>>>>
>>>> I couldn't find any reference to commit 76da0704507bb that could
>>>> suggest fixing the problem I'm seeing.
>>>>
>>>> Does anyone have an idea what the issue I'm seeing is about? Or, even
>>>> better, how to fix it? Can I provide any additional info that would
>>>> help?
>>>>
>>>>
>>>> [0]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
>>>> [1] https://openwrt.org/
>>>> [2] https://linuxcontainers.org/
>>>
>>>
>>> Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I
>>> still experience the same problem.
>>>
>>> From reading various reports regarding that "unregister_netdevice:
>>> waiting for lo to become free" message it appears the problem is caused
>>> by a leaking dst refcnt somewhere in the kernel code.
>>>
>>> I found links to a few commits fixing leaks in various places:
>>> 4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
>>> 957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
>>> 4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
>>> d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
>>> 751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
>>>
>>> All above patches are present in the linux-v4.4.y and are part of kernel
>>> 4.4.124 I use. So it seems I'm facing yet another dst refcnt leak.
>>>
>>> Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once
>>> for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only
>>> expose an existing one?
>>
>>
>> Mathias Tillman reported this as "4.4.103 linux kernel regression".
>> Last message in that thread (which I couldn't find in mailing list
>> archives) had:
>> | As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code
>> | that adds an in6_dev_get call without calling in6_dev_put.
>
>
> Wow, this is very helpful, thank you!
>
> Somehow I didn't even think about OpenWrt downstream patches. Too bad
> this wasn't reported to the OpenWrt community, I spent 2 days on this.
> There is indeed:
> target/linux/generic/patches-4.4/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch
> [PATCH 1/2] ipv6: allow rejecting with "source address failed policy"
>
> I'll move the discussion of this issue to OpenWrt/LEDE now; I hope we can
> sort it out.

For reference, this has been fixed in OpenWrt/LEDE by Felix in:

1) master branch:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=58f7b5b96c301176d639540df4723c798af2a999

2) lede-17.01 branch:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=999bb66b20b03c753801ecebf1ec2a03c6a63c96
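
To illustrate the class of bug described in the quoted report (an
in6_dev_get call without a matching in6_dev_put), here is a minimal
userspace sketch. It is not the kernel code and not the OpenWrt patch:
struct fake_dev, fake_dev_get(), fake_dev_put() and the other names are
hypothetical stand-ins, and the real reference counting in the kernel is
more involved. It only shows why one unbalanced "get" is enough to keep a
usage count at 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device object with a usage counter. */
struct fake_dev {
	int refcnt;
};

/* Simplified stand-ins for a get/put pair (not the kernel API). */
static void fake_dev_get(struct fake_dev *dev)
{
	dev->refcnt++;
}

static void fake_dev_put(struct fake_dev *dev)
{
	assert(dev->refcnt > 0);	/* a put must match an earlier get */
	dev->refcnt--;
}

/* Buggy pattern: takes a reference and returns without dropping it. */
static bool leaky_check(struct fake_dev *dev)
{
	fake_dev_get(dev);
	return dev->refcnt > 0;		/* missing fake_dev_put() */
}

/* Balanced pattern: every get is paired with a put before returning. */
static bool balanced_check(struct fake_dev *dev)
{
	bool in_use;

	fake_dev_get(dev);
	in_use = dev->refcnt > 0;
	fake_dev_put(dev);
	return in_use;
}

/* In this sketch, unregistration may finish only at a zero usage count. */
static bool can_unregister(const struct fake_dev *dev)
{
	return dev->refcnt == 0;
}
```

In this sketch, a single call to leaky_check() leaves refcnt at 1, so
can_unregister() stays false, while balanced_check() leaves the counter
where it found it.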

-- 
Rafał
