Message-ID: <CA+pO-2cQn07HHAQWX+-vGDRa_5QMVP_iC_RNYvzXPFnq_xyTpw@mail.gmail.com>
Date: Mon, 31 Jul 2017 19:37:49 +0100
From: Rolf Neugebauer <rolf.neugebauer@...ker.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Long stalls creating a new netns after a netns with a SMB client exits
On Mon, Jul 31, 2017 at 6:06 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:
> On Fri, Jul 28, 2017 at 11:58 AM, Rolf Neugebauer
> <rolf.neugebauer@...ker.com> wrote:
>> On Fri, Jul 28, 2017 at 6:49 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:
>>> Hello,
>>>
>>> On Fri, Jul 28, 2017 at 9:47 AM, Rolf Neugebauer
>>> <rolf.neugebauer@...ker.com> wrote:
>>>> Creating the new namespace stalls for around 200 seconds and
>>>> there are 20-odd messages on the console, like:
>>>>
>>>> [ 67.372603] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>>>
>>>
>>> Sounds like another netdev refcnt leak.
>>
>> I don't think it's a leak as such because the system eventually
>> recovers after around 200 seconds.
>>
>>>
>>>> Adding a 'sleep 1' before deleting the original network namespace
>>>> "solves" the issue, but that doesn't sound like a good fix. Not using
>>>> unmount also does not help (understandable).
>>>
>>>
>>> Interesting, if sleeping for 1 sec helps, why did you see the stall
>>> for 200 sec? The "leak" should go away eventually without 'sleep 1',
>>> right?
>>
>> Yes. I suspect that with a sleep, some cleanup code (maybe umount)
>> gets run and the ref count gets decremented within the second. Without
>> the sleep, something gets yanked, and whatever operation needs to be
>> done can't be performed; it times out after 200s and only then does
>> the ref count get decremented.
>
>
> This reminds me of
>
> commit f186ce61bb8235d80068c390dc2aad7ca427a4c2
> Author: Krister Johansen <kjlx@...pleofstupid.com>
> Date: Thu Jun 8 13:12:38 2017 -0700
>
> Fix an intermittent pr_emerg warning about lo becoming free.
>
> but this one is merged in 4.12 too, so must be something else.

My last test was on 4.9.40, not 4.12.x, but another user reported
the same issue on 4.12 and 4.4.x, and I just verified it on 4.12.4 as
well.

I previously reported something with similar symptoms (stalls that
eventually cleared) here:
http://marc.info/?l=linux-netdev&m=147870616302799&w=2 but judging by
the backtraces in that post, it looks entirely different.
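
In case it helps compare runs across kernel versions, below is a minimal
Python sketch (an illustration, not something taken from this thread) that
estimates how long a stall lasted from the "waiting for lo to become free"
console messages quoted above. It assumes each message appears on a single
line with the usual "[ seconds.micros]" timestamp prefix; the function name
summarize_stall is hypothetical.

```python
import re

# Matches kernel log lines of the form quoted earlier in this thread, e.g.
# [   67.372603] unregister_netdevice: waiting for lo to become free. Usage count = 1
# Assumption: the whole message is on one line (not wrapped by a mail client).
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+unregister_netdevice: waiting for "
    r"(?P<dev>\S+) to become free\. Usage count = (?P<count>\d+)"
)


def summarize_stall(log_text):
    """Return {device: (message_count, first_ts, last_ts, span_seconds)}.

    span_seconds is only a lower bound on the stall: it is the time between
    the first and the last matching message for that device.
    """
    stats = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue  # unrelated log line
        ts = float(match.group("ts"))
        dev = match.group("dev")
        count, first, _last = stats.get(dev, (0, ts, ts))
        stats[dev] = (count + 1, first, ts)
    return {
        dev: (count, first, last, last - first)
        for dev, (count, first, last) in stats.items()
    }
```

Feeding it the output of dmesg from a run with and without the 'sleep 1'
should show whether the messages (and hence the wait) disappear entirely or
only get shorter.
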
Rolf