netdev - Re: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 09 Jun 2017 12:28:33 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     kjlx@...pleofstupid.com
Cc:     eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH net] Fix an intermittent pr_emerg warning about lo
 becoming free.

From: Krister Johansen <kjlx@...pleofstupid.com>
Date: Thu, 8 Jun 2017 13:12:38 -0700

> It looks like this:
> 
> Message from syslogd@...mingo at Apr 26 00:45:00 ...
>  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
> 
> They seem to coincide with net namespace teardown.
> 
> The message is emitted by netdev_wait_allrefs().
> 
> Forced a kdump in netdev_run_todo, but found that the refcount on the lo
> device was already 0 at the time we got to the panic.
> 
> Used bcc to check the blocking in netdev_run_todo.  The only places
> where we're off cpu there are in the rcu_barrier() and msleep() calls.
> That behavior is expected.  The msleep time coincides with the amount of
> time we spend waiting for the refcount to reach zero; the rcu_barrier()
> wait times are not excessive.
> 
> After looking through the list of callbacks that the netdevice notifiers
> invoke in this path, it appears that the dst_dev_event is the most
> interesting.  The dst_ifdown path places a hold on the loopback_dev as
> part of releasing the dev associated with the original dst cache entry.
> Most of our notifier callbacks are straight-forward, but this one a)
> looks complex, and b) places a hold on the network interface in
> question.
> 
> I constructed a new bcc script that watches various events in the
> liftime of a dst cache entry.  Note that dst_ifdown will take a hold on
> the loopback device until the invalidated dst entry gets freed.
 ...
> The way this works is that if there's still a reference on the dst entry
> at the time we try to free it, it gets placed in the gc list by
> __dst_free and the dst_destroy() call is invoked by the gc task once the
> refcount is 0.  If the gc task processes a 10th or less of its entries
> on a single pass, it inreases the amount of time it waits between gc
> intervals.
> 
> Looking at the gc_task intervals, they started at 663ms when we invoked
> __dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
> 10751, and 14848.  The release that set the refcnt to 0 on our dst entry
> occurred after the gc_task was enqueued for 14 second interval so we had
> to wait longer than the warning time in wait_allrefs in order for the
> dst entry to get free'd and the hold on 'lo' to be released.
> 
> A simple solution to this problem is to have dst_dev_event() reset the
> gc timer, which causes us to process this list shortly after the
> gc_mutex is relased when dst_dev_event() completes.
> 
> Signed-off-by: Krister Johansen <kjlx@...pleofstupid.com>

Yeah this is one of the more unsatisfying areas of dst and device handling
in the tree, thanks for working on this.

Applied and queued up for -stable, thanks again.