Date:   Fri, 10 Feb 2017 05:45:36 +0000
From:   Kaiwen Xu <kevin@...xu.net>
To:     Cong Wang <xiyou.wangcong@...il.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: loopback device reference count leakage

I am using a macvlan device inside the container, with the following Docker
network plugin:

https://github.com/gopher-net/macvlan-docker-plugin

Each macvlan device that gets assigned into a container network
namespace is attached to the host's vlan device, which in turn is attached
to the host's eth0.

    eth0  <==  eth0.1000  <==  macvlan0 (host macvlan device)
                          \==  macvlan1 (container macvlan device)
                          \==  macvlan2 (container macvlan device)
                          ...

eth0 has a 10.x.x.x/24 IP address. eth0.1000 carries a separate
10.x.x.y/24 range (different from the /24 assigned to eth0), but is not
itself assigned an IP address. macvlan0, which is on the host, is assigned
an IP address in the 10.x.x.y/24 range that belongs to eth0.1000. When a
container starts up, a new macvlan device is created on top of eth0.1000
with a different 10.x.x.y/24 address and is moved into the container's
network namespace. The container's 10.x.x.y/24 address is directly
reachable from outside the host.
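
For anyone who wants to reproduce the layout without the plugin, here is a
rough sketch that only builds the iproute2 commands for it. Several things
here are my assumptions and not taken from the plugin: VLAN id 1000 (guessed
from the name eth0.1000), macvlan mode "bridge", and the namespace name
"ctr1". The function name is made up for illustration, and address
assignment is left out on purpose.

```python
def macvlan_setup_commands(parent="eth0", vlan_id=1000,
                           host_macvlan="macvlan0",
                           container_macvlans=(("macvlan1", "ctr1"),),
                           mode="bridge"):
    """Return `ip` argv lists for eth0 <== eth0.<vlan_id> <== macvlanN.

    Nothing is executed here. vlan_id, mode and the netns names are
    assumptions chosen for illustration only.
    """
    vlan_dev = f"{parent}.{vlan_id}"
    cmds = [
        # VLAN device on top of the physical interface.
        ["ip", "link", "add", "link", parent, "name", vlan_dev,
         "type", "vlan", "id", str(vlan_id)],
        ["ip", "link", "set", vlan_dev, "up"],
        # Host-side macvlan on top of the VLAN device.
        ["ip", "link", "add", "link", vlan_dev, "name", host_macvlan,
         "type", "macvlan", "mode", mode],
        ["ip", "link", "set", host_macvlan, "up"],
    ]
    for dev, netns in container_macvlans:
        # Per-container macvlan, created on the host and then moved
        # into the container's network namespace.
        cmds.append(["ip", "link", "add", "link", vlan_dev, "name", dev,
                     "type", "macvlan", "mode", mode])
        cmds.append(["ip", "link", "set", dev, "netns", netns])
    return cmds


if __name__ == "__main__":
    for cmd in macvlan_setup_commands():
        print(" ".join(cmd))
```

Running the printed commands needs root and an existing named network
namespace; the script itself only prints them.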

Thanks,
Kaiwen

On Wed, Feb 08, 2017 at 01:50:57PM -0800, Cong Wang wrote:
> On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu <kevin@...xu.net> wrote:
> > Hi Cong,
> >
> > I did some more testing, and it seems your second assumption is correct.
> > Something is indeed holding references to a particular dst, which
> > prevents it from being gc'ed.
> 
> Excellent!
> 
> >
> > I added logging to each dst_hold (or dst_hold_safe, or
> > skb_dst_force_safe) and dst_release, formatted as follows:
> >
> > <dev name> (<protocol>) [<dst addr>]: dst_release / dst_hold ... <refcnt> <caller function>
> >
> > And inside dst_gc_task(), I added logging when gc delay occurred,
> > formatted as:
> >
> > [dst_gc_task] <dev name> (<protocol>): delayed <refcnt>
> >
> > I have the log attached.
> 
> The following line looks suspicious:
> 
> Feb  6 16:27:24 <hostname> kernel: [63589.458067] [dst_gc_task]
> lodebug (2): delayed 19
> 
> Looks like you ended up having one dst whose refcnt is 19 in GC,
> and this lasted for a rather long time for some reason.
> 
> It is hard to know if it is a refcnt leak even with your log, since more
> than 4K refcnt operations happened on that dst...
> 
> Meanwhile, can you share your setup of your container? What network device
> do you use in your container? How is it connected to outside?
> 
> Thanks.
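
To pull the delayed-gc entries out of a log like the one quoted above, a
short script along these lines may help. It is only a sketch: the regular
expression is written against the "[dst_gc_task] <dev name> (<protocol>):
delayed <refcnt>" format as quoted, and the function names are made up for
illustration.

```python
import re

# Matches e.g. "[dst_gc_task] lodebug (2): delayed 19" anywhere in a line.
GC_DELAY_RE = re.compile(
    r"\[dst_gc_task\]\s+(?P<dev>\S+)\s+\((?P<proto>\d+)\):"
    r"\s+delayed\s+(?P<refcnt>-?\d+)"
)


def parse_gc_delays(lines):
    """Return (dev, protocol, refcnt) tuples for every delayed-gc line."""
    records = []
    for line in lines:
        m = GC_DELAY_RE.search(line)
        if m:
            records.append((m["dev"], int(m["proto"]), int(m["refcnt"])))
    return records


def max_refcnt_by_device(records):
    """Return the highest refcnt seen per device name."""
    highest = {}
    for dev, _proto, refcnt in records:
        if dev not in highest or refcnt > highest[dev]:
            highest[dev] = refcnt
    return highest


if __name__ == "__main__":
    sample = [
        "Feb  6 16:27:24 <hostname> kernel: [63589.458067] "
        "[dst_gc_task] lodebug (2): delayed 19",
    ]
    print(max_refcnt_by_device(parse_gc_delays(sample)))
```

A dst that keeps showing up with the same non-zero refcnt over a long
stretch of the log is the kind of entry worth a closer look.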
