netdev - Re: loopback device reference count leakage

Message-ID: <BY1PR17MB0101812AA1E386BB56CBF13BA1440@BY1PR17MB0101.namprd17.prod.outlook.com>
Date:   Fri, 10 Feb 2017 05:45:36 +0000
From:   Kaiwen Xu <kevin@...xu.net>
To:     Cong Wang <xiyou.wangcong@...il.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: loopback device reference count leakage

I am using a macvlan device inside the container, with the following
Docker network plugin:

https://github.com/gopher-net/macvlan-docker-plugin

Each macvlan device that gets assigned to a container network namespace
is attached to the host's vlan device, which in turn is attached to the
host's eth0.

    eth0  <==  eth0.1000  <==  macvlan0 (host macvlan device)
                          \==  macvlan1 (container macvlan device)
                          \==  macvlan2 (container macvlan device)
                          ...

eth0 has a 10.x.x.x/24 IP address. eth0.1000 can use any address in
another range, 10.x.x.y/24 (different from the /24 assigned to eth0),
but is not itself assigned an IP address. macvlan0, which is on the
host, is assigned an IP address in the 10.x.x.y/24 range that belongs
to eth0.1000. When a container starts up, a new macvlan device with a
different 10.x.x.y/24 address is created on top of eth0.1000 and
assigned to the container's network namespace. The container's
10.x.x.y/24 address is directly reachable from outside the host.
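
For illustration only, the topology above could be reproduced by hand
roughly as in the sketch below. It is not what the plugin actually
runs. The function name, the "bridge" macvlan mode, the VLAN ID 1000
(inferred from the device name eth0.1000), the namespace name "ctr1",
and the 192.0.2.0/24 documentation addresses are all assumptions
standing in for the real values. The code only builds the iproute2
command strings and does not execute them.

```python
from typing import List


def macvlan_setup_commands(parent: str, vlan_id: int, host_addr: str,
                           container_addr: str, netns: str) -> List[str]:
    """Return iproute2 commands for the eth0 <== vlan <== macvlan layout.

    Hypothetical helper: the macvlan mode ("bridge") and the use of a
    named network namespace are assumptions, not taken from the plugin.
    """
    vlan = f"{parent}.{vlan_id}"
    return [
        # VLAN device on the physical NIC; it gets no IP address itself.
        f"ip link add link {parent} name {vlan} type vlan id {vlan_id}",
        f"ip link set {vlan} up",
        # Host-side macvlan on top of the VLAN device.
        f"ip link add link {vlan} name macvlan0 type macvlan mode bridge",
        f"ip addr add {host_addr} dev macvlan0",
        "ip link set macvlan0 up",
        # Container macvlan: created on the host, then moved into the
        # container's network namespace and configured there.
        f"ip link add link {vlan} name macvlan1 type macvlan mode bridge",
        f"ip link set macvlan1 netns {netns}",
        f"ip netns exec {netns} ip addr add {container_addr} dev macvlan1",
        f"ip netns exec {netns} ip link set macvlan1 up",
    ]


if __name__ == "__main__":
    # Placeholder values only; the real addresses are in 10.x.x.y/24.
    for cmd in macvlan_setup_commands("eth0", 1000, "192.0.2.2/24",
                                      "192.0.2.3/24", "ctr1"):
        print(cmd)
```

Running the printed commands would need root and a namespace already
registered with "ip netns add"; a Docker-created namespace is normally
not visible under that name without extra steps.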

Thanks,
Kaiwen

On Wed, Feb 08, 2017 at 01:50:57PM -0800, Cong Wang wrote:
> On Mon, Feb 6, 2017 at 6:32 PM, Kaiwen Xu <kevin@...xu.net> wrote:
> > Hi Cong,
> >
> > I did some more testing, seems like your second assumption is correct.
> > There is indeed some things holding the references to a particular dst
> > which preventing it to be gc'ed.
> 
> Excellent!
> 
> >
> > I added logging to each dst_hold (or dst_hold_safe, or
> > skb_dst_force_safe) and dst_release, which formatted as following:
> >
> > <dev name> (<protocol>) [<dst addr>]: dst_release / dst_hold ... <refcnt> <caller function>
> >
> > And inside dst_gc_task(), I added logging when gc delay occurred,
> > formatted as:
> >
> > [dst_gc_task] <dev name> (<protocol>): delayed <refcnt>
> >
> > I have the log attached.
> 
> The following line looks suspicious:
> 
> Feb  6 16:27:24 <hostname> kernel: [63589.458067] [dst_gc_task]
> lodebug (2): delayed 19
> 
> Looks like you ended up having one dst whose refcnt is 19 in GC,
> and this lasted for a rather long time for some reason.
> 
> It is hard to know if it is a refcnt leak even with your log, since there were
> 4K+ refcnt'ing happened on that dst...
> 
> Meanwhile, can you share your setup of your container? What network device
> do you use in your container? How is it connected to outside?
> 
> Thanks.