Message-ID: <56F5A2D9.7090700@universe-factory.net>
Date:	Fri, 25 Mar 2016 21:43:05 +0100
From:	Matthias Schiffer <mschiffer@...verse-factory.net>
To:	Andrew Collins <acollins@...dlepoint.com>
Cc:	netdev@...r.kernel.org, vfalico@...hat.com
Subject: Re: RESEND: Easily reproducible kernel panic due to netdev
 all_adj_list refcnt handling

On 02/23/2016 11:29 PM, Andrew Collins wrote:
> I'm running into a relatively easily reproducible kernel panic related to
> the all_adj_list handling for netdevs
> in recent kernels.
> 
> The following sequence of commands will reproduce the issue:
> 
> ip link add link eth0 name eth0.100 type vlan id 100
> ip link add link eth0 name eth0.200 type vlan id 200
> ip link add name testbr type bridge
> ip link set eth0.100 master testbr
> ip link set eth0.200 master testbr
> ip link add link testbr mac0 type macvlan
> ip link delete dev testbr
> 
> This creates an upper/lower tree of (excuse the poor ASCII art):
> 
>             /---eth0.100-eth0
> mac0-testbr-
>             \---eth0.200-eth0
> 
> When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted
> twice from the mac0 list.
> Unfortunately, during setup in __netdev_upper_dev_link, only one reference
> to eth0 is added,
> so this results in the following panic trace:
> 
> [68235.234564] tried to remove device eth0 from mac0

Hi,
I got a similar report that looks like the same issue. Our setup is a bit
more complicated; it also involves batman-adv:

* 5 VLANs (eth0.2, eth0.3, eth0.100, eth0.101, eth0.102)
* batman-adv device bat0 is master of eth0.100, eth0.101, eth0.102
* Bridge br-wan is master of eth0.2
* Bridge br-client is master of bat0 and eth0.3
* macvlan device local-node on top of br-client
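
As a rough illustration of why this setup resembles the one in the original
report, the sketch below models the upper/lower relations from the list above
as a plain graph and counts how many distinct paths lead from an upper device
down to eth0. This is a toy model in Python, not the kernel's all_adj_list
code; the dictionary layout and the helper name count_paths are made up for
the example. The assumption is that a lower device reachable over more than
one path is the situation in which the reference-count mismatch described in
the quoted report can occur.

```python
# Toy model only: direct "upper -> lower" links as described in the list above.
# The kernel keeps this information in per-netdev adjacency lists; nothing
# here mirrors its data structures or function names.
LOWER = {
    "local-node": ["br-client"],                    # macvlan on top of br-client
    "br-client": ["bat0", "eth0.3"],                # bridge ports
    "br-wan": ["eth0.2"],                           # bridge port
    "bat0": ["eth0.100", "eth0.101", "eth0.102"],   # batman-adv slaves
    "eth0.2": ["eth0"],
    "eth0.3": ["eth0"],
    "eth0.100": ["eth0"],
    "eth0.101": ["eth0"],
    "eth0.102": ["eth0"],
}


def count_paths(graph, upper, lower):
    """Count the distinct downward paths from `upper` to `lower`."""
    if upper == lower:
        return 1
    return sum(count_paths(graph, mid, lower) for mid in graph.get(upper, []))


if __name__ == "__main__":
    for dev in ("local-node", "br-client", "bat0", "br-wan"):
        print(dev, "-> eth0:", count_paths(LOWER, dev, "eth0"), "path(s)")
```

In this model eth0 is reachable from local-node over several paths (three via
bat0 and one via eth0.3), while the original report's mac0/testbr tree has two
such paths.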

The setup is OpenWrt-based. OpenWrt has a network config daemon that in some
cases removes bridge ports/slaves when the corresponding devices lose
carrier (I think br-client and bat0 get deleted in this case, but I'm not
completely sure about this). The crash occurs when eth0 goes down.

We've tried your patch, and it changes the symptoms a bit, but doesn't fix
the panic. I've attached kernel logs of the crash both before and after
applying the patch.

One note: I did not reproduce this issue myself. It was first reported in
[1], and I then forwarded it to the batman-adv issue tracker [2].

Regards,
Matthias


[1] https://github.com/freifunk-gluon/gluon/issues/680
[2] https://www.open-mesh.org/issues/247

View attachment "dmesg-after.txt" of type "text/plain" (47588 bytes)

View attachment "dmesg-before.txt" of type "text/plain" (23920 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)
