Message-ID: <56F5A2D9.7090700@universe-factory.net>
Date: Fri, 25 Mar 2016 21:43:05 +0100
From: Matthias Schiffer <mschiffer@...verse-factory.net>
To: Andrew Collins <acollins@...dlepoint.com>
Cc: netdev@...r.kernel.org, vfalico@...hat.com
Subject: Re: RESEND: Easily reproducible kernel panic due to netdev all_adj_list refcnt handling

On 02/23/2016 11:29 PM, Andrew Collins wrote:
> I'm running into a relatively easily reproducible kernel panic related to
> the all_adj_list handling for netdevs in recent kernels.
>
> The following sequence of commands will reproduce the issue:
>
> ip link add link eth0 name eth0.100 type vlan id 100
> ip link add link eth0 name eth0.200 type vlan id 200
> ip link add name testbr type bridge
> ip link set eth0.100 master testbr
> ip link set eth0.200 master testbr
> ip link add link testbr mac0 type macvlan
> ip link delete dev testbr
>
> This creates an upper/lower tree of (excuse the poor ASCII art):
>
>             /---eth0.100-eth0
> mac0-testbr-
>             \---eth0.200-eth0
>
> When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted
> twice from the mac0 list. Unfortunately, during setup in
> __netdev_upper_dev_link, only one reference to eth0 is added, so this
> results in the following panic trace:
>
> [68235.234564] tried to remove device eth0 from mac0

Hi,

I got a similar report which looks like the same issue. Our setup is a bit
more complicated; it also involves batman-adv:

* 5 VLANs (eth0.2, eth0.3, eth0.100, eth0.101, eth0.102)
* batman-adv device bat0 is master of eth0.100, eth0.101, eth0.102
* Bridge br-wan is master of eth0.2
* Bridge br-client is master of bat0 and eth0.3
* macvlan device local-node on top of br-client
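
For anyone who wants to try this outside of OpenWrt, the list above might
translate to roughly the following commands. This is only a sketch and has
not been verified to trigger the panic. Several things in it are assumptions
that do not come from the report: that each VLAN ID matches the interface
suffix, that the macvlan uses the default mode, and that the kernel and
iproute2 in use can create batman-adv devices with "type batadv". The run()
and build_topology() helpers are made-up names for illustration, and by
default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the device topology from the list above.
# Assumptions (not from the report): VLAN ID == interface suffix, default
# macvlan mode, and "ip link ... type batadv" being available.

DRY_RUN="${DRY_RUN:-1}"   # 1 (default): only print commands; 0: execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create the devices in the order of the list above.
build_topology() {
    # 5 VLANs on top of eth0
    for vid in 2 3 100 101 102; do
        run ip link add link eth0 name "eth0.$vid" type vlan id "$vid"
    done

    # bat0 is master of eth0.100, eth0.101, eth0.102
    run ip link add name bat0 type batadv
    for vid in 100 101 102; do
        run ip link set "eth0.$vid" master bat0
    done

    # br-wan is master of eth0.2
    run ip link add name br-wan type bridge
    run ip link set eth0.2 master br-wan

    # br-client is master of bat0 and eth0.3
    run ip link add name br-client type bridge
    run ip link set bat0 master br-client
    run ip link set eth0.3 master br-client

    # macvlan device local-node on top of br-client
    run ip link add link br-client name local-node type macvlan
}
```

Calling build_topology prints the command list; running it as root with
DRY_RUN=0 would actually create the devices. On the real system they are
created by the OpenWrt network config daemon, not by a script like this.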

The setup is OpenWrt-based. OpenWrt has a network config daemon that in
some cases removes bridge ports/slaves when the corresponding devices lose
carrier (I think br-client and bat0 get deleted in this case, but I'm not
completely sure about this). The crash occurs when eth0 goes down.

We've tried your patch, and it changes the symptoms a bit, but doesn't fix
the panic. I've attached kernel logs of the crash both before and after
applying the patch.

One note: I did not reproduce this issue myself; it was first reported in
[1], and I then forwarded it to the batman-adv issue tracker [2].

Regards,
Matthias

[1] https://github.com/freifunk-gluon/gluon/issues/680
[2] https://www.open-mesh.org/issues/247

View attachment "dmesg-after.txt" of type "text/plain" (47588 bytes)
View attachment "dmesg-before.txt" of type "text/plain" (23920 bytes)