Message-ID: <20140317101855.53e67d4a@nehalam.linuxnetplumber.net>
Date:	Mon, 17 Mar 2014 10:18:55 -0700
From:	Stephen Hemminger <stephen@...workplumber.org>
To:	Hannes Frederic Sowa <hannes@...essinduktion.org>
Cc:	netdev@...r.kernel.org
Subject: Re: [BUG] RTNL assert fail via addrconf_join_solicit

On Sat, 15 Mar 2014 17:04:13 +0100
Hannes Frederic Sowa <hannes@...essinduktion.org> wrote:

> On Fri, Mar 14, 2014 at 06:42:14PM -0700, Stephen Hemminger wrote:
> > When doing VRRP which uses macvlan and multicast, we see the following
> > kernel assertion error.  This is on 3.10.33 but looks like no changes
> > in this area in recent kernels.
> > 
> > 
> > [  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
> > [  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
> > [  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
> > [  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
> > [  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
> > [  541.031152] Call Trace:
> > [  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
> > [  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
> > [  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
> > [  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
> > [  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
> > [  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
> > [  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
> > [  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
> > [  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
> > [  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
> > [  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
> > [  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
> > [  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
> > [  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
> > [  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
> > [  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
> > [  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
> > [  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
> > [  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
> > [  541.031261]  [<ffffffff81053531>] ? run_timer_softirq+0x1a1/0x260
> > [  541.031267]  [<ffffffff810350cf>] ? kvm_clock_read+0x1f/0x30
> > [  541.031272]  [<ffffffff810132a5>] ? sched_clock+0x5/0x10
> > [  541.031274]  [<ffffffff81074bd5>] ? sched_clock_local+0x15/0x80
> > [  541.031276]  [<ffffffff8104d586>] ? __do_softirq+0xd6/0x1b0
> > [  541.031282]  [<ffffffff8149109c>] ? call_softirq+0x1c/0x30
> > [  541.031284]  [<ffffffff8100d835>] ? do_softirq+0x75/0xb0
> > [  541.031286]  [<ffffffff8104d7ed>] ? irq_exit+0xbd/0xc0
> > 
> > 
> > Also it looks like ipv6 anycast has same potential issue of changing
> > unicast filters without holding rtnl_lock.
> >  ipv6_ac_inc -> addrconf_join_solict ->  ipv6_dev_mc_inc
> 
> Hmm, that's quite difficult to resolve, I think.
> 
> Either we make the code paths not depend on RTNL lock or we need to
> defer the action somehow and issue those commands down to the hardware
> before unlocking rtnl mutex (like netdev_run_todo).
> 


It gets nasty. The DAD timer has to be changed to a work queue.
The problem is that you can't change device filters without holding RTNL,
and the timer handler runs in softirq context, where the RTNL mutex can't
be taken because taking it may sleep.
The existing device drivers may reasonably assume that RTNL is held as a way
to block other changes to the hardware.
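
To illustrate the deferral idea, here is a minimal userspace sketch in C.
It is not kernel code and not a proposed patch: every model_* name is
hypothetical and only mimics the roles of the DAD timer handler,
schedule_work(), rtnl_lock() and ASSERT_RTNL(). The queue is a fixed-size
array and everything runs single-threaded, so it only shows the ordering,
not real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. None of the model_* names are kernel APIs. */

static bool model_rtnl_held;        /* stands in for the RTNL mutex */
static int  model_assert_failures;  /* counts "RTNL: assertion failed" events */
static int  model_filter_updates;   /* counts device filter changes */

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Models a filter change that expects RTNL to be held by the caller. */
static void model_set_rx_mode(void)
{
	if (!model_rtnl_held)
		model_assert_failures++;   /* what ASSERT_RTNL would report */
	model_filter_updates++;
}

/* A tiny fixed-size work queue, drained later from "process context". */
typedef void (*model_work_fn)(void);

#define MODEL_QUEUE_LEN 8
static model_work_fn model_queue[MODEL_QUEUE_LEN];
static size_t model_queue_count;

static bool model_schedule_work(model_work_fn fn)
{
	if (model_queue_count == MODEL_QUEUE_LEN)
		return false;              /* queue full, work not accepted */
	model_queue[model_queue_count++] = fn;
	return true;
}

/* Process context may sleep, so it is allowed to take the mutex. */
static void model_run_pending_work(void)
{
	for (size_t i = 0; i < model_queue_count; i++) {
		model_rtnl_lock();
		model_queue[i]();
		model_rtnl_unlock();
	}
	model_queue_count = 0;
}

/* Today: the timer handler changes filters directly, without RTNL. */
static void model_dad_timer_direct(void)
{
	model_set_rx_mode();
}

/* Deferred: the timer handler only queues the work and returns. */
static void model_dad_work(void)
{
	model_set_rx_mode();
}

static void model_dad_timer_deferred(void)
{
	(void)model_schedule_work(model_dad_work);
}
```

Calling model_dad_timer_direct() bumps model_assert_failures, which is the
situation in the trace. Calling model_dad_timer_deferred() changes nothing
until model_run_pending_work() runs, and that run updates the filter with
the lock held.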

