netdev - RFC: crash in fib6_clean_all() while loading ipv6 module

Date:	Mon, 9 Sep 2013 12:05:15 +0200
From:	Michal Kubecek <mkubecek@...e.cz>
To:	netdev@...r.kernel.org
Subject: RFC: crash in fib6_clean_all() while loading ipv6 module

Hello,

Two of our customers encountered a crash in fib6_clean_all() when
booting their system:

[   12.408421] BUG: unable to handle kernel NULL pointer dereference at        (null)
[   12.408424] IP: [<ffffffffa02566b4>] fib6_clean_all+0x34/0xd0 [ipv6_lib]
[   12.408434] PGD c3590f067 PUD c29073067 PMD 0 
[   12.408436] Oops: 0000 [#1] SMP 
[   12.408439] CPU 1 
[   12.408440] Modules linked in: processor(+) ipv6(+) thermal_sys ipv6_lib
ahci(+) ixgbe(+) ehci_hcd libahci hwmon igb(+) libata usbcore i2c_i801 dca
iTCO_wdt pcspkr scsi_mod e1000e i2c_core usb_common iTCO_vendor_support
rtc_cmos container ptp pps_core button mdio
[   12.408449] Supported: Yes
[   12.408449] 
[   12.408451] Pid: 211, comm: modprobe Not tainted 3.0.76-0.11-default #1 ...
[   12.408453] RIP: 0010:[<ffffffffa02566b4>]  [<ffffffffa02566b4>] fib6_clean_all+0x34/0xd0 [ipv6_lib]
[   12.408460] RSP: 0018:ffff880c1e983b98  EFLAGS: 00010246
[   12.408462] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000000000
[   12.408463] RDX: 0000000000000000 RSI: ffffffffa0254f90 RDI: ffffffff81a72280
[   12.408464] RBP: ffffffff81a72280 R08: 0000000000000000 R09: 000000003520aec8
[   12.408466] R10: 000000005f000c28 R11: 000000003520aec8 R12: 0000000000000000
[   12.408467] R13: ffffffff81a72280 R14: 0000000000000000 R15: ffffffffa0254f90
[   12.408469] FS:  00007fb2039f1700(0000) GS:ffff880c7f040000(0000) knlGS:0000000000000000
[   12.408470] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   12.408471] CR2: 0000000000000000 CR3: 0000000c2904e000 CR4: 00000000001407e0
[   12.408473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   12.408474] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   12.408476] Process modprobe (pid: 211, threadinfo ffff880c1e982000, task ffff880c3242c540)
[   12.408477] Stack:
[   12.408478]  0000000000000000 0000000000000000 0000000000000000 ffffffff8146544e
[   12.408480]  000000000006a9d0 0000000000000000 000000003520aec8 000000005f000c28
[   12.408483]  000000003520aec8 0000000000000000 0000000000000000 0000000000000000
[   12.408485] Call Trace:
[   12.408507]  [<ffffffffa02567a3>] fib6_run_gc+0x53/0xf0 [ipv6_lib]
[   12.408524]  [<ffffffffa025c5c6>] ndisc_netdev_event+0x186/0x190 [ipv6_lib]
[   12.408541]  [<ffffffff81460437>] notifier_call_chain+0x37/0x70
[   12.408547]  [<ffffffff813a6e52>] dev_addr_add+0x52/0x90
[   12.408556]  [<ffffffffa032e3e9>] ixgbe_probe+0x8c9/0xd10 [ixgbe]
[   12.408565]  [<ffffffff8127d5d4>] local_pci_probe+0x54/0xe0
[   12.408570]  [<ffffffff8127d740>] __pci_device_probe+0xe0/0xf0
[   12.408573]  [<ffffffff8127e953>] pci_device_probe+0x33/0x60
[   12.408578]  [<ffffffff8132ddda>] really_probe+0x7a/0x260
[   12.408581]  [<ffffffff8132e023>] driver_probe_device+0x63/0xc0
[   12.408585]  [<ffffffff8132e113>] __driver_attach+0x93/0xa0
[   12.408588]  [<ffffffff8132d468>] bus_for_each_dev+0x88/0xb0
[   12.408591]  [<ffffffff8132cbb5>] bus_add_driver+0x155/0x2a0
[   12.408595]  [<ffffffff8132e769>] driver_register+0x79/0x170
[   12.408599]  [<ffffffff8127ebe8>] __pci_register_driver+0x58/0xe0
[   12.408604]  [<ffffffff810001cb>] do_one_initcall+0x3b/0x180
[   12.408610]  [<ffffffff810a02df>] sys_init_module+0xcf/0x240
[   12.408615]  [<ffffffff81464592>] system_call_fastpath+0x16/0x1b

The crash is caused by fib6_clean_all() dereferencing a NULL pointer in
init_net.ipv6.fib_table_hash. The ndisc_netdev_event() handler, which
calls fib6_run_gc(), is registered by ndisc_init(), but the relevant data
structures are not initialized until ip6_route_init() is called. If a
device initialization falls into this window, it emits NETDEV_CHANGEADDR,
leading to a crash.
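
To make the window concrete, here is a minimal user-space sketch of the
ordering problem. It is not kernel code: every model_* name is
hypothetical, the table is reduced to a single pointer, and the NULL
dereference is replaced by a -1 return so that the example can run
safely.

```c
#include <stddef.h>

/* Stand-in for the per-net FIB table; the real structure is far richer. */
struct model_fib_table { int gc_runs; };

struct model_fib_table model_table;
struct model_fib_table *model_fib_table_hash;  /* NULL until route init */
int (*model_notifier)(void);                   /* NULL until registered */

/* Models ndisc_netdev_event() -> fib6_run_gc() -> fib6_clean_all(). */
int model_ndisc_netdev_event(void)
{
    if (model_fib_table_hash == NULL)
        return -1;               /* the real kernel would oops here */
    model_fib_table_hash->gc_runs++;
    return 1;
}

/* Models ndisc_init(): registers the netdev notifier early. */
void model_ndisc_init(void)
{
    model_notifier = model_ndisc_netdev_event;
}

/* Models ip6_route_init(): only now does the table exist. */
void model_ip6_route_init(void)
{
    model_fib_table_hash = &model_table;
}

/* Models a driver probe emitting NETDEV_CHANGEADDR.
 * Returns 0 if no handler is registered, otherwise the handler result. */
int model_emit_changeaddr(void)
{
    return model_notifier ? model_notifier() : 0;
}
```

In this model an event delivered between model_ndisc_init() and
model_ip6_route_init() yields -1, which corresponds to the oops above;
before the first call or after the second it is harmless.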

This could be prevented by setting a flag when ip6_route_init() is
complete and not calling fib6_run_gc() from ndisc_netdev_event() until
the flag is set. However, I don't like the idea of adding a test that
is useful only during a short window while the ipv6 module is loading.
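
For comparison, the flag variant could look roughly like the following
user-space sketch. The flag name and all model_* functions are
hypothetical, and a plain bool is used only for illustration; real
kernel code would also have to consider how the flag is published to
other CPUs.

```c
#include <stdbool.h>

bool model_route_ready;   /* hypothetical "ip6_route_init() done" flag */
int model_gc_calls;       /* counts simulated fib6_run_gc() invocations */

/* Models the end of ip6_route_init(): mark the routing state usable. */
void model_ip6_route_init(void)
{
    model_route_ready = true;
}

/* Models ndisc_netdev_event() with the extra test on every event.
 * Returns 1 if garbage collection ran, 0 if it was skipped. */
int model_ndisc_netdev_event(void)
{
    if (!model_route_ready)
        return 0;         /* routing tables not set up yet: skip GC */
    model_gc_calls++;
    return 1;
}
```

The check stays in the event path for the lifetime of the module even
though it can only ever fail during loading, which is the cost objected
to above.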

The only other actions ndisc_netdev_event() takes are flushing the neigh
table (which should be empty at that point) and calling
ndisc_send_unsol_na() (which should do nothing as there is no IPv6
address assigned yet). Thus I believe that moving the netdev event handler
registration out of ndisc_init() into a separate function, say
ndisc_late_init(), which would be called after ip6_route_init(), should
be a safe and efficient way to avoid the race condition.
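
The proposed ordering can be sketched in user space as follows. This is
only a model under stated assumptions: all model_* names are
hypothetical, the init sequence is reduced to three steps, and an event
is injected after each step to check that none of them reaches an
uninitialized table.

```c
#include <stddef.h>

struct model_fib_table { int gc_runs; };

struct model_fib_table model_table;
struct model_fib_table *model_fib_table_hash;  /* NULL until route init */
int (*model_notifier)(void);                   /* NULL until registered */

/* Models the handler; -1 marks what would be a NULL dereference. */
int model_ndisc_netdev_event(void)
{
    if (model_fib_table_hash == NULL)
        return -1;
    model_fib_table_hash->gc_runs++;
    return 1;
}

/* Models ndisc_init() after the change: no notifier registration. */
void model_ndisc_init(void)
{
}

void model_ip6_route_init(void)
{
    model_fib_table_hash = &model_table;
}

/* Models the proposed ndisc_late_init(): register the notifier last. */
void model_ndisc_late_init(void)
{
    model_notifier = model_ndisc_netdev_event;
}

int model_emit_changeaddr(void)
{
    return model_notifier ? model_notifier() : 0;
}

/* Runs the reordered init and fires an event after every step.
 * Returns how many of those events would have crashed. */
int model_load_ipv6(void)
{
    int unsafe = 0;

    model_ndisc_init();
    unsafe += (model_emit_changeaddr() < 0);
    model_ip6_route_init();
    unsafe += (model_emit_changeaddr() < 0);
    model_ndisc_late_init();
    unsafe += (model_emit_changeaddr() < 0);
    return unsafe;
}
```

In this model no event can reach the handler before the table pointer
is set, so no per-event check is needed. Events arriving before the
late registration are simply not seen, which matches the argument above
that there is nothing useful for the handler to do that early.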

Before I submit the patch, I wanted to ask whether anyone can see a
problem with this solution or suggest a better one.

                                                         Michal Kubecek
