lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1454585550.3407.126.camel@gmail.com>
Date:	Thu, 04 Feb 2016 12:32:30 +0100
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Nikolay Borisov <kernel@...p.com>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Cc:	Jiri Slaby <jslaby@...e.cz>, Oleg Nesterov <oleg@...hat.com>,
	tglx@...utronix.de,
	SiteGround Operations <operations@...eground.com>
Subject: Re: crash in 3.12.51 (likely in 3.12.52 as well) in timer code

On Wed, 2016-02-03 at 12:58 +0200, Nikolay Borisov wrote:
> Hello, 
> 
> I've observed the following crash on a machine running 3.12.51:
> 
> [2711471.041886] Modules linked in: xt_length xt_state xt_pkttype xt_dscp xt_multiport xt_set(O) ip_set_list_set(O) ip_set_hash_ip(O) ip_set(O) act_police cls_basic sch_ingress veth dm_snapshot netconsole openvswitch gre vxlan ip_tunnel nf_nat_ftp nf_conntrack_ftp xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log ipmi_devintf ipmi_si ipmi_msghandler i2c_i801 lpc_ich mfd_core shpchp ioapic ioatdma ses enclosure ixgbe dca
> [2711471.059208] CPU: 12 PID: 0 Comm: swapper/12 Tainted: G           O 3.12.51-clouder5 #2
> [2711471.059563] Hardware name: Supermicro PIO-628U-TR4T+-ST031/X10DRU-i+, BIOS 1.0c 03/23/2015
> [2711471.059919] task: ffff881fd31db870 ti: ffff881fd31ea000 task.ti: ffff881fd31ea000
> [2711471.060273] RIP: 0010:[]  [] detach_if_pending+0x48/0x100
> [2711471.060972] RSP: 0018:ffff883fff203bd0  EFLAGS: 00010002
> [2711471.061320] RAX: dead000000200200 RBX: ffffffffa018be20 RCX: 0000000000000074
> [2711471.061672] RDX: ffff883fd2e14638 RSI: ffff883fd2df8000 RDI: ffffffffa018be20
> [2711471.062025] RBP: ffff883fff203bf0 R08: 0000000000000000 R09: ffff881fff403700
> [2711471.062377] R10: ffffea00d4178f80 R11: 0000000000000000 R12: ffff883fd2df8000
> [2711471.062729] R13: 0000000000000000 R14: 0000000000000001 R15: ffff883fff203c88
> [2711471.063081] FS:  0000000000000000(0000) GS:ffff883fff200000(0000) knlGS:0000000000000000
> [2711471.063437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2711471.063787] CR2: 00007f7b9b4b2000 CR3: 000000380ddae000 CR4: 00000000001407e0
> [2711471.064143] Stack:
> [2711471.064483]  ffffffffa018be20 ffff883fd2df8000 0000000000000000 0000000000000000
> [2711471.065090]  ffff883fff203c30 ffffffff810978f1 0000000000000000 0000000000000082
> [2711471.065695]  ffff883fff203c88 ffffffffa018be00 ffff883fff203c88 0000000000000001
> [2711471.066301] Call Trace:
> [2711471.066643]   
> [2711471.066709] 
> [2711471.067112]  [] del_timer+0x41/0x70
> [2711471.067465]  [] try_to_grab_pending+0x121/0x1d0
> [2711471.067818]  [] mod_delayed_work_on+0x42/0xa0
> [2711471.068171]  [] set_timeout+0x3a/0x40 [ib_addr]
> [2711471.068523]  [] netevent_callback+0x2d/0x40 [ib_addr]
> [2711471.068879]  [] notifier_call_chain+0x54/0x80
> [2711471.069231]  [] __atomic_notifier_call_chain+0x2a/0x40
> [2711471.069584]  [] atomic_notifier_call_chain+0x16/0x20
> [2711471.069940]  [] call_netevent_notifiers+0x1b/0x20
> [2711471.070292]  [] neigh_update_notify+0x1e/0x40
> [2711471.070643]  [] neigh_timer_handler+0x116/0x270
> [2711471.070995]  [] ? neigh_periodic_work+0x270/0x270
> [2711471.071346]  [] call_timer_fn+0x49/0x160
> [2711471.079597]  [] run_timer_softirq+0x278/0x2e0
> [2711471.079948]  [] ? neigh_periodic_work+0x270/0x270
> [2711471.080301]  [] __do_softirq+0x137/0x2e0
> [2711471.080653]  [] call_softirq+0x1c/0x30
> [2711471.081006]  [] do_softirq+0x8d/0xc0
> [2711471.081356]  [] irq_exit+0x95/0xa0
> [2711471.081706]  [] smp_apic_timer_interrupt+0x4a/0x5a
> [2711471.082057]  [] apic_timer_interrupt+0x6f/0x80
> [2711471.082406]   
> [2711471.082472] 
> [2711471.082877]  [] ? mwait_idle+0x73/0x90
> [2711471.083227]  [] ? mwait_idle+0x6a/0x90
> [2711471.083577]  [] arch_cpu_idle+0x26/0x30
> [2711471.083929]  [] cpu_startup_entry+0xcb/0x2a0
> [2711471.084283]  [] start_secondary+0x1e9/0x250
> [2711471.084633] Code: 44 00 00 31 c0 41 89 d6 48 89 fb 48 8b 17 49 89 f4 48 85 d2 74 4a 8b 05 ff ad c0 00 85 c0 7f 6c 48 8b 43 08 45 84 f6 48 89 42 08 <48> 89 10 74 07 48 c7 03 00 00 00 00 48 b9 00 02 20 00 00 00 ad 
> [2711471.089662] RIP  [] detach_if_pending+0x48/0x100
> [2711471.090078]  RSP 
> 
> 
> Analysing the issue it seems what happens is that a neighbor timer 
> expires which in turn causes the subscribed ib_addr module to invoke 
> set_timeout which queues delayed work. However, it seems something has 
> already corrupted the timer_list since the crash actually occurs in the 
> inlined detach_timer inside detach_if_pending, here is annotated assembly: 
> 
> ------------[detach_timer]----------------------
> /home/projects/linux-stable/kernel/timer.c: 662
> 0xffffffff8109770d :      mov    rax,QWORD PTR [rbx+0x8]  ; rbx holds value of rdi = timer_list
> /home/projects/linux-stable/kernel/timer.c: 663
> 0xffffffff81097711 :      test   r14b,r14b 
> ----------[__list_del]----------------------
> /home/projects/linux-stable/include/linux/list.h: 88
> 0xffffffff81097714 :      mov    QWORD PTR [rdx+0x8],rax ; ffffffffa018be20
> /home/projects/linux-stable/include/linux/list.h: 89
> 0xffffffff81097718 :      mov    QWORD PTR [rax],rdx 
> ---------------[__list_del]----------------
> /home/projects/linux-stable/kernel/timer.c: 663
> 0xffffffff8109771b :      je     0xffffffff81097724  
> /home/projects/linux-stable/kernel/timer.c: 664
> 0xffffffff8109771d :      mov    QWORD PTR [rbx],0x0
> /home/projects/linux-stable/kernel/timer.c: 665
> 0xffffffff81097724 :      movabs rcx,0xdead000000200200
> ------------[end detach_timer]-------------
> 
> 
> It seems when the code tries to do prev->next = next in __list_del from detach_timer, 
> rax has a value of dead000000200200 (LIST_POISON2).
> 
> ffffffffa018be20 is the address of the timer_list passed to detach_timer which looks 
> like so: 
> 
> crash> struct timer_list ffffffffa018be20
> struct timer_list {
>   entry = {
>     next = 0xffff883fd2e14638, 
>     prev = 0xffff883fff223e60
>   }, 
>   expires = 4565976929, 
>   base = 0xffff883fd2e14002, 
>   function = 0xffffffff810a4f70 , 
>   data = 18446744072100560384, 
>   slack = -1
> }
> 
> So in this case the prev/next entries do not look like corrupted, whereas
> when manipulating the list inside detach_timer they do. This is really
> odd, any ideas how to further debug this?

Suspiciously similar to https://lkml.org/lkml/2016/2/4/247

	-Mike

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ