Date:	Mon, 23 Jan 2012 15:56:38 -0500
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Suresh Siddha <suresh.b.siddha@...el.com>, a.p.zijlstra@...llo.nl,
	tglx@...utronix.de, mingo@...e.hu, linux-kernel@...r.kernel.org
Cc:	xen-devel@...ts.xensource.com, gregkh@...e.de, rjw@...k.pl
Subject: v3.3-rc1, regression introduced by "sched, nohz: Implement sched
 group, domain aware nohz idle load balancing" when unplugging CPUs.

Hey,

I am not exactly sure how this patch causes it, but with git commit
0b005cf54eac170a8f22540ab096a6e07bf49e7c the Linux kernel crashes
when I hot-unplug VCPUs from the first (initial) domain.
I found this by git bisection; a kernel built from
69e1e811dcc436a6b129dbef273ad9ec22d095ce (the previous commit)
works fine.
 
I am not really sure whether xen_send_IPI_one needs to be updated, but
it looks as if an IPI is being sent to a nonexistent (torn-down) CPU. Hmm.
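
To make the suspected failure mode concrete, here is a minimal user-space
sketch in C. It is not kernel code. It assumes that the BUG in
xen_send_IPI_one is a sanity check on the IRQ bound to the target CPU's IPI
vector (RDI = 00000000ffffffff in the register dump below is consistent with
a lookup that returned -1), and it assumes that the scheduler still considers
an unplugged CPU a valid idle load balancing target. All names in the sketch
(ipi_to_irq as a plain array, idle_mask, the model_* helpers) are
hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

enum ipi_vector { RESCHEDULE_VECTOR, CALL_FUNCTION_VECTOR, NR_IPI_VECTORS };

/* Hypothetical stand-ins for kernel state; names are illustrative only. */
static int  ipi_to_irq[NR_CPUS][NR_IPI_VECTORS]; /* -1 means "not bound" */
static bool idle_mask[NR_CPUS];                  /* CPUs eligible for a nohz kick */

/* Bring a CPU up: bind an (arbitrary) IRQ number to each IPI vector. */
static void model_cpu_up(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    for (int v = 0; v < NR_IPI_VECTORS; v++)
        ipi_to_irq[cpu][v] = 16 + cpu * NR_IPI_VECTORS + v;
    idle_mask[cpu] = false;
}

/*
 * Tear a CPU down: unbind its IPI IRQs. clear_idle models whether the
 * scheduler also drops the CPU from its idle mask (the suspected gap).
 */
static void model_cpu_down(int cpu, bool clear_idle)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    for (int v = 0; v < NR_IPI_VECTORS; v++)
        ipi_to_irq[cpu][v] = -1;
    if (clear_idle)
        idle_mask[cpu] = false;
}

/* Returns 0 if the IPI could be sent, -1 where the kernel would BUG. */
static int model_send_ipi_one(int cpu, enum ipi_vector vector)
{
    int irq = ipi_to_irq[cpu][vector];
    return irq < 0 ? -1 : 0;
}

/* Pick the first CPU still marked idle, or -1 if there is none. */
static int model_find_idle_cpu(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (idle_mask[cpu])
            return cpu;
    return -1;
}

/*
 * Replay the report: four CPUs up, CPU 3 goes idle, CPUs 2 and 3 are
 * unplugged, then a tick tries to kick an idle CPU.
 * Returns 0 if nothing goes wrong, -1 where the kernel would BUG.
 */
int model_unplug_then_kick(bool clear_idle)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        model_cpu_up(cpu);
    idle_mask[3] = true;

    model_cpu_down(2, clear_idle);
    model_cpu_down(3, clear_idle);

    int target = model_find_idle_cpu();
    if (target < 0)
        return 0; /* no idle CPU left to kick */
    return model_send_ipi_one(target, RESCHEDULE_VECTOR);
}
```

In this model, model_unplug_then_kick(false) returns -1 because the stale
idle entry for CPU 3 is still picked as the kick target after its IPI IRQs
are gone, while model_unplug_then_kick(true) returns 0. Whether the real fix
belongs in the scheduler or in xen_send_IPI_one is exactly the open question.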

The VCPU unplug mechanism uses arch_unregister_cpu, so I think
this can also be reproduced by doing ACPI CPU hotplug on bare metal.

The steps to reproduce this are simple:

sh-4.1# uname -a
Linux tst018.dumpdata.com 3.2.0-rc1-00328-g0b005cf #1 SMP PREEMPT Mon Jan 23 15:34:43 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
sh-4.1# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0                             0     0    0   -b-       5.0  any cpu
Domain-0                             0     1    1   -b-       1.3  any cpu
Domain-0                             0     2    2   -b-       1.6  any cpu
Domain-0                             0     3    3   r--       2.0  any cpu
sh-4.1# xl vcpu-set 0 2
sh-4.1# [  123.856084] ------------[ cut here ]------------
[  123.857166] kernel BUG at /home/konrad/ssd/linux/drivers/xen/events.c:1071!
[  123.858265] invalid opcode: 0000 [#1] PREEMPT SMP 
[  123.859387] CPU 1 
[  123.859400] Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi libcrc32c crc32c sg sd_mod usbhid hid usb_storage nouveau ahci libahci ata_generic libata i915 fbcon ttm tileblit scsi_mod font mxm_wmi bitblit e1000e softcursor wmi drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs
[  123.864413] 
[  123.865679] Pid: 2568, comm: kworker/u:7 Not tainted 3.2.0-rc1-00328-g0b005cf #1                  /DQ67SW
[  123.867010] RIP: e030:[<ffffffff8138a81e>]  [<ffffffff8138a81e>] xen_send_IPI_one+0x2e/0x40
[  123.868352] RSP: e02b:ffff8803e2ea3c18  EFLAGS: 00010086
[  123.869688] RAX: 0000000000010980 RBX: 0000000000000001 RCX: 0000000000000002
[  123.871051] RDX: ffff8803e2ebc000 RSI: 0000000000000000 RDI: 00000000ffffffff
[  123.872407] RBP: ffff8803e2ea3c18 R08: 0000000000000000 R09: 0000000000000001
[  123.873768] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803e2eb3800
[  123.875115] R13: 00000000fffd338f R14: ffff8803e2eb3800 R15: 0000000000000001
[  123.876458] FS:  00007fd00c8a4700(0000) GS:ffff8803e2ea0000(0000) knlGS:0000000000000000
[  123.877806] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  123.879169] CR2: 00007fd00c8a2000 CR3: 00000003bbd2c000 CR4: 0000000000002660
[  123.880538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.881900] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  123.883258] Process kworker/u:7 (pid: 2568, threadinfo ffff8803c39ce000, task ffff8803cc753d20)
[  123.884626] Stack:
[  123.885980]  ffff8803e2ea3c28 ffffffff81049d70 ffff8803e2ea3c78 ffffffff810c69b0
[  123.887376]  0000000000000001 00000002cc753d68 ffff8803e2ea3c78 ffff8803e2eb3800
[  123.888759]  0000000000000001 0000000000000001 ffff8803e2eb3800 ffff8803cc753d20
[  123.890136] Call Trace:
[  123.891455]  <IRQ> 
[  123.892763]  [<ffffffff81049d70>] xen_smp_send_reschedule+0x10/0x20
[  123.894085]  [<ffffffff810c69b0>] trigger_load_balance+0x260/0x330
[  123.895392]  [<ffffffff810bc044>] scheduler_tick+0x104/0x160
[  123.896691]  [<ffffffff8109a66e>] update_process_times+0x6e/0x90
[  123.897980]  [<ffffffff810d97c2>] tick_sched_timer+0x62/0xc0
[  123.899257]  [<ffffffff810b3766>] __run_hrtimer+0x96/0x280
[  123.900539]  [<ffffffff810d9760>] ? tick_nohz_handler+0x100/0x100
[  123.901846]  [<ffffffff810b3be6>] hrtimer_interrupt+0x106/0x240
[  123.903165]  [<ffffffff81042398>] xen_timer_interrupt+0x38/0x1f0
[  123.904478]  [<ffffffff810919bb>] ? irq_exit+0x7b/0x100
[  123.905780]  [<ffffffff8110eeed>] handle_irq_event_percpu+0x8d/0x290
[  123.907081]  [<ffffffff81112238>] handle_percpu_irq+0x48/0x70
[  123.908359]  [<ffffffff813891b1>] __xen_evtchn_do_upcall+0x1c1/0x2c0
[  123.909631]  [<ffffffff8138947f>] xen_evtchn_do_upcall+0x2f/0x50
[  123.910898]  [<ffffffff8164677e>] xen_do_hypervisor_callback+0x1e/0x30
[  123.912150]  <EOI> 
[  123.913384]  [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[  123.914627]  [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[  123.915847]  [<ffffffff81041e1d>] ? xen_force_evtchn_callback+0xd/0x10
[  123.917067]  [<ffffffff81042802>] ? check_events+0x12/0x20
[  123.918282]  [<ffffffff810427a9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  123.919508]  [<ffffffff8163cd6b>] ? _raw_spin_unlock_irq+0x2b/0x70
[  123.920718]  [<ffffffff810bc53e>] ? finish_task_switch+0x4e/0xe0
[  123.921913]  [<ffffffff8163b669>] ? __schedule+0x469/0x890
[  123.923103]  [<ffffffff8163bb6f>] ? schedule+0x3f/0x60
[  123.924285]  [<ffffffff816399ad>] ? schedule_timeout+0x1fd/0x350
[  123.925466]  [<ffffffff8104259c>] ? xen_clocksource_read+0x4c/0x80
[  123.926645]  [<ffffffff810c57f4>] ? update_curr+0x144/0x1e0
[  123.927816]  [<ffffffff8104a8c6>] ? xen_spin_lock+0xa6/0x110
[  123.928974]  [<ffffffff810bb491>] ? get_parent_ip+0x11/0x50
[  123.930117]  [<ffffffff8163aff0>] ? wait_for_common+0xd0/0x190
[  123.931262]  [<ffffffff810c0c20>] ? try_to_wake_up+0x2c0/0x2c0
[  123.932367]  [<ffffffff8163b18d>] ? wait_for_completion+0x1d/0x20
[  123.933427]  [<ffffffff81089eb9>] ? do_fork+0xe9/0x350
[  123.934440]  [<ffffffff810a5640>] ? call_usermodehelper_exec+0xe0/0xe0
[  123.935465]  [<ffffffff810557d6>] ? kernel_thread+0x76/0x80
[  123.936473]  [<ffffffff810a5290>] ? call_usermodehelper_setup+0xa0/0xa0
[  123.937471]  [<ffffffff81646630>] ? gs_change+0x13/0x13
[  123.938454]  [<ffffffff816409ad>] ? sub_preempt_count+0x9d/0xd0
[  123.939428]  [<ffffffff810a5677>] ? __call_usermodehelper+0x37/0xb0
[  123.940411]  [<ffffffff810a7b59>] ? process_one_work+0x129/0x4e0
[  123.941400]  [<ffffffff810a9c4e>] ? worker_thread+0x17e/0x410
[  123.942383]  [<ffffffff810a9ad0>] ? manage_workers+0x210/0x210
[  123.943363]  [<ffffffff810ae906>] ? kthread+0x96/0xa0
[  123.944327]  [<ffffffff81646634>] ? kernel_thread_helper+0x4/0x10
[  123.945287]  [<ffffffff816446e3>] ? int_ret_from_sys_call+0x7/0x1b
[  123.946238]  [<ffffffff8163d200>] ? retint_restore_args+0x5/0x6
[  123.947187]  [<ffffffff81646630>] ? gs_change+0x13/0x13
[  123.948132] Code: e5 66 66 66 66 90 48 c7 c0 80 09 01 00 89 ff 89 f6 48 8b 14 fd e0 28 ac 81 48 8d 04 b0 8b 3c 10 85 ff 78 07 e8 74 ff ff ff c9 c3 <0f> 0b eb fe 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 
[  123.950401] RIP  [<ffffffff8138a81e>] xen_send_IPI_one+0x2e/0x40
[  123.951419]  RSP <ffff8803e2ea3c18>
[  123.952425] ---[ end trace 4c21b5ae5c292a38 ]---
[  123.953438] Kernel panic - not syncing: Fatal exception in interrupt
[  123.954459] Pid: 2568, comm: kworker/u:7 Tainted: G      D      3.2.0-rc1-00328-g0b005cf #1
[  123.955508] Call Trace:
[  123.956539]  <IRQ>  [<ffffffff816394e2>] panic+0x9b/0x1c9
[  123.957592]  [<ffffffff81042802>] ? check_events+0x12/0x20
[  123.958644]  [<ffffffff8163df8a>] oops_end+0x10a/0x120
[  123.959694]  [<ffffffff8104fcbb>] die+0x5b/0x90
[  123.960736]  [<ffffffff8163d8c4>] do_trap+0xc4/0x170
[  123.961774]  [<ffffffff8104d906>] do_invalid_op+0xa6/0xc0
[  123.962813]  [<ffffffff8138a81e>] ? xen_send_IPI_one+0x2e/0x40
[  123.963850]  [<ffffffff810c510b>] ? find_busiest_group+0x9bb/0xac0
[  123.964890]  [<ffffffff816464ab>] invalid_op+0x1b/0x20
[  123.965929]  [<ffffffff8138a81e>] ? xen_send_IPI_one+0x2e/0x40
[  123.966967]  [<ffffffff81049d70>] xen_smp_send_reschedule+0x10/0x20
[  123.968009]  [<ffffffff810c69b0>] trigger_load_balance+0x260/0x330
[  123.969049]  [<ffffffff810bc044>] scheduler_tick+0x104/0x160
[  123.970086]  [<ffffffff8109a66e>] update_process_times+0x6e/0x90
[  123.971119]  [<ffffffff810d97c2>] tick_sched_timer+0x62/0xc0
[  123.972148]  [<ffffffff810b3766>] __run_hrtimer+0x96/0x280
[  123.973167]  [<ffffffff810d9760>] ? tick_nohz_handler+0x100/0x100
[  123.974203]  [<ffffffff810b3be6>] hrtimer_interrupt+0x106/0x240
[  123.975238]  [<ffffffff81042398>] xen_timer_interrupt+0x38/0x1f0
[  123.976274]  [<ffffffff810919bb>] ? irq_exit+0x7b/0x100
[  123.977308]  [<ffffffff8110eeed>] handle_irq_event_percpu+0x8d/0x290
[  123.978344]  [<ffffffff81112238>] handle_percpu_irq+0x48/0x70
[  123.979379]  [<ffffffff813891b1>] __xen_evtchn_do_upcall+0x1c1/0x2c0
[  123.980422]  [<ffffffff8138947f>] xen_evtchn_do_upcall+0x2f/0x50
[  123.981465]  [<ffffffff8164677e>] xen_do_hypervisor_callback+0x1e/0x30
[  123.982517]  <EOI>  [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[  123.983584]  [<ffffffff8100122a>] ? hypercall_page+0x22a/0x1000
[  123.984652]  [<ffffffff81041e1d>] ? xen_force_evtchn_callback+0xd/0x10
[  123.985721]  [<ffffffff81042802>] ? check_events+0x12/0x20
[  123.986792]  [<ffffffff810427a9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[  123.987869]  [<ffffffff8163cd6b>] ? _raw_spin_unlock_irq+0x2b/0x70
[  123.988948]  [<ffffffff810bc53e>] ? finish_task_switch+0x4e/0xe0
[  123.990027]  [<ffffffff8163b669>] ? __schedule+0x469/0x890
[  123.991106]  [<ffffffff8163bb6f>] ? schedule+0x3f/0x60
[  123.992176]  [<ffffffff816399ad>] ? schedule_timeout+0x1fd/0x350
[  123.993244]  [<ffffffff8104259c>] ? xen_clocksource_read+0x4c/0x80
[  123.994308]  [<ffffffff810c57f4>] ? update_curr+0x144/0x1e0
[  123.995370]  [<ffffffff8104a8c6>] ? xen_spin_lock+0xa6/0x110
[  123.996429]  [<ffffffff810bb491>] ? get_parent_ip+0x11/0x50
[  123.997489]  [<ffffffff8163aff0>] ? wait_for_common+0xd0/0x190
[  123.998545]  [<ffffffff810c0c20>] ? try_to_wake_up+0x2c0/0x2c0
[  123.999600]  [<ffffffff8163b18d>] ? wait_for_completion+0x1d/0x20
[  124.000660]  [<ffffffff81089eb9>] ? do_fork+0xe9/0x350
[  124.001715]  [<ffffffff810a5640>] ? call_usermodehelper_exec+0xe0/0xe0
[  124.002781]  [<ffffffff810557d6>] ? kernel_thread+0x76/0x80
[  124.003847]  [<ffffffff810a5290>] ? call_usermodehelper_setup+0xa0/0xa0
[  124.004914]  [<ffffffff81646630>] ? gs_change+0x13/0x13
[  124.005982]  [<ffffffff816409ad>] ? sub_preempt_count+0x9d/0xd0
[  124.007009]  [<ffffffff810a5677>] ? __call_usermodehelper+0x37/0xb0
[  124.007991]  [<ffffffff810a7b59>] ? process_one_work+0x129/0x4e0
[  124.008965]  [<ffffffff810a9c4e>] ? worker_thread+0x17e/0x410
[  124.009923]  [<ffffffff810a9ad0>] ? manage_workers+0x210/0x210
[  124.010882]  [<ffffffff810ae906>] ? kthread+0x96/0xa0
[  124.011830]  [<ffffffff81646634>] ? kernel_thread_helper+0x4/0x10
[  124.012765]  [<ffffffff816446e3>] ? int_ret_from_sys_call+0x7/0x1b
[  124.013684]  [<ffffffff8163d200>] ? retint_restore_args+0x5/0x6
[  124.014603]  [<ffffffff81646630>] ? gs_change+0x13/0x13
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
amtterm: RUN_SOL -> ERROR (failure)
amtterm: ERROR: redir_data: unknown r->buf 0x29
