Date:	Thu, 31 Jul 2014 15:56:02 +0800
From:	Lai Jiangshan <laijs@...fujitsu.com>
To:	Fengguang Wu <fengguang.wu@...el.com>
CC:	Christoph Lameter <cl@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>, Jet Chen <jet.chen@...el.com>,
	Su Tao <tao.su@...el.com>, Yuanhan Liu <yuanhan.liu@...el.com>,
	LKP <lkp@...org>, <linux-kernel@...r.kernel.org>
Subject: Re: [scheduler] BUG: unable to handle kernel paging request at 000000000000ce50

On 07/30/2014 09:56 PM, Fengguang Wu wrote:
> Hi Christoph,
> 
> FYI, this commit seems to turn a kernel boot hang into
> different BUG messages.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-3.17-consistent-ops
> commit 9b0c63851edaf54e909475fe2a0946f57810e98a
> Author:     Christoph Lameter <cl@...ux.com>
> AuthorDate: Fri Jun 20 14:31:18 2014 -0500
> Commit:     Tejun Heo <tj@...nel.org>
> CommitDate: Fri Jul 18 19:21:39 2014 -0400
> 
>     scheduler: Replace __get_cpu_var with this_cpu_ptr
>     
>     Convert all uses of __get_cpu_var for address calculation to use
>     this_cpu_ptr instead.


-	struct cpumask *cpus = __get_cpu_var(load_balance_mask);
+	struct cpumask *cpus = this_cpu_ptr(load_balance_mask);


I think the conversion is wrong. It should be
			*this_cpu_ptr(&load_balance_mask);

There are several such mistakes in the patch.

>     
>     Cc: Peter Zijlstra <peterz@...radead.org>
>     Acked-by: Ingo Molnar <mingo@...nel.org>
>     Signed-off-by: Christoph Lameter <cl@...ux.com>
>     Signed-off-by: Tejun Heo <tj@...nel.org>
> 
> ===================================================
> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
> ===================================================
> Attached dmesg for the parent commit, too, to help confirm whether it is a noise error.
> 
> +-----------------------------------------------------------+------------+------------+------------+
> |                                                           | 9dfcba84af | 9b0c63851e | e65347f54c |
> +-----------------------------------------------------------+------------+------------+------------+
> | boot_successes                                            | 1058       | 129        | 38         |
> | boot_failures                                             | 302        | 231        | 3          |
> | BUG:kernel_boot_hang                                      | 302        |            |            |
> | BUG:unable_to_handle_kernel_paging_request                | 0          | 230        | 3          |
> | Oops                                                      | 0          | 230        | 3          |
> | RIP:load_balance                                          | 0          | 230        | 3          |
> | backtrace:__alloc_workqueue_key                           | 0          | 214        | 3          |
> | backtrace:usermodehelper_init                             | 0          | 214        | 3          |
> | backtrace:kernel_init_freeable                            | 0          | 214        | 3          |
> | backtrace:schedule                                        | 0          | 16         |            |
> | backtrace:smpboot_thread_fn                               | 0          | 2          |            |
> | kernel_BUG_at_kernel/smpboot.c                            | 0          | 1          |            |
> | invalid_opcode                                            | 0          | 1          |            |
> | RIP:smpboot_thread_fn                                     | 0          | 1          |            |
> | Kernel_panic-not_syncing:Attempted_to_kill_init_exitcode= | 0          | 1          |            |
> +-----------------------------------------------------------+------------+------------+------------+
> 
> [    0.260658] Good, all   2 testcases passed! |
> [    0.261298] ---------------------------------
> [    0.261951] smpboot: Total of 2 processors activated (10773.32 BogoMIPS)
> [    0.263759] BUG: unable to handle kernel paging request at 000000000000ce50
> [    0.263759] IP: [<ffffffff8110d4e8>] load_balance+0x48/0xce0
> [    0.263759] PGD 0 
> [    0.263759] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [    0.263759] Modules linked in:
> [    0.263777] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.0-rc5-00154-g9b0c638 #2
> [    0.264811] task: ffff880000188000 ti: ffff88000018c000 task.ti: ffff88000018c000
> [    0.265805] RIP: 0010:[<ffffffff8110d4e8>]  [<ffffffff8110d4e8>] load_balance+0x48/0xce0
> [    0.267010] RSP: 0000:ffff88000018fa18  EFLAGS: 00010002
> [    0.267856] RAX: 0000000000000000 RBX: ffff88000020d7a0 RCX: 0000000000000002
> [    0.269009] RDX: ffff88000020d7a0 RSI: ffff8800123d1840 RDI: 0000000000000000
> [    0.270000] RBP: ffff88000018faf8 R08: ffff88000018fb3c R09: 0000000000000001
> [    0.270000] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> [    0.270000] R13: 00000000ffff8b4e R14: 0000000000000000 R15: ffff88000020d7a0
> [    0.270000] FS:  0000000000000000(0000) GS:ffff880012200000(0000) knlGS:0000000000000000
> [    0.270000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.270000] CR2: 000000000000ce50 CR3: 0000000001f2f000 CR4: 00000000000406b0
> [    0.270000] Stack:
> [    0.270000]  ffff88000018fb3c 0000000200188710 ffff88000018fa38 0000000000000000
> [    0.270000]  ffff88000020d7a0 ffffffff00000000 ffff880000188000 0000000000000000
> [    0.270000]  ffff88000018fa90 0000000000000002 0000000000000006 ffff8800123d1840
> [    0.270000] Call Trace:
> [    0.270000]  [<ffffffff81048f85>] ? kvm_clock_read+0x35/0x50
> [    0.270000]  [<ffffffff81010c80>] ? sched_clock+0x10/0x20
> [    0.270000]  [<ffffffff810ff564>] ? sched_clock_local+0x64/0xe0
> [    0.270000]  [<ffffffff8110eebe>] pick_next_task_fair+0x50e/0xb30
> [    0.270000]  [<ffffffff8110ece0>] ? pick_next_task_fair+0x330/0xb30
> [    0.270000]  [<ffffffff81a2f402>] __schedule+0x1e2/0xca0
> [    0.270000]  [<ffffffff81a303fc>] schedule+0x1c/0x30
> [    0.270000]  [<ffffffff81a2ec4c>] schedule_timeout+0x1fc/0x260
> [    0.270000]  [<ffffffff810ff95f>] ? sched_clock_cpu+0x10f/0x140
> [    0.270000]  [<ffffffff810ff9c2>] ? local_clock+0x32/0x60
> [    0.270000]  [<ffffffff81a37c5a>] ? _raw_spin_unlock_irq+0x4a/0x80
> [    0.270000]  [<ffffffff81125a04>] ? trace_hardirqs_on_caller+0x1f4/0x2c0
> [    0.270000]  [<ffffffff81a31836>] wait_for_completion_killable+0x116/0x230
> [    0.270000]  [<ffffffff810fb080>] ? try_to_wake_up+0x5c0/0x5c0
> [    0.270000]  [<ffffffff810d9aa0>] ? process_one_work+0x6d0/0x6d0
> [    0.270000]  [<ffffffff810e59de>] kthread_create_on_node+0x13e/0x240
> [    0.270000]  [<ffffffff810ff95f>] ? sched_clock_cpu+0x10f/0x140
> [    0.270000]  [<ffffffff81a31774>] ? wait_for_completion_killable+0x54/0x230
> [    0.270000]  [<ffffffff81125a04>] ? trace_hardirqs_on_caller+0x1f4/0x2c0
> [    0.270000]  [<ffffffff810ddec7>] __alloc_workqueue_key+0x717/0x940
> [    0.270000]  [<ffffffff8133eb3f>] ? alloc_cpumask_var_node+0x4f/0xa0
> [    0.270000]  [<ffffffff8133ebf6>] ? zalloc_cpumask_var_node+0x16/0x20
> [    0.270000]  [<ffffffff82541860>] ? sched_init_smp+0x51d/0x533
> [    0.270000]  [<ffffffff8253fc2f>] usermodehelper_init+0x38/0x5d
> [    0.270000]  [<ffffffff82523911>] kernel_init_freeable+0x249/0x427
> [    0.270000]  [<ffffffff81a1fe50>] ? kernel_init+0x10/0x190
> [    0.270000]  [<ffffffff81a1fe40>] ? rest_init+0x220/0x220
> [    0.270000]  [<ffffffff81a1fe50>] kernel_init+0x10/0x190
> [    0.270000]  [<ffffffff81a391fc>] ret_from_fork+0x7c/0xb0
> [    0.270000]  [<ffffffff81a1fe40>] ? rest_init+0x220/0x220
> [    0.270000] Code: 48 ff 05 7c dd 57 01 89 bd 58 ff ff ff 48 8b 02 48 89 95 40 ff ff ff 89 8d 2c ff ff ff 4c 89 85 20 ff ff ff 48 89 85 38 ff ff ff <48> 8b 05 61 f9 ef 7e 65 48 03 04 25 18 ca 00 00 4c 8d 6d 80 48 
> [    0.270000] RIP  [<ffffffff8110d4e8>] load_balance+0x48/0xce0
> [    0.270000]  RSP <ffff88000018fa18>
> [    0.270000] CR2: 000000000000ce50
> [    0.270000] ---[ end trace e47ac2652bc5a17c ]---
> 
> git bisect start e65347f54cfc1a17a3b734a0e268433dad019f3f 1795cd9b3a91d4b5473c97f491d63892442212ab --
> git bisect  bad 5a346c7c81b1e10381e5790134b79b4e6fb4434a  # 11:00      0-     72  Merge 'pm/bleeding-edge' into devel-lkp-hsx01-x86_64-201407191600
> git bisect  bad 8024b4314b39f7d45c621a6492a6b49078f8da5a  # 11:00    120-      2  Merge 'percpu/for-3.17-consistent-ops' into devel-lkp-hsx01-x86_64-201407191600
> git bisect good deebbfe3e05e145d25b065a792b3f57436ea9e06  # 11:10    360+     51  0day base guard for 'devel-lkp-hsx01-x86_64-201407191600'
> git bisect good d672f939bc81513d28a5bfc570ed2f17d8f5b34a  # 11:31    360+     16  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
> git bisect good d14aef3872bd25af5355a10ad5235556ac83fcfd  # 11:50    360+     75  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect  bad 6b233d1fb6da79d7bf86e0cb7c03e56ef7c6d39b  # 11:53      0-     14  drivers/cpuidle: Replace __get_cpu_var uses for address calculation
> git bisect good 22d368544b0ed9093a3db3ee4e00a842540fcecd  # 12:15    360+     69  Merge tag 'trace-fixes-v3.16-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
> git bisect good 9dfcba84af450d8685e3b7af9eea98bf1bea5b1e  # 12:22    360+    157  kernel misc: Replace __get_cpu_var uses
> git bisect  bad 2c20d34275287784397fdeb995c9686f3208fc5e  # 12:24      0-     10  block: Replace __this_cpu_ptr with raw_cpu_ptr
> git bisect  bad 9b0c63851edaf54e909475fe2a0946f57810e98a  # 12:27      1-     71  scheduler: Replace __get_cpu_var with this_cpu_ptr
> # first bad commit: [9b0c63851edaf54e909475fe2a0946f57810e98a] scheduler: Replace __get_cpu_var with this_cpu_ptr
> git bisect good 9dfcba84af450d8685e3b7af9eea98bf1bea5b1e  # 13:48   1000+    302  kernel misc: Replace __get_cpu_var uses
> git bisect  bad e65347f54cfc1a17a3b734a0e268433dad019f3f  # 13:48      0-      3  0day head guard for 'devel-lkp-hsx01-x86_64-201407191600'
> git bisect good f83971912231fe5390d2357442b6c25bb8076d9b  # 13:57   1000+    262  Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
> git bisect good 58e323c3ee94f1abcecdeeef211a27d1c106c2b3  # 14:10   1000+    100  Add linux-next specific files for 20140718
> 
> 
> This script may reproduce the error.
> 
> ----------------------------------------------------------------------------
> #!/bin/bash
> 
> kernel=$1
> 
> kvm=(
> 	qemu-system-x86_64
> 	-enable-kvm
> 	-cpu Haswell,+smep,+smap
> 	-kernel $kernel
> 	-m 320
> 	-smp 2
> 	-net nic,vlan=1,model=e1000
> 	-net user,vlan=1
> 	-boot order=nc
> 	-no-reboot
> 	-watchdog i6300esb
> 	-rtc base=localtime
> 	-serial stdio
> 	-display none
> 	-monitor null 
> )
> 
> append=(
> 	hung_task_panic=1
> 	earlyprintk=ttyS0,115200
> 	debug
> 	apic=debug
> 	sysrq_always_enabled
> 	rcupdate.rcu_cpu_stall_timeout=100
> 	panic=10
> 	softlockup_panic=1
> 	nmi_watchdog=panic
> 	prompt_ramdisk=0
> 	console=ttyS0,115200
> 	console=tty0
> 	vga=normal
> 	root=/dev/ram0
> 	rw
> 	drbd.minor_count=8
> )
> 
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
> 
> Thanks,
> Fengguang
> 
> 
> 
> _______________________________________________
> LKP mailing list
> LKP@...ux.intel.com
