lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d94e549d-04de-5b23-c4e0-6c161ec8213e@arm.com>
Date:   Thu, 7 Nov 2019 13:51:47 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     kernel test robot <lkp@...el.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Aaron Lu <aaron.lwe@...il.com>, Phil Auld <pauld@...hat.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>, lkp@...org
Subject: Re: [sched] 10e7071b2f: BUG:kernel_NULL_pointer_dereference,address



On 07/11/2019 09:08, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 10e7071b2f491b0fb981717ea0a585c441906ede ("sched: Rework CPU hotplug task selection")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: kernel_selftests
> with following parameters:
> 
> 	group: kselftests-01
> 
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> 
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> +-------------------------------------------------+------------+------------+
> |                                                 | f95d4eaee6 | 10e7071b2f |
> +-------------------------------------------------+------------+------------+
> | boot_successes                                  | 54         | 12         |
> | boot_failures                                   | 0          | 82         |
> | BUG:kernel_NULL_pointer_dereference,address     | 0          | 79         |
> | Oops:#[##]                                      | 0          | 79         |
> | RIP:pick_next_task_dl                           | 0          | 79         |
> | Kernel_panic-not_syncing:Fatal_exception        | 0          | 79         |
> | BUG:kernel_reboot-without-warning_in_test_stage | 0          | 3          |
> +-------------------------------------------------+------------+------------+
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <lkp@...el.com>
> 
> 

FWIW the decoded stacktrace from my end is (my x86 GCC is a tad ancient,
but the lines seem to match):

[   84.432464] BUG: kernel NULL pointer dereference, address: 0000000000000064
[   84.433700] #PF: supervisor read access in kernel mode
[   84.434589] #PF: error_code(0x0000) - not-present page
[   84.435499] PGD 0 P4D 0
[   84.435933] Oops: 0000 [#1] SMP PTI
[   84.436581] CPU: 1 PID: 15 Comm: migration/1 Not tainted 5.3.0-rc1-00086-g10e7071b2f491 #1
[   84.438004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   84.439461] RIP: 0010:pick_next_task_dl (./include/linux/sched/deadline.h:13 ./include/linux/sched/deadline.h:20 kernel/sched/deadline.c:505 kernel/sched/deadline.c:1766)
[ 84.440266] Code: ed bd 70 01 01 e8 42 2d fb ff 0f 0b e9 6b ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 53 48 89 fb 48 83 ec 10 <8b> 46 64 85 c0 78 73 48 81 7e 78 a0 3f e2 a7 74 57 48 83 bb 10 09
All code
========
   0:	ed			in     (%dx),%eax
   1:	bd 70 01 01 e8		mov    $0xe8010170,%ebp
   6:	42 2d fb ff 0f 0b	rex.X sub $0xb0ffffb,%eax
   c:	e9 6b ff ff ff		jmpq   0xffffffffffffff7c
  11:	66 66 2e 0f 1f 84 00	data16 nopw %cs:0x0(%rax,%rax,1)
  18:	00 00 00 00
  1c:	66 66 66 66 90		data16 data16 data16 xchg %ax,%ax
  21:	55			push   %rbp
  22:	53			push   %rbx
  23:	48 89 fb		mov    %rdi,%rbx
  26:	48 83 ec 10		sub    $0x10,%rsp
  2a:*	8b 46 64		mov    0x64(%rsi),%eax		<-- trapping instruction
  2d:	85 c0			test   %eax,%eax
  2f:	78 73			js     0xa4
  31:	48 81 7e 78 a0 3f e2	cmpq   $0xffffffffa7e23fa0,0x78(%rsi)
  38:	a7
  39:	74 57			je     0x92
  3b:	48			rex.W
  3c:	83			.byte 0x83
  3d:	bb			.byte 0xbb
  3e:	10 09			adc    %cl,(%rcx)

Code starting with the faulting instruction
===========================================
   0:	8b 46 64		mov    0x64(%rsi),%eax
   3:	85 c0			test   %eax,%eax
   5:	78 73			js     0x7a
   7:	48 81 7e 78 a0 3f e2	cmpq   $0xffffffffa7e23fa0,0x78(%rsi)
   e:	a7
   f:	74 57			je     0x68
  11:	48			rex.W
  12:	83			.byte 0x83
  13:	bb			.byte 0xbb
  14:	10 09			adc    %cl,(%rcx)
[   84.443485] RSP: 0000:ffffa5518008bd40 EFLAGS: 00010082
[   84.444423] RAX: ffffffffa6eeeae0 RBX: ffff98ebbfd2b0c0 RCX: ffff98ebbfd2d040
[   84.445641] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff98ebbfd2b0c0
[   84.446877] RBP: ffffa5518008bdc0 R08: 0000001ac1016512 R09: 0000000000000001
[   84.448128] R10: ffffffffa863e640 R11: 0000000000000003 R12: ffff98ebbfd2b0c0
[   84.449349] R13: ffffffffa7e23fa0 R14: ffffffffa7e24060 R15: 0000000000000000
[   84.450603] FS:  0000000000000000(0000) GS:ffff98ebbfd00000(0000) knlGS:0000000000000000
[   84.452007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.453022] CR2: 0000000000000064 CR3: 00000001aab84000 CR4: 00000000000406e0
[   84.454244] Call Trace:
[   84.455263] ? update_rq_clock (kernel/sched/pelt.h:85 kernel/sched/core.c:196 kernel/sched/core.c:218)
[   84.456081] sched_cpu_dying (kernel/sched/core.c:6093 kernel/sched/core.c:6143 kernel/sched/core.c:6366)
[   84.456777] ? sched_cpu_starting (kernel/sched/core.c:6353)
[   84.457510] cpuhp_invoke_callback (kernel/cpu.c:166 (discriminator 4))
[   84.458279] ? cpu_disable_common (./arch/x86/include/asm/bitops.h:80 ./include/asm-generic/bitops-instrumented.h:57 ./include/linux/cpumask.h:327 arch/x86/kernel/smpboot.c:1570 arch/x86/kernel/smpboot.c:1585)
[   84.459047] take_cpu_down (kernel/cpu.c:855)
[   84.459649] multi_cpu_stop (./arch/x86/include/asm/atomic.h:125 ./include/asm-generic/atomic-instrumented.h:748 kernel/stop_machine.c:176 kernel/stop_machine.c:227)
[   84.460339] ? stop_machine_yield (??:?)
[   84.461078] cpu_stopper_thread (kernel/stop_machine.c:519)
[   84.461809] ? smpboot_thread_fn (kernel/smpboot.c:113)
[   84.462539] ? smpboot_thread_fn (kernel/smpboot.c:128 (discriminator 3))
[   84.463280] ? smpboot_thread_fn (kernel/smpboot.c:129)
[   84.464024] smpboot_thread_fn (kernel/smpboot.c:129)
[   84.464768] ? sort_range (kernel/smpboot.c:108)
[   84.465389] kthread (kernel/kthread.c:237)
[   84.465961] ? kthread_park (kernel/kthread.c:609)
[   84.466605] ret_from_fork (arch/x86/entry/entry_64.S:358)
[   84.467236] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver binfmt_misc intel_rapl_msr intel_rapl_common sr_mod crct10dif_pclmul cdrom crc32_pclmul sg crc32c_intel ghash_clmulni_intel ata_generic pata_acpi ppdev bochs_drm drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect snd_pcm sysimgblt fb_sys_fops drm aesni_intel snd_timer ata_piix crypto_simd snd cryptd glue_helper libata soundcore pcspkr joydev serio_raw parport_pc i2c_piix4 parport floppy ip_tables
[   84.474471] CR2: 0000000000000064
[   84.475066] ---[ end trace af8f1919a81ca744 ]---


Using that, the fail is on:

	if (need_pull_dl_task(rq, prev)) {

Which is most likely explained by the above call ending up doing a 

  dl_prio(prev->prio);

which doesn't play well with 

  class->pick_next_task(rq, NULL, NULL);


Now, this is no longer an issue (I think) with the rest of Peter's series,
since the above deref is gone with

67692435c411 ("sched: Rework pick_next_task() slow-path")

It would be interesting to know whether LKP found this on a mainline kernel
and bisected it down, or if it stumbled on this while bisecting something
else.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ