Date: Tue, 9 Apr 2024 21:21:46 +0200
From: Marco Elver <elver@...gle.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>
Cc: kasan-dev@...glegroups.com, Thomas Gleixner <tglx@...utronix.de>, 
	Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>, 
	Josh Poimboeuf <jpoimboe@...nel.org>, Ingo Molnar <mingo@...hat.com>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: BUG: unable to handle page fault for address: 0000000000030368

On Thu, 28 Mar 2024 at 17:17, Paul Menzel <pmenzel@...gen.mpg.de> wrote:
>
> Dear Marco, dear Linux folks,
>
>
> Am 26.03.24 um 13:44 schrieb Paul Menzel:
> > [Cc: +X86 maintainers]
>
> > Thank you for your quick reply. (Note, that your mailer wrapped the
> > pasted lines.)
> >
> > Am 26.03.24 um 11:07 schrieb Marco Elver:
> >> On Tue, 26 Mar 2024 at 10:23, Paul Menzel wrote:
> >
> >>> Trying KCSAN the first time – configuration attached –, it fails to boot
> >>> on the Dell XPS 13 9360 and QEMU q35. I couldn’t get logs on the Dell
> >>> XPS 13 9360, so here are the QEMU ones:
> >>
> >> If there's a bad access somewhere which is instrumented by KCSAN, it
> >> will unfortunately still crash inside KCSAN.
> >>
> >> What happens if you compile with CONFIG_KCSAN_EARLY_ENABLE=n? It
> >> disables KCSAN (but otherwise the kernel image is the same) and
> >> requires turning it on manually with "echo on >
> >> /sys/kernel/debug/kcsan" after boot.
> >>
> >> If it still crashes, then there's definitely a bug elsewhere. If it
> >> doesn't crash, and only crashes with KCSAN enabled, my guess is that
> >> KCSAN's delays of individual threads are perturbing execution to
> >> trigger previously undetected bugs.
> >
> > Such a Linux kernel booted with a warning on the Dell XPS 13 9360 (but
> > booted with *no* warning on QEMU q35) [1], but enabling KCSAN on the
> > laptop hangs the laptop right away. I couldn’t get any logs of the laptop.
>
> In the QEMU q35 virtual machine `echo on | sudo tee
> /sys/kernel/debug/kcsan` also locks up the system. Please find the logs
> attached.
>
>      [   78.241245] BUG: unable to handle page fault for address: 0000000000019a18
>      [   78.242815] #PF: supervisor read access in kernel mode
>      [   78.244001] #PF: error_code(0x0000) - not-present page
>      [   78.245186] PGD 0 P4D 0
>      [   78.245828] Oops: 0000 [#1] PREEMPT SMP NOPTI
>      [   78.246878] CPU: 4 PID: 783 Comm: sudo Not tainted 6.9.0-rc1+ #83
>      [   78.248289] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
>      [   78.250763] RIP: 0010:kcsan_setup_watchpoint+0x2b3/0x400
>      [   78.252108] Code: ea 00 f0 48 ff 05 25 b4 8f 02 eb e0 65 48 8b 05 7b 53 23 4f 48 8d 98 c0 02 03 00 e9 9f fd ff ff 48 83 fd 08 0f 85 fd 00 00 00 <4d> 8b 04 24 e9 bf fe ff ff 49 85 d1 75 54 ba 01 00 00 00 4a 84
>      [   78.256284] RSP: 0018:ffffbae1c0f5bc48 EFLAGS: 00010046
>      [   78.257548] RAX: 0000000000000000 RBX: ffff9b95c4ba93b0 RCX: 0000000000000019
>      [   78.259158] RDX: 0000000000000001 RSI: ffffffffb0f82d36 RDI: 0000000000000000
>      [   78.260781] RBP: 0000000000000008 R08: 00000000aaaaaaab R09: 0000000000000000
>      [   78.262417] R10: 0000000000000086 R11: 0010000000019a18 R12: 0000000000019a18
>      [   78.264040] R13: 000000000000001a R14: 0000000000000000 R15: 0000000000000000
>      [   78.265658] FS:  00007f65e3a91f00(0000) GS:ffff9b9d1f000000(0000) knlGS:0000000000000000
>      [   78.267480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      [   78.268804] CR2: 0000000000019a18 CR3: 0000000102e26000 CR4: 00000000003506f0
>      [   78.270424] Call Trace:
>      [   78.271036]  <TASK>
>      [   78.271572]  ? __die+0x23/0x70
>      [   78.272344]  ? page_fault_oops+0x173/0x4f0
>      [   78.273400]  ? exc_page_fault+0x81/0x190
>      [   78.274373]  ? asm_exc_page_fault+0x26/0x30
>      [   78.275395]  ? refill_obj_stock+0x36/0x2e0
>      [   78.276410]  ? kcsan_setup_watchpoint+0x2b3/0x400
>      [   78.277556]  refill_obj_stock+0x36/0x2e0
>      [   78.278540]  obj_cgroup_uncharge+0x13/0x20
>      [   78.279596]  __memcg_slab_free_hook+0xac/0x140
>      [   78.280661]  ? free_pipe_info+0x135/0x150
>      [   78.281631]  kfree+0x2de/0x310
>      [   78.282419]  free_pipe_info+0x135/0x150
>      [   78.283395]  pipe_release+0x188/0x1a0
>      [   78.284303]  __fput+0x127/0x4e0
>      [   78.285114]  __fput_sync+0x35/0x40
>      [   78.285958]  __x64_sys_close+0x54/0xa0
>      [   78.286914]  do_syscall_64+0x88/0x1a0
>      [   78.287810]  ? fpregs_assert_state_consistent+0x7e/0x90
>      [   78.289185]  ? srso_return_thunk+0x5/0x5f
>      [   78.290203]  ? arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
>      [   78.291568]  ? srso_return_thunk+0x5/0x5f
>      [   78.292518]  ? syscall_exit_to_user_mode+0x40/0xe0
>      [   78.293651]  ? srso_return_thunk+0x5/0x5f
>      [   78.294606]  ? do_syscall_64+0x94/0x1a0
>      [   78.295516]  ? arch_exit_to_user_mode_prepare.isra.0+0x69/0xa0
>      [   78.296876]  ? srso_return_thunk+0x5/0x5f
>
> Can you reproduce this?
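
As a reading aid for the two "#PF:" lines in the oops quoted above, here is a minimal sketch that decodes the low bits of an x86 page-fault error code. The function name decode_pf_error_code is hypothetical and not part of any kernel tooling; the bit meanings (bit 0 present/protection, bit 1 write, bit 2 user, bit 4 instruction fetch) follow the x86 architecture definition, and the higher bits are ignored here.

```shell
# Decode the low bits of an x86 #PF error code (hypothetical helper).
# Usage: decode_pf_error_code 0x0000
decode_pf_error_code() {
    code=$(( $1 ))

    # Bit 0: 0 = page not present, 1 = protection violation.
    if [ $(( code & 1 )) -ne 0 ]; then
        cause="protection violation"
    else
        cause="not-present page"
    fi

    # Bit 4: instruction fetch; otherwise bit 1 selects write vs. read.
    if [ $(( code & 16 )) -ne 0 ]; then
        access="instruction fetch"
    elif [ $(( code & 2 )) -ne 0 ]; then
        access="write access"
    else
        access="read access"
    fi

    # Bit 2: 0 = supervisor (kernel) access, 1 = user access.
    if [ $(( code & 4 )) -ne 0 ]; then
        origin="user"
    else
        origin="supervisor"
    fi

    echo "$origin $access, $cause"
}

decode_pf_error_code 0x0000
```

For the error code in this report (0x0000) the helper reports a supervisor read of a not-present page, which matches what the kernel printed for the access to 0000000000019a18.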

This seems to be a compiler issue with a new feature introduced in
6.9-rc1; it is fixed in 6.9-rc2 by commit:

  b6540de9b5c8 ("x86/percpu: Disable named address spaces for KCSAN")
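
To check whether a given tree already has that fix, something along these lines should work. This is only a sketch: the function name has_kcsan_fix is made up for illustration, it assumes a full (non-shallow) clone of the mainline tree, and it will not recognize backports, which carry a different commit ID.

```shell
# Report whether the checked-out kernel tree contains the fix commit.
# Usage: has_kcsan_fix [path-to-linux-git-tree]   (hypothetical helper)
has_kcsan_fix() {
    tree="${1:-.}"
    fix=b6540de9b5c8    # commit ID quoted in this thread

    # Without the commit object in this clone, nothing can be concluded.
    if ! git -C "$tree" cat-file -e "${fix}^{commit}" 2>/dev/null; then
        echo "unknown"
        return 2
    fi

    # Exit status 0 means $fix is an ancestor of (or equal to) HEAD.
    if git -C "$tree" merge-base --is-ancestor "$fix" HEAD 2>/dev/null; then
        echo "fixed"
    else
        echo "not fixed"
        return 1
    fi
}
```

Run it as "has_kcsan_fix ~/src/linux" (path is an example). It prints "fixed" if HEAD contains the commit, "not fixed" if the commit is known to the clone but not reachable from HEAD, and "unknown" if the commit cannot be found at all.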
