lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231002200426.GA4127272@dev-arch.thelio-3990X>
Date:   Mon, 2 Oct 2023 13:04:26 -0700
From:   Nathan Chancellor <nathan@...nel.org>
To:     Adam Dunlap <acdunlap@...gle.com>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Kim Phillips <kim.phillips@....com>,
        Juergen Gross <jgross@...e.com>,
        Ashok Raj <ashok.raj@...el.com>,
        Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        David Hildenbrand <david@...hat.com>,
        Mike Rapoport <rppt@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Nikunj A Dadhania <nikunj@....com>,
        Dionna Glaze <dionnaglaze@...gle.com>,
        Peter Gonda <pgonda@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Khalid ElMously <khalid.elmously@...onical.com>,
        Jacob Xu <jacobhxu@...gle.com>, llvm@...ts.linux.dev
Subject: Re: [PATCH v2 2/2] x86/sev-es: Only set x86_virt_bits to correct
 value

Hi Adam,

On Mon, Sep 11, 2023 at 05:27:03PM -0700, Adam Dunlap wrote:
> Instead of setting x86_virt_bits to a possibly-correct value and then
> correcting it later, do all the necessary checks before setting it.
> 
> At this point, the #VC handler references boot_cpu_data.x86_virt_bits,
> and in the previous version, it would be triggered by the cpuids between
> the point at which it is set to 48 and when it is set to the correct
> value.
> 
> Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Adam Dunlap <acdunlap@...gle.com>

Our continuous integration started seeing panics when booting ARCH=i386
without KVM after this change landed in -tip as commit fbf6449f84bf
("x86/sev-es: Set x86_virt_bits to the correct value straight away,
instead of a two-phase approach"):

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/6374441341/job/17299441736

I can reproduce this with GCC 13.2.0 from kernel.org and QEMU 8.1.1 (in
case those versions matter):

https://mirrors.edge.kernel.org/pub/tools/crosstool/

$ make -skj"$(nproc)" ARCH=i386 CROSS_COMPILE=i386-linux- defconfig bzImage

$ qemu-system-i386 \
    -display none \
    -nodefaults \
    -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
    -kernel arch/x86/boot/bzImage \
    -initrd rootfs.cpio \
    -m 512m \
    -serial mon:stdio
[    0.000000] Linux version 6.6.0-rc1-00008-gfbf6449f84bf (nathan@...-arch.thelio-3990X) (i386-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP PREEMPT_DYNAMIC Mon Oct  2 12:55:06 MST 2023
...
[    0.061831] BUG: kernel NULL pointer dereference, address: 00000020
[    0.062135] #PF: supervisor write access in kernel mode
[    0.062279] #PF: error_code(0x0002) - not-present page
[    0.062430] *pde = 00000000
[    0.062602] Oops: 0002 [#1] PREEMPT SMP
[    0.062839] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-00008-gfbf6449f84bf #1
[    0.063086] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[    0.063515] EIP: __ring_buffer_alloc+0x3a/0x1a4
[    0.064135] Code: 8b 15 74 da c2 ce 8d 42 6f f7 da 21 d0 ba c0 0d 00 00 e8 2d d0 07 00 85 c0 0f 84 15 01 00 00 8b 4d f0 89 c3 81 c6 f3 0f 00 00 <c7> 40 10 00 00 00 00 8d 40 4c ba 2d a7 8b ce 89 78 b4 c7 40 ec 60
[    0.064795] EAX: 00000010 EBX: 00000010 ECX: ced82e8d EDX: 00000dc0
[    0.064970] ESI: 00001ff3 EDI: 00000001 EBP: cea3df54 ESP: cea3df44
[    0.065151] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00200006
[    0.065347] CR0: 80050033 CR2: 00000020 CR3: 0ed2d000 CR4: 00000090
[    0.065596] Call Trace:
[    0.066142]  ? show_regs+0x4d/0x54
[    0.066293]  ? __die+0x21/0x68
[    0.066383]  ? page_fault_oops+0x18c/0x368
[    0.066502]  ? kernelmode_fixup_or_oops.constprop.0+0x84/0xdc
[    0.066663]  ? __bad_area_nosemaphore.constprop.0+0x125/0x1dc
[    0.066811]  ? __alloc_pages+0x156/0xe34
[    0.066920]  ? bad_area_nosemaphore+0x12/0x18
[    0.067036]  ? do_user_addr_fault+0x133/0x3e8
[    0.067156]  ? exc_page_fault+0x51/0x13c
[    0.067269]  ? pvclock_clocksource_read_nowd+0x110/0x110
[    0.067421]  ? handle_exception+0x133/0x133
[    0.067553]  ? pvclock_clocksource_read_nowd+0x110/0x110
[    0.067724]  ? __ring_buffer_alloc+0x3a/0x1a4
[    0.067846]  ? pvclock_clocksource_read_nowd+0x110/0x110
[    0.067983]  ? __ring_buffer_alloc+0x3a/0x1a4
[    0.068113]  early_trace_init+0xab/0x390
[    0.068296]  ? ring_buffer_reset_online_cpus+0xfc/0xfc
[    0.068443]  start_kernel+0x217/0x650
[    0.068542]  ? set_init_arg+0x70/0x70
[    0.068643]  i386_start_kernel+0x43/0x44
[    0.068754]  startup_32_smp+0x156/0x158
[    0.068960] Modules linked in:
[    0.069166] CR2: 0000000000000020
[    0.069495] ---[ end trace 0000000000000000 ]---
[    0.069653] EIP: __ring_buffer_alloc+0x3a/0x1a4
[    0.069781] Code: 8b 15 74 da c2 ce 8d 42 6f f7 da 21 d0 ba c0 0d 00 00 e8 2d d0 07 00 85 c0 0f 84 15 01 00 00 8b 4d f0 89 c3 81 c6 f3 0f 00 00 <c7> 40 10 00 00 00 00 8d 40 4c ba 2d a7 8b ce 89 78 b4 c7 40 ec 60
[    0.070249] EAX: 00000010 EBX: 00000010 ECX: ced82e8d EDX: 00000dc0
[    0.070412] ESI: 00001ff3 EDI: 00000001 EBP: cea3df54 ESP: cea3df44
[    0.070577] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00200006
[    0.070761] CR0: 80050033 CR2: 00000020 CR3: 0ed2d000 CR4: 00000090
[    0.071023] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.071643] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
...

In case it is necessary, our rootfs image is available at
https://github.com/ClangBuiltLinux/boot-utils/releases.

As mentioned before, this interestingly does not reproduce with
'-enable-kvm', so it could be potentially related to QEMU's TCG. If
there is any additional information I can provide or patches I can test,
I am more than happy to do so.

Cheers,
Nathan

# bad: [18a1f37f57ff28fa544124151ef0171ffdafd162] Merge branch into tip/master: 'x86/tdx'
# good: [9f3ebbef746f89f860a90ced99a359202ea86fde] Merge tag '6.6-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
git bisect start '18a1f37f57ff28fa544124151ef0171ffdafd162' '9f3ebbef746f89f860a90ced99a359202ea86fde'
# good: [6395aa368403d595c8aab4ee1ac451b84fb37640] Merge branch into tip/master: 'sched/core'
git bisect good 6395aa368403d595c8aab4ee1ac451b84fb37640
# good: [7eddb9db1104079c5f5ac6b17a46f7a7860cf444] Merge branch into tip/master: 'x86/boot'
git bisect good 7eddb9db1104079c5f5ac6b17a46f7a7860cf444
# good: [3fb58599f5a9efaa422c9c9e78e3e799a58d4f63] Merge branch into tip/master: 'x86/entry'
git bisect good 3fb58599f5a9efaa422c9c9e78e3e799a58d4f63
# bad: [062ce2831f0526dec8930a35f1624c0ae50dd667] Merge branch into tip/master: 'x86/platform'
git bisect bad 062ce2831f0526dec8930a35f1624c0ae50dd667
# good: [f4c5ca9850124fb5715eff06cffb1beed837500c] x86_64: Show CR4.PSE on auxiliaries like on BSP
git bisect good f4c5ca9850124fb5715eff06cffb1beed837500c
# good: [24775700eaa93ff83b2a0f1e005879cdf186cdd9] x86/amd_nb: Add AMD Family MI300 PCI IDs
git bisect good 24775700eaa93ff83b2a0f1e005879cdf186cdd9
# bad: [fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6] x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
git bisect bad fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6
# good: [f79936545fb122856bd78b189d3c7ee59928c751] x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
git bisect good f79936545fb122856bd78b189d3c7ee59928c751
# first bad commit: [fbf6449f84bf5e4ad09f2c09ee70ed7d629b5ff6] x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ