linux-kernel - Re: FSGSBASE causing panic on 5.9-rc1

Message-ID: <7dedb0ab-56a6-5d96-577b-21ab1ecdad24@amd.com>
Date:   Wed, 19 Aug 2020 13:19:49 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        X86 ML <x86@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        "Chang S. Bae" <chang.seok.bae@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sasha Levin <sashal@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>
Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/19/20 1:07 PM, Tom Lendacky wrote:
> It looks like the FSGSBASE support is crashing my second generation EPYC
> system. I was able to bisect it to:
> 
> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
> 
> The panic only happens when using KVM. Doing kernel builds or stress
> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> guest and do a kernel build within the guest, I get the following:

I should clarify that this panic is on the bare-metal system, not in the
guest, and that specifying nofsgsbase on the bare-metal kernel command line
fixes the issue.

Thanks,
Tom

> 
> [  120.360637] BUG: scheduling while atomic: qemu-system-x86/5485/0x00110000
> [  124.041646] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: x86_pmu_handle_irq+0x163/0x170
> [  124.041647] ------------[ cut here ]------------
> [  124.041649] Hardware name: AMD
> [  124.041649] Workqueue:  0x0 (events)
> [  124.041651] Call Trace:
> [  124.041651] ------------[ cut here ]------------
> [  124.041652] corrupted preempt_count: kworker/22:1/1449/0x110000
> [  124.051267] WARNING: CPU: 22 PID: 1449 at kernel/sched/core.c:3595 finish_task_switch+0x289/0x290
> [  124.051268] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc fuse amd64_edac_mod edac_mce_amd wmi_bmof kvm_amd kvm irqbypass sg ipmi_ssif ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq squashfs loop sch_fq_codel parport_pc ppdev lp parport ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops libata drm e1000e i2c_piix4 wmi i2c_designware_platform i2c_designware_core pinctrl_amd i2c_core
> [  124.051285] CPU: 22 PID: 1449 Comm: kworker/22:1 Tainted: G        W         5.9.0-rc1-sos-linux #1
> [  124.051286] Hardware name: AMD
> [  124.051286] Workqueue:  0x0 (events)
> [  124.051287] RIP: 0010:finish_task_switch+0x289/0x290
> [  124.051288] Code: ff 65 48 8b 04 25 c0 7b 01 00 8b 90 a8 08 00 00 48 8d b0 b0 0a 00 00 48 c7 c7 20 10 10 86 c6 05 be aa 55 01 01 e8 89 03 fd ff <0f> 0b e9 6b ff ff ff 55 48 89 e5 41 55 41 54 49 89 fc 53 48 89 f3
> [  124.051288] RSP: 0018:ffffc9001afe7e10 EFLAGS: 00010082
> [  124.051289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000023
> [  124.051290] RDX: 0000000000000023 RSI: ffffffff86101044 RDI: ffff88900d798bb0
> [  124.051290] RBP: ffffc9001afe7e38 R08: ffff88900d798ba8 R09: 0000000000000005
> [  124.051290] R10: 000000000000000f R11: ffff88900d798d54 R12: ffff88900d7aacc0
> [  124.051291] R13: ffff889bd2308000 R14: 0000000000000000 R15: ffff88900d7aacc0
> [  124.051291] FS:  0000000000000000(0000) GS:ffff88900d780000(0000) knlGS:0000000000000000
> [  124.051292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  124.051292] CR2: 00007ff607620000 CR3: 0000001bcb0d2000 CR4: 0000000000350ee0
> [  124.051293] Call Trace:
> [  124.051293]  __schedule+0x348/0x810
> [  124.051293]  ? dbs_work_handler+0x47/0x60
> [  124.051294]  schedule+0x4a/0xb0
> [  124.051294]  worker_thread+0xcf/0x3b0
> [  124.051294]  ? process_one_work+0x370/0x370
> [  124.051294]  kthread+0xfe/0x140
> [  124.051295]  ? kthread_park+0x90/0x90
> [  124.051295]  ret_from_fork+0x22/0x30
> [  124.051295] ---[ end trace 7f77ee8ad05caa89 ]---
> [  124.051296] Kernel Offset: disabled
> 
> Specifying nofsgsbase avoids the issue. This is very reproducible, so I
> can easily test any fixes.
> 
> Thanks,
> Tom
>
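
As a reading aid for the log above, here is a minimal sketch that decodes
two of the hex values it reports: the CR4 register dump and the
preempt_count in the "corrupted preempt_count" line. The helper names are
hypothetical and not part of any kernel tooling. The CR4.FSGSBASE position
(bit 16) comes from the x86 architecture manuals. The preempt_count field
layout (8 preempt bits, 8 softirq bits, 4 hardirq bits, NMI field starting
at bit 20) is assumed from include/linux/preempt.h and may differ between
kernel versions.

```python
# Hypothetical helpers for reading values out of the oops above.

CR4_FSGSBASE = 1 << 16  # CR4.FSGSBASE, bit 16 per the x86 manuals


def fsgsbase_enabled(cr4: int) -> bool:
    """Return True if the FSGSBASE bit is set in a CR4 value."""
    return bool(cr4 & CR4_FSGSBASE)


def decode_preempt_count(value: int) -> dict:
    """Split preempt_count into its fields.

    Layout is an assumption taken from include/linux/preempt.h:
    bits 0-7 preempt, 8-15 softirq, 16-19 hardirq, bit 20 lowest NMI bit.
    """
    return {
        "preempt": value & 0x000000FF,
        "softirq": (value & 0x0000FF00) >> 8,
        "hardirq": (value & 0x000F0000) >> 16,
        "nmi": bool(value & 0x00100000),
    }


if __name__ == "__main__":
    # Values copied from the log: "CR4: 0000000000350ee0" and
    # "corrupted preempt_count: kworker/22:1/1449/0x110000".
    print(fsgsbase_enabled(int("0000000000350ee0", 16)))
    print(decode_preempt_count(0x110000))
```

Under those assumptions, the CR4 value has FSGSBASE set, and 0x110000
decodes to one hardirq level plus the NMI bit, which would be consistent
with a count left over from NMI context.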