Message-ID: <1504665789.4482.31.camel@intel.com>
Date:   Tue, 05 Sep 2017 19:43:09 -0700
From:   Sai Praneeth Prakhya <sai.praneeth.prakhya@...el.com>
To:     Bhupesh Sharma <bhsharma@...hat.com>
Cc:     "linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        "jlee@...e.com" <jlee@...e.com>, Borislav Petkov <bp@...en8.de>,
        "Luck, Tony" <tony.luck@...el.com>,
        "luto@...nel.org" <luto@...nel.org>,
        "mst@...hat.com" <mst@...hat.com>,
        "Neri, Ricardo" <ricardo.neri@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>
Subject: Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > I get a similar crash on Qemu with linus's master branch and the V2
> > applied on top of it. Here are the details of my test environment:
> > 
> > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > edk2.git/ovmf-x64
> > 
> > 2. I used linus's master branch (HEAD - commit:
> > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > of the same.
> > 
> > 3. I use the following qemu command line to launch the test:
> > 
> > # /usr/local/bin/qemu-system-x86_64 --version
> > QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> > Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> > 
> > # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap  -m
> > $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> > -vga std -boot c -cpu host -kernel $KERNEL -append
> > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> > $INITRAMFS -bios $OVMF_FW_PATH
> > 
> > And here is the crash log:
> > 
> > [    0.006054] general protection fault: 0000 [#1] SMP
> > [    0.006459] Modules linked in:
> > [    0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > [    0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 0.0.0 02/06/2015
> > [    0.007000] task: ffffffff81e0f480 task.stack: ffffffff81e00000
> > [    0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > [    0.007000] RSP: 0000:ffffffff81e03d80 EFLAGS: 00010086
> > [    0.007000] RAX: 800000007d084000 RBX: 0000000000000000 RCX: 000077ff80000000
> > [    0.007000] RDX: 000000007d084000 RSI: 8000000000000000 RDI: 0000000000019a00
> > [    0.007000] RBP: ffffffff81e03dc0 R08: 0000000000000000 R09: ffff88007d085000
> > [    0.007000] R10: ffffffff81e03dd8 R11: 000000007d095063 R12: ffffffff81e5c6a0
> > [    0.007000] R13: ffffffff81ed4f40 R14: 0000000000000030 R15: 0000000000000001
> > [    0.007000] FS:  0000000000000000(0000) GS:ffff88007d400000(0000)
> > knlGS:0000000000000000
> > [    0.007000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.007000] CR2: ffff88007d754000 CR3: 000000000220a000 CR4: 00000000000406b0
> > [    0.007000] Call Trace:
> > [    0.007000]  switch_mm+0xd/0x20
> > [    0.007000]  ? switch_mm+0xd/0x20
> > [    0.007000]  efi_switch_mm+0x3e/0x4a
> > [    0.007000]  efi_call_phys_prolog+0x28/0x1ac
> > [    0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> > [    0.007000]  start_kernel+0x332/0x3b8
> > [    0.007000]  x86_64_start_reservations+0x2a/0x2c
> > [    0.007000]  x86_64_start_kernel+0x178/0x18b
> > [    0.007000]  secondary_startup_64+0xa5/0xa5
> > [    0.007000]  ? secondary_startup_64+0xa5/0xa5
> > [    0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> > 7e 89
> > [    0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: ffffffff81e03d80
> > [    0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > [    0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [    0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > the idle task!
> > 
> > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > firmware and 64-bit x86 kernel) with your patches, the primary kernel
> > boots fine on Qemu:
> > 
> > ovmf firmware used in this case - edk2.git/ovmf-ia32
> > 
> > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > case in point 3 above), I see the primary kernel boots fine on Qemu as
> > well.
> > 
> > Regards,
> > Bhupesh
> 
> Hi Bhupesh,
> 
> Thanks a lot for the detailed explanation. They are helpful to reproduce
> the issue quickly. From my initial debug, I think that AMD SME +
> efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
> the issue on qemu.
> 
> I have tried the following combinations (all tests are on qemu):
> On Linus's tree:
> 1. With  SME and  efi_mm and  -cpu host -> panics
> 2. With  SME and  efi_mm and !-cpu host -> boots
> 3. With  SME and !efi_mm and  -cpu host -> boots
> 4. With  SME and !efi_mm and !-cpu host -> boots
> 5. With !SME and  efi_mm and  -cpu host -> boots
> 6. With !SME and  efi_mm and !-cpu host -> boots
> 7. With !SME and !efi_mm and  -cpu host -> boots
> 8. With !SME and !efi_mm and !-cpu host -> boots
> 
> On Matt's tree (no SME):
> 1. With  efi_mm and  -cpu host -> boots
> 2. With  efi_mm and !-cpu host -> boots
> 3. With !efi_mm and  -cpu host -> boots
> 4. With !efi_mm and !-cpu host -> boots
> 
> Summary:
> On Matt's tree (next branch), I am unable to reproduce the issue because
> they don't have SME patches.
> 
> On Linus's tree, with SME patches
> (b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu host
> switch enabled in qemu, I was able to reproduce the issue.
> 
> Could you please confirm if you are seeing the same behavior?
> Especially on real machines (I think this is equivalent to -cpu host on
> qemu), because in earlier mails you mentioned that you were able to
> reproduce this on Matt's tree, but according to my theory that shouldn't
> be the case because Matt's tree doesn't have the SME patches.
> Did you backport this commit (b1b6f83ac938d176742c85757960dec2cf10e468)
> to Matt's tree and then apply my patches?
> 
> Your confirmation will help us in debugging the right issue.
> 
> Regards,
> Sai

Sorry! I am not sure whether it's the SME patches or the PCID-based TLB
flush patches (most likely the latter, because they change the switch_mm()
code). Both patch sets, along with 5-level paging, were in the same pull
request sent from Ingo to Linus. So "SME patches" above really means this
commit (b1b6f83ac938d176742c85757960dec2cf10e468) in Linus's tree. I will
debug this issue further and send a V3, but to be sure that I am
debugging the right issue, Bhupesh, could you please update me as
requested in my earlier mail?

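
As a starting point for that debugging, here is a minimal sketch (plain
Python, not kernel code) that decodes two values from the quoted crash log.
The bit positions (CR4.PCIDE is bit 17, the CR3 "no flush" flag is bit 63 of
the value written) come from the x86-64 architecture. The helper name
decode_cr3_write is made up for illustration. It is an assumption from
reading the Code: bytes, not something confirmed in this thread, that the
faulting instruction (0f 22 d8) is a write of RAX to CR3.

```python
# Sketch only: decode two values from the oops quoted above.
# Bit positions follow the x86-64 architecture; whether this explains the
# crash is an assumption, not a confirmed root cause.

CR4_PCIDE = 1 << 17      # CR4.PCIDE: enables process-context identifiers
CR3_NOFLUSH = 1 << 63    # bit 63 of the value written to CR3 ("no flush")
CR3_PCID_MASK = 0xFFF    # low 12 bits hold the PCID when CR4.PCIDE = 1


def decode_cr3_write(value: int, cr4: int) -> dict:
    """Describe a value about to be written to CR3, given the current CR4."""
    pcide = bool(cr4 & CR4_PCIDE)
    noflush = bool(value & CR3_NOFLUSH)
    return {
        "pcide_enabled": pcide,
        "noflush_bit_set": noflush,
        "pcid_field": value & CR3_PCID_MASK,
        "address_bits": value & ~CR3_NOFLUSH & ~CR3_PCID_MASK,
        # With CR4.PCIDE = 0, bit 63 is a reserved bit, and writing a
        # reserved bit to CR3 raises a general protection fault.
        "reserved_bit_write": noflush and not pcide,
    }


# Values copied from the register dump in the quoted crash log.
RAX = 0x800000007D084000   # assumed operand of the faulting instruction
CR4 = 0x00000000000406B0

info = decode_cr3_write(RAX, CR4)
for key, val in info.items():
    shown = val if isinstance(val, bool) else hex(val)
    print(f"{key}: {shown}")
```

If this reading holds, the dump shows bit 63 set in the value while
CR4.PCIDE is clear, which would point at the PCID changes rather than SME.
That still needs to be confirmed on a real machine.
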
Regards,
Sai