linux-kernel - Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.


Message-ID: <5B34B673.20803@hisilicon.com>
Date:   Thu, 28 Jun 2018 11:20:35 +0100
From:   Wei Xu <xuwei5@...ilicon.com>
To:     James Morse <james.morse@....com>
CC:     Will Deacon <will.deacon@....com>, <mark.rutland@....com>,
        <catalin.marinas@....com>, Linuxarm <linuxarm@...wei.com>,
        Zhangyi ac <zhangyi.ac@...wei.com>, <suzuki.poulose@....com>,
        <marc.zyngier@....com>,
        "Xiongfanggou (James)" <james.xiong@...wei.com>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <dave.martin@....com>,
        "Liyuan (Larry, Turing Solution)" <Larry.T@...wei.com>,
        <libeijian@...ilicon.com>
Subject: Re: KVM guest sometimes failed to boot because of kernel stack
 overflow if KPTI is enabled on a hisilicon ARM64 platform.

Hi James,

On 2018/6/28 9:45, James Morse wrote:
> Hi Wei,
> 
> On 27/06/18 14:26, Wei Xu wrote:
>> Sorry, I should highlight that I have only updated the default value
>> of CONFIG_NR_CPUS by menuconfig in the previous mail.
>> That is why it showed dirty.
> 
> (menuconfig changes don't show up like this)

Thanks!
Sorry, yes, you are right.
After I reset proc.S, the kernel version no longer showed as dirty.

> 
> 
> More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too-few
> VMIDs does work with KVM, it's just going to trigger rollover frequently.
>

No, we just ran one VM.

> Just to check, what kernel version is the host running? Does it have commit
> f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
> (looks like that went in as a fix for v4.17-rc3)

Yes, the host is running 4.18-rc2, the same as the guest, and it includes the above commit.

> 
> Are you running (lots) of other VMs whenever this happens? Do they have multiple
> vcpus? (I'm thinking of the scenario in that patch's description)

No, we ran only one VM with one vCPU.

> 
> Is the host system otherwise idle when this happens?
> (If not, can you reproduce the issue without exhausting the VMIDs?)
> 
> 
> It may be that writing back the page-table entries with the MMU off, and
> changing the cache maintenance are just changing the timing of something else.
> 

Yes, maybe. We are now debugging this together with the SoC team.
Thanks!

Best Regards,
Wei

> 
> Thanks,
> 
> James
> 