Date:   Thu, 28 Jun 2018 09:45:56 +0100
From:   James Morse <james.morse@....com>
To:     Wei Xu <xuwei5@...ilicon.com>
Cc:     Will Deacon <will.deacon@....com>, mark.rutland@....com,
        catalin.marinas@....com, Linuxarm <linuxarm@...wei.com>,
        Zhangyi ac <zhangyi.ac@...wei.com>, suzuki.poulose@....com,
        marc.zyngier@....com,
        "Xiongfanggou (James)" <james.xiong@...wei.com>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        dave.martin@....com,
        "Liyuan (Larry, Turing Solution)" <Larry.T@...wei.com>,
        libeijian@...ilicon.com
Subject: Re: KVM guest sometimes failed to boot because of kernel stack
 overflow if KPTI is enabled on a hisilicon ARM64 platform.

Hi Wei,

On 27/06/18 14:26, Wei Xu wrote:
> Sorry, I should highlight that I have only updated the default value
> of CONFIG_NR_CPUS by menuconfig in the previous mail.
> That is why it showed dirty.

(menuconfig changes don't show up like this)


More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too few
VMIDs does work with KVM, it's just going to trigger rollover frequently.
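
To illustrate why running more VMs than there are VMIDs causes frequent
rollover, here is a minimal sketch of a generation-based allocator. It is a
simplified model, not the kernel's code: the names VmidAllocator and
count_rollovers are hypothetical, and the 8-bit VMID width and the reserved
VMID 0 are assumptions made for the example.

```python
# Simplified model of generation-based VMID allocation (NOT the kernel code).
VMID_BITS = 8  # assumption: 8-bit VMIDs


class VmidAllocator:
    def __init__(self, bits=VMID_BITS):
        self.max_vmid = (1 << bits) - 1
        self.generation = 1
        self.next_vmid = 1  # assumption: VMID 0 is kept for the host
        self.rollovers = 0

    def update(self, vm):
        """Ensure vm holds a VMID from the current generation.

        Returns True if a new VMID had to be allocated.
        """
        if vm.get("gen") == self.generation:
            return False  # VMID still valid, nothing to do
        if self.next_vmid > self.max_vmid:
            # VMID space exhausted: start a new generation, which
            # invalidates every VMID handed out so far.
            self.generation += 1
            self.next_vmid = 1
            self.rollovers += 1
        vm["gen"] = self.generation
        vm["vmid"] = self.next_vmid
        self.next_vmid += 1
        return True


def count_rollovers(num_vms, rounds, bits=VMID_BITS):
    """Run every VM once per round and count generation rollovers."""
    alloc = VmidAllocator(bits)
    vms = [{} for _ in range(num_vms)]
    for _ in range(rounds):
        for vm in vms:
            alloc.update(vm)
    return alloc.rollovers


if __name__ == "__main__":
    for n in (255, 256, 300):
        print(n, "VMs:", count_rollovers(n, rounds=10), "rollovers")
```

In this model, a VM count that fits within the VMID space never rolls over
once every VM has its VMID, while a single VM beyond that limit forces a new
generation roughly once per scheduling round.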

Just to check, what kernel version is the host running? Does it have commit
f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
(looks like that went in as a fix for v4.17-rc3)

Are you running (lots of) other VMs whenever this happens? Do they have multiple
vcpus? (I'm thinking of the scenario in that patch's description)

Is the host system otherwise idle when this happens?
(If not, can you reproduce the issue without exhausting the VMIDs?)


It may be that writing back the page-table entries with the MMU off, and
changing the cache maintenance are just changing the timing of something else.


Thanks,

James
