linux-kernel - Re: [PATCH 5/9] KVM: x86/mmu: Convert "runtime" WARN_ON() assertions to WARN_ON_ONCE()


Message-ID: <ZF7IRQZo8g7Lg46V@google.com>
Date:   Fri, 12 May 2023 16:14:13 -0700
From:   David Matlack <dmatlack@...gle.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Mingwei Zhang <mizhang@...gle.com>,
        Jim Mattson <jmattson@...gle.com>
Subject: Re: [PATCH 5/9] KVM: x86/mmu: Convert "runtime" WARN_ON() assertions
 to WARN_ON_ONCE()

On Thu, May 11, 2023 at 04:59:13PM -0700, Sean Christopherson wrote:
> Convert all "runtime" assertions, i.e. assertions that can be triggered
> while running vCPUs, from WARN_ON() to WARN_ON_ONCE().  Every WARN in the
> MMU that is tied to running vCPUs, i.e. not contained to loading and
> initializing KVM, is likely to fire _a lot_ when it does trigger.  E.g. if
> KVM ends up with a bug that causes a root to be invalidated before the
> page fault handler is invoked, pretty much _every_ page fault VM-Exit
> triggers the WARN.
> 
> If a WARN is triggered frequently, the resulting spam usually causes a lot
> of damage of its own, e.g. consumes resources to log the WARN and pollutes
> the kernel log, often to the point where other useful information can be
> lost.  In many cases, the damage caused by the spam is actually worse than
> the bug itself, e.g. KVM can almost always recover from an unexpectedly
> invalid root.
> 
> On the flip side, warning every time is rarely helpful for debug and
> triage, i.e. a single splat is usually sufficient to point a debugger in
> the right direction, and automated testing, e.g. syzkaller, typically runs
> with panic_on_warn=1, i.e. will never get past the first WARN anyways.

On the topic of syzkaller, we should get them to test with
CONFIG_KVM_PROVE_MMU once it's available.

> 
> Lastly, when an assertion fails multiple times, the stack traces in KVM
> are almost always identical, i.e. the full splat only needs to be captured
> once.  And _if_ there is value in capturing information about the failed
> assert, a ratelimited printk() is sufficient and less likely to rack up a
> large amount of collateral damage.

These are all good arguments and I think they apply to KVM_MMU_WARN_ON()
as well. Should we convert that to _ONCE() too?
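
For illustration only, here is a minimal userspace sketch of the difference in
log volume between warning on every hit and warning once per call site. It is
not the kernel implementation: SKETCH_WARN_ON(), SKETCH_WARN_ON_ONCE(),
sketch_warn(), count_splats() and root_is_valid are hypothetical names made up
for this example, and the _ONCE() variant assumes the GNU C statement
expression extension (gcc or clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually printed, so log volume can be compared. */
static int nr_splats;

/* Print a warning unless this call site's "already warned" flag is set. */
static bool sketch_warn(bool cond, bool *warned, const char *expr)
{
	if (cond && !(warned && *warned)) {
		if (warned)
			*warned = true;
		nr_splats++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}

/* Hypothetical stand-ins for WARN_ON() and WARN_ON_ONCE(); not kernel code. */
#define SKETCH_WARN_ON(cond)	sketch_warn(!!(cond), NULL, #cond)
#define SKETCH_WARN_ON_ONCE(cond)				\
({								\
	static bool warned_;	/* one flag per call site */	\
	sketch_warn(!!(cond), &warned_, #cond);			\
})

/* Model nr_faults page faults that all hit the same broken invariant. */
static int count_splats(int nr_faults, bool once)
{
	bool root_is_valid = false;	/* the simulated bug */
	int i;

	assert(nr_faults >= 0);
	nr_splats = 0;
	for (i = 0; i < nr_faults; i++) {
		if (once)
			SKETCH_WARN_ON_ONCE(!root_is_valid);
		else
			SKETCH_WARN_ON(!root_is_valid);
	}
	return nr_splats;
}
```

In this sketch, count_splats(100, false) logs 100 messages while
count_splats(100, true) logs one. Because the flag is static, a second call
with once set logs nothing at all, which is the trade-off of the _ONCE()
approach: later occurrences are silent.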