Message-ID: <CADrL8HXSde+EeLD2UsDNkxAxdmKomEA1XqdDuF4iFaksiZUHLw@mail.gmail.com>
Date: Mon, 19 May 2025 11:29:47 -0400
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Vipin Sharma <vipinsh@...gle.com>
Subject: Re: [PATCH v3 3/3] KVM: x86/mmu: Defer allocation of shadow MMU's
hashed page list
On Mon, May 19, 2025 at 9:37 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Sat, May 17, 2025, Paolo Bonzini wrote:
> > On 5/16/25 23:54, Sean Christopherson wrote:
> > > + /*
> > > + * Write mmu_page_hash exactly once as there may be concurrent readers,
> > > + * e.g. to check for shadowed PTEs in mmu_try_to_unsync_pages(). Note,
> > > + * mmu_lock must be held for write to add (or remove) shadow pages, and
> > > + * so readers are guaranteed to see an empty list for their current
> > > + * mmu_lock critical section.
> > > + */
> > > + WRITE_ONCE(kvm->arch.mmu_page_hash, h);
> >
> > Use smp_store_release here (unlike READ_ONCE(), it's technically incorrect
> > to use WRITE_ONCE() here!),
>
> Can you elaborate why? Due to my x86-centric life, my memory ordering knowledge
> is woefully inadequate.

The compiler must be prohibited from reordering the stores that precede
this WRITE_ONCE() so that they land after it.

In reality, the only stores that matter are the ones from within
kvcalloc(), and the important bits of it will not be inlined, so it's
unlikely that the compiler would actually do such reordering. But it's
nonetheless allowed. :) barrier() is precisely what is needed to
prohibit this; smp_store_release() on x86 is merely barrier() +
WRITE_ONCE().
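
Concretely, something like the following is what I have in mind for the
allocation path (just a sketch -- the exact kvcalloc() call and the
error handling are my assumptions, not copied from the patch):

        struct hlist_head *h;

        /* Assumed allocation; the patch's actual call may differ. */
        h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
        if (!h)
                return -ENOMEM;

        /*
         * Release semantics order kvcalloc()'s zero-initialization of the
         * buckets before the pointer becomes visible to lock-free readers.
         * On x86 this boils down to barrier() + WRITE_ONCE().
         */
        smp_store_release(&kvm->arch.mmu_page_hash, h);
        return 0;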

Semantically, smp_store_release() is what you mean to write, as Paolo
said. We're not *only* preventing torn accesses; we also need to ensure
that any thread that reads kvm->arch.mmu_page_hash can actually use the
result (i.e., that all of the stores from kvcalloc() are visible to
it). This sounds a little bit weird for x86 code, but compiler
reordering is still a possibility.

And I also agree with what Paolo said about smp_load_acquire(). :)
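
For the reader side, I'm picturing something along these lines (again
just a sketch; the helper's exact shape and the use of
kvm_page_table_hashfn() are my assumptions):

        static struct hlist_head *kvm_get_mmu_page_hash(struct kvm *kvm, gfn_t gfn)
        {
                /*
                 * Pairs with smp_store_release() in the allocation path: if a
                 * non-NULL pointer is observed, the zeroed buckets written by
                 * kvcalloc() are observed too.
                 */
                struct hlist_head *h = smp_load_acquire(&kvm->arch.mmu_page_hash);

                if (!h)
                        return NULL;

                return &h[kvm_page_table_hashfn(gfn)];
        }
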
Thanks Paolo. Please correct me if I'm wrong above.
> > with a remark that it pairs with kvm_get_mmu_page_hash(). That's both more
> > accurate and leads to a better comment than "write exactly once".
>