linux-kernel - Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADrL8HV1Erpg-D4LzuRHUk7dg6mvex8oQz5pBzwO7A3OjB8Uvw@mail.gmail.com>
Date: Tue, 10 Sep 2024 14:11:15 -0700
From: James Houghton <jthoughton@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Oliver Upton <oliver.upton@...ux.dev>, Marc Zyngier <maz@...nel.org>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH 19/22] KVM: x86/mmu: Add infrastructure to allow walking
 rmaps outside of mmu_lock

On Mon, Sep 9, 2024 at 6:42 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Mon, Sep 09, 2024, James Houghton wrote:
> > I take back what I said about this working on x86. I think it's
> > possible for there to be a race.
> >
> > Say...
> >
> > 1. T1 modifies pte_list_desc then unlocks kvm_rmap_unlock().
> > 2. T2 then locks kvm_rmap_lock_readonly().
> >
> > The modifications that T1 has made are not guaranteed to be visible to
> > T2 unless T1 has an smp_wmb() (or equivalent) after the modfication
> > and T2 has an smp_rmb() before reading the data.
> >
> > Now the way you had it, T2, because it uses try_cmpxchg() with full
> > ordering, will effectively do a smp_rmb(). But T1 only does an
> > smp_wmb() *after dropping the mmu_lock*, so there is a race. While T1
> > still holds the mmu_lock but after releasing the kvm_rmap_lock(), T2
> > may enter its critical section and then *later* observe the changes
> > that T1 made.
> >
> > Now this is impossible on x86 (IIUC) if, in the compiled list of
> > instructions, T1's writes occur in the same order that we have written
> > them in C. I'm not sure if WRITE_ONCE guarantees that this reordering
> > at compile time is forbidden.
> >
> > So what I'm saying is:
> >
> > 1. kvm_rmap_unlock() must have an smp_wmb().
>
> No, because beating a dead horse, this is not generic code, this is x86.

What prevents the compiler from reordering (non-atomic, non-volatile)
stores that happen before WRITE_ONCE() in kvm_rmap_unlock() to after
the WRITE_ONCE()?

IMV, such a reordering is currently permitted[1] (i.e., a barrier() is
missing), and should the compiler choose to do this, the lock will not
function correctly.

> If kvm_rmap_head.val were an int, i.e. could be unionized with an atomic_t, then
> I wouldn't be opposed to doing this in the locking code to document things:
>
>  s/READ_ONCE/atomic_read_acquire
>  s/WRITE_ONCE/atomic_set_release
>  s/try_cmpxchg/atomic_cmpxchg_acquire

I think we can use atomic_long_t.

It would be really great if we did a substitution like this. That
would address my above concern about barrier() (atomic_set_release,
for example, implies a barrier() that we otherwise need to include).

[1]: https://www.kernel.org/doc/Documentation/memory-barriers.txt
(GUARANTEES + COMPILER BARRIER)