Message-ID: <8735djvwbu.wl-maz@kernel.org>
Date:   Fri, 26 Aug 2022 16:49:41 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Peter Xu <peterx@...hat.com>, Gavin Shan <gshan@...hat.com>,
        kvmarm@...ts.cs.columbia.edu, linux-arm-kernel@...ts.infradead.org,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org,
        corbet@....net, james.morse@....com, alexandru.elisei@....com,
        suzuki.poulose@....com, oliver.upton@...ux.dev,
        catalin.marinas@....com, will@...nel.org, shuah@...nel.org,
        seanjc@...gle.com, dmatlack@...gle.com, bgardon@...gle.com,
        ricarkol@...gle.com, zhenyzha@...hat.com, shan.gavin@...il.com
Subject: Re: [PATCH v1 1/5] KVM: arm64: Enable ring-based dirty memory tracking

On Fri, 26 Aug 2022 11:50:24 +0100,
Paolo Bonzini <pbonzini@...hat.com> wrote:
> 
> On 8/24/22 00:47, Marc Zyngier wrote:
> >> I definitely don't think I 100% understand all the ordering things since
> >> they're complicated.. but my understanding is that the reset procedure
> >> didn't need memory barrier (unlike pushing, where we have explicit wmb),
> >> because we assumed the userapp is not hostile so logically it should only
> >> modify the flags which is a 32bit field, assuming atomicity guaranteed.
> > Atomicity doesn't guarantee ordering, unfortunately. Take the
> > following example: CPU0 is changing a bunch of flags for GFNs A, B, C,
> > D that exist in the ring in that order, and CPU1 performs an ioctl to
> > reset the page state.
> > 
> > CPU0:
> >      write_flag(A, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(B, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(C, KVM_DIRTY_GFN_F_RESET)
> >      write_flag(D, KVM_DIRTY_GFN_F_RESET)
> >      [...]
> > 
> > CPU1:
> >     ioctl(KVM_RESET_DIRTY_RINGS)
> > 
> > Since CPU0's writes do not have any ordering, CPU1 can observe the
> > writes in a sequence that has nothing to do with program order, and
> > could for example observe that GFN A and D have been reset, but not B
> > and C. This in turn breaks the logic in the reset code (B, C, and D
> > don't get reset), despite userspace having followed the spec to the
> > letter. If each was a store-release (which is the case on x86), it
> > wouldn't be a problem, but nothing calls for it in the documentation.
> > 
> > Maybe that's not a big deal if it is expected that each CPU will issue
> > a KVM_RESET_DIRTY_RINGS itself, ensuring that it observes its own
> > writes. But expecting this to work across CPUs without any barrier is
> > wishful thinking.
> 
> Agreed, but that's a problem for userspace to solve.  If userspace
> wants to reset the fields in different CPUs, it has to synchronize
> with its own invoking of the ioctl.

Userspace has no choice. It cannot, on its own, order the reads that
the kernel will do to *other* rings.

> That is, CPU0 must ensure that an ioctl(KVM_RESET_DIRTY_RINGS) is done
> after (in the memory-ordering sense) its last write_flag(D,
> KVM_DIRTY_GFN_F_RESET).  If there's no such ordering, there's no
> guarantee that the write_flag will have any effect.

The problem isn't on CPU0. The problem is that CPU1 does observe
inconsistent data on arm64, and I don't think this difference in
behaviour is acceptable. Nothing documents this, and there is a
baked-in assumption that there is strong ordering between writes as
well as between writes and reads.
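
To make the inconsistency concrete, here is a minimal model of an
in-order reset walk applied to the snapshot from the earlier example
(A and D observed as reset, B and C not). It is only a sketch: the
flag values and reset_walk() are hypothetical stand-ins, not the
kernel's definitions, and it assumes the walk stops at the first
entry that is not marked for reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the dirty-ring flag bits. */
#define GFN_F_DIRTY (1u << 0)
#define GFN_F_RESET (1u << 1)

/*
 * Walk a snapshot of the flags in ring order and return how many
 * entries would be reset. Assumption: the walk stops at the first
 * entry that is not marked for reset.
 */
static size_t reset_walk(const uint32_t *observed, size_t n)
{
	size_t done = 0;

	assert(observed != NULL);
	while (done < n && (observed[done] & GFN_F_RESET))
		done++;
	return done;
}
```

With the snapshot { RESET, DIRTY, DIRTY, RESET }, the walk handles A
only and stops at B, so D is left behind even though it was seen as
marked.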

> The main reason why I preferred a global KVM_RESET_DIRTY_RINGS ioctl
> was because it takes kvm->slots_lock so the execution would be
> serialized anyway.  Turning slots_lock into an rwsem would be even
> worse because it also takes kvm->mmu_lock (since slots_lock is a
> mutex, at least two concurrent invocations won't clash with each other
> on the mmu_lock).

Whatever the reason, the behaviour should be identical on all
architectures. As it is, it only really works on x86, and I contend
this is a bug that needs fixing.

Thankfully, this can be done at zero cost for x86, and at that of a
set of load-acquires on other architectures.
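
As a rough illustration of that pairing, and not the actual patch,
the sketch below uses C11 atomics in place of the kernel's
smp_store_release()/smp_load_acquire(). The struct layout, flag
values, and function names are assumptions made for the example. On
x86 an acquire load compiles to a plain load; on arm64 it becomes a
load-acquire instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the dirty-ring entry and its flag bits. */
#define GFN_F_DIRTY (1u << 0)
#define GFN_F_RESET (1u << 1)

struct gfn_entry {
	_Atomic uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

/* Writer side: publish the new flag value with release semantics. */
static void write_flag(struct gfn_entry *e, uint32_t flag)
{
	atomic_store_explicit(&e->flags, flag, memory_order_release);
}

/*
 * Reader side: walk the ring in order, reading each flag with acquire
 * semantics, and stop at the first entry not marked for reset.
 * Returns the number of entries consumed.
 */
static size_t harvest_reset(struct gfn_entry *ring, size_t n)
{
	size_t count = 0;

	assert(ring != NULL);
	for (size_t i = 0; i < n; i++) {
		uint32_t flags = atomic_load_explicit(&ring[i].flags,
						      memory_order_acquire);
		if (!(flags & GFN_F_RESET))
			break;
		/* Entry consumed: clear it so the slot can be reused. */
		atomic_store_explicit(&ring[i].flags, 0,
				      memory_order_relaxed);
		count++;
	}
	return count;
}
```

The acquire load pairs with the release store: once the reader sees
an entry marked for reset, everything the writer did before that
store, including marking earlier entries, is visible as well.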

	M.

-- 
Without deviation from the norm, progress is not possible.
