linux-kernel - Re: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO in mmu_stress

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Z6xGwnFR9cFg/TOL@yzhao56-desk.sh.intel.com>
Date: Wed, 12 Feb 2025 14:59:14 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: selftests: Wait mprotect_ro_done before write to RO
 in mmu_stress_test

On Wed, Feb 12, 2025 at 02:29:26AM +0800, Edgecombe, Rick P wrote:
> On Tue, 2025-02-11 at 09:42 +0800, Yan Zhao wrote:
> > > Is this a fix for the intermittent failure we saw on the v6.13-rc3 based kvm
> > > branch? Funnily, I can't seem to reproduce it anymore, with or without this
> > > fix.
> > Hmm, it can be reproduced in my SPR (non TDX) almost every time.
> > It depends on the timing when mprotect(PROT_READ) is completed done.
> > 
> > Attached the detailed error log in my machine at the bottom.
> 
> I must be getting lucky on timing. BTW, in the above I actually meant on either
It's also not easy for me to reproduce it in my TDX platform.
While the reproducing rate is around 100% in my non-TDX SPR machine, if I add a
minor delay in the guest (e.g. printing the value of mprotect_ro_done in the
beginning of stage 3), the test can also get passed.

> the new or old *kernel*.
Hmm, the test fails in my platform as soon as b6c304aec648 ("KVM: selftests:
Verify KVM correctly handles mprotect(PROT_READ)")" is introduced, which is from
6.13.0-rc2.
 
> >  
> > > On the fix though, doesn't this remove the coverage of writing to a region
> > > that
> > > is in the process of being made RO? I'm thinking about warnings, etc that
> > > may
> > > trigger intermittently based on bugs with a race component. I don't know if
> > > we
> > > could fix the test and still leave the write while the "mprotect(PROT_READ)
> > > is
> > > underway". It seems to be deliberate.
> > Write before "mprotect(PROT_READ)" has been tested in stage 0.
> > Not sure it's deliberate to test write in the process of being made RO.
> > If it is, maybe we could make the fix by writing to RO memory a second time
> > after mprotect_ro_done is true:
> 
> That could work if it's desirable to maintain the testing. I would mention the
> reduced scope in the log at least. Maybe Sean will chime in.
Not really a reduced scope.

Before this patch, it's also not guaranteed for the memory writes to occur
during the mprotect(PROT_READ) transition. i.e. there are 3 possibilities:
1. all writes occur before mprotect(PROT_READ) takes effect.
2. all writes occur after mprotect(PROT_READ) takes effect.
3. some writes occur before mprotect(PROT_READ) takes effect and some occur
   after.

case 3 is not guaranteed without introducing another synchronization flag.

That said, I'm not sure if invoking writes before mprotect_ro_done being read
as true is indeed necessary.

But I'm fine with either way:
1. make sure all writes are after mprotect_ro_done is true (as in this patch).
2. call the do-while loop twice to ensure some writes must happen after
   mprotect_ro_done is true

> Also, I think it needs:
> 
> Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles
> mprotect(PROT_READ)")
Will add it in v2!