[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAhSdy26LC0xWisbxx+10mTe=D6cXcePtH8t6=TkzMYso7+jUQ@mail.gmail.com>
Date: Thu, 19 Jun 2025 12:34:54 +0530
From: Anup Patel <anup@...infault.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Anup Patel <apatel@...tanamicro.com>, Andrew Jones <ajones@...tanamicro.com>,
zhouquan@...as.ac.cn, atishp@...shpatra.org, paul.walmsley@...ive.com,
palmer@...belt.com, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, kvm@...r.kernel.org,
kvm-riscv@...ts.infradead.org
Subject: Re: [PATCH] RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
On Tue, Jun 17, 2025 at 8:06 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Sun, Jun 15, 2025, Anup Patel wrote:
> > On Sat, Jun 14, 2025 at 3:59 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > >
> > > On Thu, Jun 12, 2025, Andrew Jones wrote:
> > > > On Wed, Jun 11, 2025 at 09:17:36AM -0700, Sean Christopherson wrote:
> > > > > Looks like y'all also have a bug where an -EEXIST will be returned to userspace,
> > > > > and will generate what's probably a spurious kvm_err() message.
> > > >
> > > > On 32-bit riscv, due to losing the upper bits of the physical address? Or
> > > > is there yet another thing to fix?
> > >
> > > Another bug, I think. gstage_set_pte() returns -EEXIST if a PTE exists, and I
> > > _assume_ that's supposed to be benign? But this code returns it blindly:
> >
> > gstage_set_pte() returns -EEXIST only when it was expecting a non-leaf
> > PTE at a particular level but got a leaf PTE
>
> Right, but isn't returning -EEXIST all the way to userspace undesirable behavior?
>
> E.g. in this sequence, KVM will return -EEXIST and incorrectly terminate the VM
> (assuming the VMM doesn't miraculously recover somehow):
>
> 1. Back the VM with HugeTLBFS
> 2. Fault-in memory, i.e. create hugepage mappings
> 3. Enable KVM_MEM_LOG_DIRTY_PAGES
> 4. Write-protection fault, kvm_riscv_gstage_map() tries to create a writable
> non-huge mapping.
> 5. gstage_set_pte() encounters the huge leaf PTE before reaching the target
> level, and returns -EEXIST.
The gstage_set_pte() does not fail in any of the above cases because the
desired page table "level" of the PTE is passed to gstage_set_pte() as
parameter. The -EEXIST failure is only when gstage_set_pte() sees an
existing leaf PTE at a level above the desired page table level which can
only occur if there is some BUG in KVM g-stage programming.
>
> AFAICT, gstage_wp_memory_region() doesn't split/shatter/demote hugepages, it
> simply clears _PAGE_WRITE.
>
> It's entirely possible I'm missing something that makes the above scenario
> impossible in practice, but at this point I'm genuinely curious :-)
The -EEXIST failure in gstage_set_pte() is very unlikely to happen but I see
your point about unnecessarily exiting to user space since user space has
nothing to do with this failure.
I think it's better to WARN() and return 0 instead of returning -EEXIST.
Regards,
Anup
Powered by blists - more mailing lists