[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4wwyz-21UZL6kONWTTfPUWR+yXK2vL3BwVMydD15cSP+Q@mail.gmail.com>
Date: Tue, 3 Jun 2025 19:51:45 +1200
From: Barry Song <21cnbao@...il.com>
To: Jann Horn <jannh@...gle.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Barry Song <v-songbaohua@...o.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
David Hildenbrand <david@...hat.com>, Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>, Lokesh Gidra <lokeshgidra@...gle.com>,
Tangquan Zheng <zhengtangquan@...o.com>
Subject: Re: [PATCH RFC v2] mm: use per_vma lock for MADV_DONTNEED
On Tue, Jun 3, 2025 at 2:56 AM Jann Horn <jannh@...gle.com> wrote:
>
> On Sat, May 31, 2025 at 12:00 AM Barry Song <21cnbao@...il.com> wrote:
> > On Fri, May 30, 2025 at 10:07 PM Jann Horn <jannh@...gle.com> wrote:
> > > On Fri, May 30, 2025 at 12:44 PM Barry Song <21cnbao@...il.com> wrote:
> > > > @@ -1714,19 +1770,24 @@ static int madvise_do_behavior(struct mm_struct *mm,
> > > > unsigned long start, size_t len_in,
> > > > struct madvise_behavior *madv_behavior)
> > > > {
> > > > + struct vm_area_struct *vma = madv_behavior->vma;
> > > > int behavior = madv_behavior->behavior;
> > > > +
> > > > struct blk_plug plug;
> > > > unsigned long end;
> > > > int error;
> > > >
> > > > if (is_memory_failure(behavior))
> > > > return madvise_inject_error(behavior, start, start + len_in);
> > > > - start = untagged_addr_remote(mm, start);
> > > > + start = untagged_addr(start);
> > >
> > > Why is this okay? I see that X86's untagged_addr_remote() asserts that
> > > the mmap lock is held, which is no longer the case here with your
> > > patch, but untagged_addr() seems wrong here, since we can be operating
> > > on another process. I think especially on X86 with 5-level paging and
> > > LAM, there can probably be cases where address bits are used for part
> > > of the virtual address in one task while they need to be masked off in
> > > another task?
> > >
> > > I wonder if you'll have to refactor X86 and Risc-V first to make this
> > > work... ideally by making sure that their address tagging state
> > > updates are atomic and untagged_area_remote() works locklessly.
> >
> > If possible, can we try to avoid this at least for this stage? We all
> > agree that
> > a per-VMA lock for DONTNEED is long overdue. The main goal of the patch
> > is to drop the mmap_lock for high-frequency madvise operations like
> > MADV_DONTNEED and potentially MADV_FREE. For these two cases, it's highly
> > unlikely that one process would be managing the memory of another. In v2,
> > we're modifying common code, which is why we ended up here.
> >
> > We could consider doing:
> >
> > if (current->mm == mm)
> > untagged_addr(start);
> > else
> > untagged_addr_remote(mm, start);
>
> Ah, in other words, basically make it so that for now we don't use VMA
> locking on remote processes, and then we can have two different paths
> here for "local operation" and "MM-locked operation"? That's not very
> pretty but from a correctness perspective I'm fine with that.
Right, that’s exactly what I mean—we may hold off on remote `madvise` for
now unless we can find a straightforward way to fix up the architecture
code, especially since the tagging implementations in x86 and RISC-V are
quite confusing.
It’s particularly tricky for RISC-V, which supports two different PMLEN
values simultaneously. Resolving the untagging issue will likely require
extensive discussions with both the x86 and RISC-V architecture teams.
long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
{
...
/*
* Prefer the smallest PMLEN that satisfies the user's request,
* in case choosing a larger PMLEN has a performance impact.
*/
pmlen = FIELD_GET(PR_PMLEN_MASK, arg);
if (pmlen == PMLEN_0) {
pmm = ENVCFG_PMM_PMLEN_0;
} else if (pmlen <= PMLEN_7 && have_user_pmlen_7) {
pmlen = PMLEN_7;
pmm = ENVCFG_PMM_PMLEN_7;
} else if (pmlen <= PMLEN_16 && have_user_pmlen_16) {
pmlen = PMLEN_16;
pmm = ENVCFG_PMM_PMLEN_16;
} else {
return -EINVAL;
}
...
}
It’s strange that x86’s LAM U48 was rejected, while RISC-V’s PMLEN values
of 7 and 16 were both accepted.
Another reason is that we’re not too concerned about remote `madvise`
at this stage, as our current focus is on high-frequency cases like
`MADV_DONTNEED`, and possibly `MADV_FREE`.
Thanks
Barry
Powered by blists - more mailing lists