Message-ID: <20250814130403.GF699432@nvidia.com>
Date: Thu, 14 Aug 2025 10:04:03 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Mika Penttilä <mpenttil@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
David Hildenbrand <david@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Alistair Popple <apopple@...dia.com>,
Balbir Singh <balbirs@...dia.com>
Subject: Re: [RFC PATCH 1/4] mm: use current as mmu notifier's owner
On Thu, Aug 14, 2025 at 03:53:00PM +0300, Mika Penttilä wrote:
>
> On 8/14/25 15:40, Jason Gunthorpe wrote:
> > On Thu, Aug 14, 2025 at 10:19:26AM +0300, Mika Penttilä wrote:
> >> When doing migration in combination with device fault handling,
> >> detect the case in the interval notifier.
> >>
> >> Without that, we would livelock with our own invalidations
> >> while migrating and splitting pages during fault handling.
> >>
> >> Note: pgmap_owner, which some other code paths use as the owner for
> >> filtering, is not readily available in the split path, so use current
> >> for this case. Also, current and pgmap_owner, both being pointers to
> >> memory, cannot be mistaken for each other.
> >>
> >> Cc: David Hildenbrand <david@...hat.com>
> >> Cc: Jason Gunthorpe <jgg@...dia.com>
> >> Cc: Leon Romanovsky <leonro@...dia.com>
> >> Cc: Alistair Popple <apopple@...dia.com>
> >> Cc: Balbir Singh <balbirs@...dia.com>
> >>
> >> Signed-off-by: Mika Penttilä <mpenttil@...hat.com>
> >> ---
> >>  lib/test_hmm.c   | 5 +++++
> >>  mm/huge_memory.c | 6 +++---
> >>  mm/rmap.c        | 4 ++--
> >>  3 files changed, 10 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> >> index 761725bc713c..cd5c139213be 100644
> >> --- a/lib/test_hmm.c
> >> +++ b/lib/test_hmm.c
> >> @@ -269,6 +269,11 @@ static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
> >>  	    range->owner == dmirror->mdevice)
> >>  		return true;
> >>
> >> +	if (range->event == MMU_NOTIFY_CLEAR &&
> >> +	    range->owner == current) {
> >> +		return true;
> >> +	}
> > I don't understand this: there is nothing in hmm that says only
> > current can call hmm_range_fault(), and indeed most applications
> > won't even guarantee that.
>
> No, it's the opposite: if we are ourselves the invalidator, we don't care.
I think you've missed the point: you cannot use range->owner in any
way to tell "we are ourselves the invalidator". That is simply not
the meaning of range->owner.
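
For contrast, the established meaning of range->owner is a device or
pgmap tag, not a task identity. A minimal sketch of that convention,
paraphrasing the existing lib/test_hmm.c filter quoted above:

	/*
	 * Skip invalidations the device itself raised while migrating
	 * its own device-private pages; the migration code handles the
	 * device-side update. The key is the pgmap owner, which
	 * identifies the device no matter which task runs the
	 * migration.
	 */
	if (range->event == MMU_NOTIFY_MIGRATE &&
	    range->owner == dmirror->mdevice)
		return true;

Any thread may call hmm_range_fault() on behalf of the device, so
matching on current only ever identifies the one task that happened to
start this particular walk.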
> > So if this plan relies on something like the above in drivers I don't
> > see how it can work.
> >
> > If this is just some hack for tests, try instead to find a solution
> > that more accurately matches what a real driver should do.
> >
> > But this also seems troublesome overall for your goal: if you do a
> > migration inside hmm_range_fault() it will generate an invalidation
> > callback, which will increment the seqlock, and we will loop
> > hmm_range_fault() again, which rewalks.
>
> That's the problem this solves.
> The semantics are "if we are the invalidator, don't wait for the
> invalidation to end", i.e. skip mmu_interval_set_seq(), which would
> otherwise hang the next mmu_interval_read_begin() waiting for the
> invalidation to finish.
I doubt we can skip mmu_interval_set_seq(); doing so can corrupt a
concurrent hmm_range_fault() in some other thread.
Nor, as I said, can we actually skip the invalidation toward HW
anyhow.
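
To spell out the collision detection that skipping would defeat, the
driver-side pattern is roughly the one in Documentation/mm/hmm.rst,
where take_lock()/release_lock() stand in for the driver's own update
lock:

again:
	range.notifier_seq = mmu_interval_read_begin(&interval_sub);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;	/* an invalidation raced the walk */
		return ret;
	}

	take_lock(driver->update);
	if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
		/* The sequence moved since read_begin(); rewalk. */
		release_lock(driver->update);
		goto again;
	}
	/* Program the device page tables under the update lock. */
	release_lock(driver->update);

mmu_interval_set_seq() in the invalidate callback is exactly what makes
mmu_interval_read_retry() fail in every concurrent walker; if one
caller's invalidation skips it, another thread's walk can complete
against stale PTEs.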
At the very least, this commit message fails to explain the new locking
proposal or justify why it can work.
Jason