Message-ID: <Z50BbuUQWIaDPRzK@phenom.ffwll.local>
Date: Fri, 31 Jan 2025 17:59:26 +0100
From: Simona Vetter <simona.vetter@...ll.ch>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
Yonatan Maman <ymaman@...dia.com>, kherbst@...hat.com,
lyude@...hat.com, dakr@...hat.com, airlied@...il.com,
simona@...ll.ch, leon@...nel.org, jglisse@...hat.com,
akpm@...ux-foundation.org, GalShalom@...dia.com,
dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
linux-mm@...ck.org, linux-tegra@...r.kernel.org
Subject: Re: [RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private
pages
On Thu, Jan 30, 2025 at 01:42:17PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 30, 2025 at 05:09:44PM +0100, Simona Vetter wrote:
> > > > An optional callback here (or redoing hmm_range_fault or whacking the
> > > > migration helpers a few times) looks a lot less scary to me than making
> > > > pgmap->owner mutable in some fashion.
> > >
> > > It's extra work for every single 4k page on every user :\
> > >
> > > And what are you going to do better inside this callback?
> >
> > Having more comfy illusions :-P
>
> Exactly!
>
> > Slightly more seriously, I can grab some locks and make life easier,
>
> Yes, but then see my concern about performance again. Now you are
> locking/unlocking every 4k? And then it still races since it can
> change after hmm_range_fault returns. That's not small, and not really
> better.
Hm yeah, I think that's the killer argument against the callback. Consider
me convinced that it's a bad idea.
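Just to spell out what we're dropping here, the callback idea would have
looked something like the below. This is an entirely made-up sketch (hmm has
no such hook, and all the my_drv_* names are invented), purely to illustrate
the per-4k cost and the staleness problem:

#include <linux/hmm.h>
#include <linux/mutex.h>

struct my_drv {
        struct mutex p2p_lock;
        /* ... */
};

static bool my_drv_peer_reachable(struct my_drv *drv, struct page *page);

/*
 * Hypothetical hook that hmm_range_fault() would call for every
 * device-private page it encounters.  The objection: the lock/unlock
 * happens per 4k page, and the answer can already be stale the moment
 * hmm_range_fault() returns, before the caller has re-checked the
 * mmu_interval_notifier sequence anyway.
 */
static bool my_drv_page_allowed(void *cb_data, struct page *page)
{
        struct my_drv *drv = cb_data;
        bool ok;

        mutex_lock(&drv->p2p_lock);
        ok = my_drv_peer_reachable(drv, page);
        mutex_unlock(&drv->p2p_lock);

        return ok;
}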
> > whereas sprinkling locking or even barriers over pgmap->owner in core mm
> > is not going to fly. Plus more flexibility, e.g. when the interconnect
> > doesn't work for atomics or some other funny reason it only works for some
> > of the traffic, where you need to more dynamically decide where memory is
> > ok to sit.
>
> Sure, an asymmetric mess could be problematic, and we might need more
> later, but let's get to that first...
>
> > Or checking p2pdma connectivity and all that stuff.
>
> Should be done in the core code, don't want drivers open coding this
> stuff.
Yeah, so after amdkfd and nouveau I agree that letting drivers mess this up
isn't great. But see below, I'm not sure whether going all the way to core
code is the right approach, at least for gpu-internal needs.
> > Also note that fundamentally you can't protect against the hotunplug or
> > driver unload case for hardware access. So some access will go to nowhere
> > when that happens, until we've torn down all the mappings and migrated
> > memory out.
>
> I think a literal (safe!) hot unplug must always use the page map
> revoke, and that should be safely locked between hmm_range_fault and
> the notifier.
>
> If the underlying fabric is losing operations during an unplanned hot
> unplug I expect things will need resets to recover..
So one aspect I don't like much about the pgmap->owner approach is that
it's a big thing to get right, and it feels to me like we don't yet know
the right questions to ask.
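For reference, the static owner scheme we're talking about looks roughly
like this on the driver side. This is only a sketch following the pattern
documented in Documentation/mm/hmm.rst; my_drv and its fields are
placeholders and error handling is trimmed:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

struct my_drv {
        struct mm_struct *mm;           /* mm being mirrored */
        void *owner_cookie;             /* same pointer set as pgmap->owner */
        struct mutex update_lock;       /* serializes against invalidation */
};

static int drv_populate_range(struct my_drv *drv,
                              struct mmu_interval_notifier *mni,
                              unsigned long start, unsigned long end,
                              unsigned long *pfns)
{
        struct hmm_range range = {
                .notifier          = mni,
                .start             = start,
                .end               = end,
                .hmm_pfns          = pfns,
                .default_flags     = HMM_PFN_REQ_FAULT,
                /*
                 * Device-private pages whose pgmap->owner matches this
                 * cookie come back as-is; everything else is
                 * faulted/migrated to system memory first.
                 */
                .dev_private_owner = drv->owner_cookie,
        };
        int ret;

again:
        range.notifier_seq = mmu_interval_read_begin(mni);
        mmap_read_lock(drv->mm);
        ret = hmm_range_fault(&range);
        mmap_read_unlock(drv->mm);
        if (ret == -EBUSY)
                goto again;
        if (ret)
                return ret;

        mutex_lock(&drv->update_lock);
        if (mmu_interval_read_retry(mni, range.notifier_seq)) {
                mutex_unlock(&drv->update_lock);
                goto again;
        }
        /* program the device pagetables from pfns[] here */
        mutex_unlock(&drv->update_lock);

        return 0;
}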
A bit related: we'll have to do some driver-specific migration after
hmm_range_fault anyway for allocation policies. With a coherent
interconnect that'd be up to numactl, but for driver-private memory it's
all up to the driver. And once we have that, we can also migrate memory
that's misplaced for functional and not just performance reasons.
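Concretely I'm thinking of a post-processing pass over the pfns that
hmm_range_fault handed back, with the placement policy entirely in the
driver. Rough sketch only: the my_drv_* policy helpers are made up, and in
practice any migration here invalidates the result and forces another fault
round:

#include <linux/hmm.h>
#include <linux/memremap.h>

struct my_drv;

/* made-up policy helpers */
static bool my_drv_want_in_vram(struct my_drv *drv, unsigned long addr);
static void my_drv_migrate_to_vram(struct my_drv *drv, unsigned long addr,
                                   unsigned long size);

static void drv_fixup_placement(struct my_drv *drv, struct hmm_range *range)
{
        unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
        unsigned long i;

        for (i = 0; i < npages; i++) {
                unsigned long addr = range->start + (i << PAGE_SHIFT);
                struct page *page;

                if (!(range->hmm_pfns[i] & HMM_PFN_VALID))
                        continue;
                page = hmm_pfn_to_page(range->hmm_pfns[i]);

                /*
                 * Functional misplacement (the peer can't reach this
                 * memory) and performance policy both end up as "move
                 * it"; my_drv_migrate_to_vram() would use the usual
                 * migrate_vma_setup()/pages()/finalize() machinery.
                 */
                if (my_drv_want_in_vram(drv, addr) &&
                    !is_device_private_page(page))
                        my_drv_migrate_to_vram(drv, addr, PAGE_SIZE);
        }
}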
The plan I discussed with Thomas a while back, at least for gpus, was to
have that as a drm_devpagemap library, which would have a common owner (or
maybe one per driver, as Thomas suggested). Then it would still not be in
drivers, but it would be a bit easier to mess around with for experiments.
And once we have some clear things that hmm_range_fault should instead do
for everyone, we can lift them up into core code.
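To make that a bit more concrete, roughly the shape I have in mind. All of
this is made up, none of these names exist today:

#include <linux/memremap.h>
#include <drm/drm_device.h>

/*
 * Made-up sketch: a shared drm devpagemap wrapper with one common owner
 * cookie for every participating gpu driver, so hmm_range_fault() hands
 * back device-private pages from any of them, and the library (not core
 * mm, not each driver) decides afterwards whether they're actually
 * reachable.
 */
struct drm_devpagemap {
        struct dev_pagemap pagemap;
        struct drm_device *drm;
        /* per-pagemap properties, assumed uniform for all its pages */
        bool bus_visible;
};

/* single shared owner cookie for all drm_devpagemap users */
static int drm_devpagemap_owner;

static void drm_devpagemap_init(struct drm_devpagemap *dpm,
                                struct drm_device *drm, bool bus_visible)
{
        dpm->drm = drm;
        dpm->bus_visible = bus_visible;
        dpm->pagemap.type = MEMORY_DEVICE_PRIVATE;
        dpm->pagemap.owner = &drm_devpagemap_owner;
}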
Doing this at a pagemap level should also be much more efficient, since I
think we can assume that access limitations are uniform for a given
dev_pagemap (and if they're not, e.g. if not the entire vram is bus
visible, drivers can handle that by splitting things up).
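E.g. if only part of vram is bus visible, register it as two separate
dev_pagemaps so the per-pagemap assumption still holds. Sketch building on
the drm_devpagemap idea above; the ranges, ops and my_drv fields are
placeholders and error handling is trimmed:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/range.h>

/* placeholder; would at least provide .migrate_to_ram and .page_free */
static const struct dev_pagemap_ops drv_pagemap_ops;

static int drv_register_vram(struct my_drv *drv, struct device *dev,
                             struct range visible, struct range hidden)
{
        struct dev_pagemap *vis = &drv->visible_dpm.pagemap;
        struct dev_pagemap *hid = &drv->hidden_dpm.pagemap;

        vis->type = MEMORY_DEVICE_PRIVATE;
        vis->range = visible;
        vis->nr_range = 1;
        vis->ops = &drv_pagemap_ops;
        vis->owner = &drm_devpagemap_owner;     /* shared cookie from above */
        drv->visible_dpm.bus_visible = true;

        *hid = *vis;                            /* same attributes ... */
        hid->range = hidden;                    /* ... different range */
        drv->hidden_dpm.bus_visible = false;

        if (IS_ERR(devm_memremap_pages(dev, vis)))
                return -ENOMEM;
        if (IS_ERR(devm_memremap_pages(dev, hid)))
                return -ENOMEM;
        return 0;
}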
But speccing all this out upfront doesn't seem like a good idea to me,
because I honestly don't know what we all need.
Cheers, Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch