linux-kernel - Re: [RFC PATCH 1/6] mm/mmu_notifier: Allow multiple struct mmu_interval

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aKNT8GUu0r3i4Ikq@lstrano-desk.jf.intel.com>
Date: Mon, 18 Aug 2025 09:25:20 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: Thomas Hellström <thomas.hellstrom@...ux.intel.com>,
	<intel-xe@...ts.freedesktop.org>, Andrew Morton <akpm@...ux-foundation.org>,
	Simona Vetter <simona.vetter@...ll.ch>, Dave Airlie <airlied@...il.com>,
	<dri-devel@...ts.freedesktop.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, Christian König
	<christian.koenig@....com>
Subject: Re: [RFC PATCH 1/6] mm/mmu_notifier: Allow multiple struct
 mmu_interval_notifier passes

On Mon, Aug 18, 2025 at 01:07:26PM -0300, Jason Gunthorpe wrote:
> On Sat, Aug 09, 2025 at 03:51:32PM +0200, Thomas Hellström wrote:
> > GPU use-cases for mmu_interval_notifiers with hmm often involve
> > starting a gpu operation and then waiting for it to complete.
> > These operations are typically context preemption or TLB flushing.
> > 
> > With single-pass notifiers per GPU this doesn't scale in
> > multi-gpu scenarios. In those scenarios we'd want to first start
> > preemption- or TLB flushing on all GPUs and as a second pass wait
> > for them to complete on all gpus.
> 
> The idea seems reasonable but I'm not sure I like the naming of
> 'multipass' or necessarily the complexity.
> 
> This is sort of a co-operative multithreading thing.
> 
> Do you really need a linked list here? At least justify the design
> choices in the commit message..
> 

I think this choice makes sense: it allows embedding the wait state from
the initial notifier call into the pass structure. Patch [6] shows this
by attaching the issued TLB invalidation fences to the pass. Since a
single notifier may be invoked multiple times with different ranges but
the same seqno, I think this is the correct design choice—otherwise
there’s no unambiguous way to track per-invocation wait state. I agree
this should be documented in both the commit message and kernel-doc.

Matt

[6] https://patchwork.freedesktop.org/patch/667844/?series=152725&rev=1

> > +struct mmu_interval_notifier_pass {
> > +	struct list_head link;
> > +	/**
> > +	 * @pass: Driver callback for additionall pass.
> > +	 * @additional_pass: Pointer to the mmu_interval_notifier_pass structure.
> > +	 * @range: The mmu_notifier_range.
> > +	 * @cur_seq: The current sequence set by the first pass.
> > +	 *
> > +	 * Return: Either a pointer to a valid mmu_interval_notifier_pass for
> > +	 * another pass to be called, or %NULL if processing is complete for this
> > +	 * notifier. There is no error reporting mechanism for additional passes.
> > +	 */
> > +	struct mmu_interval_notifier_pass *
> > +	(*pass) (struct mmu_interval_notifier_pass *additional_pass,
> > +		 const struct mmu_notifier_range *range,
> > +		 unsigned long cur_seq);
> > +};
> > +
> >  /**
> >   * struct mmu_interval_notifier_ops
> >   * @invalidate: Upon return the caller must stop using any SPTEs within this
> > @@ -243,6 +269,10 @@ struct mmu_interval_notifier_ops {
> >  	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
> >  			   const struct mmu_notifier_range *range,
> >  			   unsigned long cur_seq);
> > +	bool (*invalidate_multipass)(struct mmu_interval_notifier *interval_sub,
> > +				     const struct mmu_notifier_range *range,
> > +				     unsigned long cur_seq,
> > +				     struct mmu_interval_notifier_pass **pass);
> 
> Couldn't this just have a pass number counter and some return code to
> indicate this notifier is done?
> 
> Or do you really need more than 2 passes? Start/finish make sense
> too. Otherwise you may have issues overlapping the backgroundable
> operations between different driver types?
> 
> > +static void mn_itree_additional_passes(struct list_head *additional_passes,
> > +				       const struct mmu_notifier_range *range,
> > +				       unsigned long cur_seq)
> > +{
> > +	struct mmu_interval_notifier_pass *p, *next;
> > +
> > +	while (!list_empty(additional_passes)) {
> > +		list_for_each_entry_safe(p, next, additional_passes, link) {
> > +			list_del_init(&p->link);
> > +			p = p->pass(p, range, cur_seq);
> > +			if (p)
> > +				list_add_tail(&p->link, additional_passes);
> > +		}
> > +	}
> 
> Like this is very naive, if one driver has only 'prepare' and 'wait
> for device ack' passes, then it will immediately stop being concurrent
> while another device may be still working on its 3rd pass.
> 
> So either this should be more complicated to properly support
> different numbers of passes per registration or we should just support
> two passes and be done with it?
> 
> Jason