linux-kernel - Re: [RFC PATCH v3 4/4] sched+mm: Use hazard pointers to track lazy active mm existence


Message-ID: <CAEXW_YQ5BumqU98+BnYF8wJ1iVJ8jaKa6u3STTt-mQruXu1vfA@mail.gmail.com>
Date: Tue, 8 Oct 2024 15:51:23 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Peter Zijlstra <peterz@...radead.org>, Nicholas Piggin <npiggin@...il.com>, 
	Michael Ellerman <mpe@...erman.id.au>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, "Paul E. McKenney" <paulmck@...nel.org>, 
	Will Deacon <will@...nel.org>, Alan Stern <stern@...land.harvard.edu>, 
	John Stultz <jstultz@...gle.com>, Neeraj Upadhyay <Neeraj.Upadhyay@....com>, 
	Frederic Weisbecker <frederic@...nel.org>, Josh Triplett <josh@...htriplett.org>, 
	Uladzislau Rezki <urezki@...il.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>, 
	Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>, 
	Mark Rutland <mark.rutland@....com>, Thomas Gleixner <tglx@...utronix.de>, 
	Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com, Mateusz Guzik <mjguzik@...il.com>, 
	Jonas Oberhauser <jonas.oberhauser@...weicloud.com>, rcu@...r.kernel.org, linux-mm@...ck.org, 
	lkmm@...ts.linux.dev, Vineeth Pillai <vineethrp@...gle.com>
Subject: Re: [RFC PATCH v3 4/4] sched+mm: Use hazard pointers to track lazy
 active mm existence

On Tue, Oct 8, 2024 at 9:52 AM Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
>
> Replace lazy active mm existence tracking with hazard pointers. This
> removes the following implementations and their associated config
> options:
>
> - MMU_LAZY_TLB_REFCOUNT
> - MMU_LAZY_TLB_SHOOTDOWN
> - This removes the call_rcu delayed mm drop for RT.
>
> It leverages the fact that each CPU only ever have at most one single
> lazy active mm. This makes it a very good fit for a hazard pointer
> domain implemented with one hazard pointer slot per CPU.
>
> -static void cleanup_lazy_tlbs(struct mm_struct *mm)
> +static void remove_lazy_mm_hp(int cpu, struct hazptr_slot *slot, void *addr)
>  {
> -       if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> -               /*
> -                * In this case, lazy tlb mms are refounted and would not reach
> -                * __mmdrop until all CPUs have switched away and mmdrop()ed.
> -                */
> -               return;
> -       }
> +       smp_call_function_single(cpu, do_shoot_lazy_tlb, addr, 1);
> +       smp_call_function_single(cpu, do_check_lazy_tlb, addr, 1);
> +}
>
> +static void cleanup_lazy_tlbs(struct mm_struct *mm)
> +{
[...]
> -       on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> -       if (IS_ENABLED(CONFIG_DEBUG_VM_SHOOT_LAZIES))
> -               on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);
> +       hazptr_scan(&hazptr_domain_sched_lazy_mm, mm, remove_lazy_mm_hp);

Hey Mathieu, take these comments with a grain of salt, because it has
been a while since I last dug into active_mm.

This seems to me a strange use case for a hazard pointer callback,
because "scan" here retires the object immediately even though a
reader still holds a "reference". The callback forces every reader
holding a "reference" to switch away immediately, whether it likes it
or not, instead of waiting until the reader itself releases the
reference. There is no such precedent in RCU, for instance, where a
callback never runs before pre-existing readers have finished.
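
To make the distinction concrete, here is a minimal single-threaded
userspace model of a one-slot-per-CPU hazard pointer domain. It is
only a sketch: NR_CPUS, struct hp_slot, hp_protect(), hp_release(),
hp_scan() and force_switch() are hypothetical names made up for
illustration, not the API from this series, and force_switch() merely
stands in for the IPI that makes the remote CPU drop its lazy mm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4			/* hypothetical CPU count for the model */

struct hp_slot {
	_Atomic(void *) addr;		/* object this "CPU" currently protects */
};

static struct hp_slot slots[NR_CPUS];	/* one slot per CPU, initially NULL */

typedef void (*hp_cb)(int cpu, struct hp_slot *slot, void *addr);

/* Reader side: publish the protected pointer in this CPU's slot. */
static void hp_protect(int cpu, void *addr)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&slots[cpu].addr, addr);
}

/* Reader side: voluntarily drop the protection. */
static void hp_release(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&slots[cpu].addr, NULL);
}

/*
 * Scanner side: visit every slot that still holds addr and run the
 * optional callback on it. Returns the number of matching slots.
 */
static int hp_scan(void *addr, hp_cb on_match)
{
	int hits = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (atomic_load(&slots[cpu].addr) != addr)
			continue;
		hits++;
		if (on_match)
			on_match(cpu, &slots[cpu], addr);
	}
	return hits;
}

/*
 * Callback that evicts the reader instead of waiting for it. In the
 * model this just clears the slot; it only stands in for an IPI.
 */
static void force_switch(int cpu, struct hp_slot *slot, void *addr)
{
	void *expected = addr;

	(void)cpu;
	atomic_compare_exchange_strong(&slot->addr, &expected, NULL);
}
```

A conventional scanner would call hp_scan(obj, NULL) and defer freeing
until the count drops to zero through hp_release(). Passing
force_switch() instead empties the matching slots during the scan
itself, which is the behavior questioned above.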

That might work for active_mm, but it sounds like a fringe use case to
me. It might be better to just force CONFIG_MMU_LAZY_TLB_SHOOTDOWN=y
for everyone and use on_each_cpu() instead. AFAICS that would give the
same code simplification as this patch without requiring hazard
pointers. Or maybe start with that, and _then_ convert to HP if it
makes sense? These are just some thoughts, and I am OK with whatever
the reviewers' consensus is.

And if I understand correctly, in this use case we are not even
grabbing a "true reference" to the mm_struct object, because direct
access to mm_struct should require a proper mmgrab(), not a lazy_tlb
flavored one? Correct me if I'm wrong, though.

Also, won't this patch cause more IPIs on x86? Previously the global
refcount did not require them, since the last kthread switching out
would no longer access the old mm. Might it be worth checking the
performance of fork/exit, and whether it scales?
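
As a rough starting point for such a check, a userspace loop along
these lines could time fork()+exit() cycles. This is only a sketch:
fork_exit_usec() is a hypothetical helper, it measures a single
process, and judging scalability would mean running many copies in
parallel on a multi-CPU machine, with and without the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Returns the mean wall-clock time, in microseconds, of one
 * fork() + child _exit() + waitpid() cycle, or -1.0 on error.
 */
static double fork_exit_usec(int iters)
{
	struct timespec t0, t1;

	if (iters <= 0 || clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1.0;

	for (int i = 0; i < iters; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1.0;
		if (pid == 0)
			_exit(0);	/* child: exit at once, tearing down its mm */
		if (waitpid(pid, NULL, 0) != pid)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1.0;

	double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		    (double)(t1.tv_nsec - t0.tv_nsec);

	return ns / 1e3 / iters;
}
```

Calling fork_exit_usec(100000) from main() and printing the result on
kernels with and without the series would give a first-order
comparison.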

thanks,

- Joel