linux-kernel - Re: [PATCH v2 1/1] sched: Improve cache locality of RSEQ concurrency IDs for intermittent workloads

Message-ID: <CANpmjNM0TGU9qtS35dHBxQ_TZdSnaJviK=sGqY9kiH049AJXXQ@mail.gmail.com>
Date: Thu, 19 Sep 2024 08:00:00 +0200
From: Marco Elver <elver@...gle.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org, 
	Valentin Schneider <vschneid@...hat.com>, Mel Gorman <mgorman@...e.de>, 
	Steven Rostedt <rostedt@...dmis.org>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall <bsegall@...gle.com>, 
	Yury Norov <yury.norov@...il.com>, Rasmus Villemoes <linux@...musvillemoes.dk>, 
	Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH v2 1/1] sched: Improve cache locality of RSEQ concurrency
 IDs for intermittent workloads

On Mon, 16 Sept 2024 at 12:12, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com> wrote:
[...]
> > Either migrate it along, _or_ pick a CID from a different thread that
> > ran on a CPU that shares this L3. E.g. if T1 is migrated from CPU2 to
> > CPU3, and T2 ran on CPU3 before, then it would be ok for T1 to get its
> > previous CID or T2's CID from when it ran on CPU3. Or more simply,
> > CIDs aren't tied to particular threads, but tied to a subset of CPUs
> > based on topology. If the user could specify that topology / CID
> > affinity would be nice.
>
> There is probably something to improve there, but I suspect this
> is beyond the scope of this patch, and would prefer tackling this
> topology-aware CID stealing as a separate effort. I fear that attempting
> to be too aggressive in keeping the CID allocation compact on migration
> may require us to set/clear bits in the mm_cidmask more often, which may
> impact some workloads. If we look into this we need to be careful about
> regressions.

As discussed at LPC, narrowing down a generically optimal policy is
hard. It's easy to overfit for any one particular workload, system,
etc. So the current approach certainly works. One step at a time. :-)

To cater better to different systems or workloads, it might make sense
to consider giving user space more control over the policy. Various
options exist, from tweaking sysctl knobs to eBPF hooks. My preference
here would be towards eBPF hooks that can influence CID partitioning
dynamically. But that's probably something to think about for the
future - I have no good answer, only that some way to experiment more
rapidly would help to narrow down what's optimal.

> >> When the number of threads is < number of mm allowed cpus, the
> >> migrate hooks steal the concurrency ID from CPU 2 and moves it to
> >> CPU 3 if there is only a single thread from that mm on CPU 2, which
> >> does what you wish.
> >
> > Only if the next CPU shares the cache. What if it moves the thread to
> > a CPU where that CPU's L3 cache != the previous CPU's L3 cache. In
> > that case, it'd be preferable to pick a last-used CID from the set of
> > CPUs that are grouped under that L3 cache.
>
> Without going all the way towards making this topology-aware, one
> improvement I would do for the next version of this patch is to
> prevent moving the src CID to the destination cpu (on migration)
> when the dst cpu has a recent_cid set. Basically this:

I think that makes sense.
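
For illustration only, here is a toy user-space model of the rule as
described in the quoted paragraph. It is not the kernel code and not the
snippet from the original mail: `struct cpu_cid`, `CID_UNSET` and
`migrate_cid()` are hypothetical names, and the extra check for a CID
currently held on the destination is an assumption added so the model
never overwrites a live CID.

```c
#include <assert.h>
#include <stdbool.h>

#define CID_UNSET (-1)

/* Hypothetical per-CPU state for one mm; field names are illustrative. */
struct cpu_cid {
	int cid;        /* CID currently held on this CPU, or CID_UNSET */
	int recent_cid; /* CID last used on this CPU, or CID_UNSET */
};

/*
 * Toy model of the migration rule: when a thread moves from src to dst,
 * carry the src CID along only if dst has no recent CID of its own.
 * Returns true if the CID was moved, false if dst was left alone.
 */
static bool migrate_cid(struct cpu_cid *cpus, int src, int dst)
{
	assert(src != dst);

	if (cpus[src].cid == CID_UNSET)
		return false;	/* nothing to move */

	/* Assumption: a CID currently held on dst is never overwritten. */
	if (cpus[dst].cid != CID_UNSET)
		return false;

	/* The rule from the mail: dst has a recent_cid set, so skip. */
	if (cpus[dst].recent_cid != CID_UNSET)
		return false;

	cpus[dst].cid = cpus[src].cid;
	cpus[src].cid = CID_UNSET;
	return true;
}
```

In this model a skipped move leaves the src CPU's state untouched; how
the real patch releases or reuses the src CID is outside what the sketch
tries to show.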

Thanks,
-- Marco