[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <acac43c0-d688-dcfe-51ce-b292f3c522d0@amd.com>
Date: Thu, 11 Apr 2024 17:34:19 -0500
From: "Moger, Babu" <bmoger@....com>
To: Peter Newman <peternewman@...gle.com>, Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
James Morse <james.morse@....com>
Cc: Stephane Eranian <eranian@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira
<bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
Uros Bizjak <ubizjak@...il.com>, Mike Rapoport <rppt@...nel.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, Xin Li <xin3.li@...el.com>,
Babu Moger <babu.moger@....com>, Shaopeng Tan <tan.shaopeng@...itsu.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
Jens Axboe <axboe@...nel.dk>, Christian Brauner <brauner@...nel.org>,
Oleg Nesterov <oleg@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
Tycho Andersen <tandersen@...flix.com>, Nicholas Piggin <npiggin@...il.com>,
Beau Belgrave <beaub@...ux.microsoft.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 0/6] x86/resctrl: Avoid searching tasklist during
mongrp_reparent
Hi Peter,
On 3/25/2024 12:27 PM, Peter Newman wrote:
> Hi Reinette,
>
> I've been working with users of the recently-added mongroup rename
> operation[1] who have observed the impact of back-to-back operations on
> latency-sensitive, thread pool-based services. Because changing a
> resctrl group's CLOSID (or RMID) requires all task_structs in the system
> to be inspected with the tasklist_lock read-locked, a series of move
> operations can block out thread creation for long periods of time, as
> creating threads needs to write-lock the tasklist_lock.
>
> To avoid this issue, this series replaces the CLOSID and RMID values
> cached in the task_struct with a pointer to the task's rdtgroup, through
> which the current CLOSID and RMID can be obtained indirectly during a
> context switch. Updating a group's ID then only requires the current
> task to be switched back in on all CPUs. On server hosts with very large
> thread counts, this is much less disruptive than making thread creation
> globally unavailable. However, this is less desirable on CPU-isolated,
> realtime workloads, so I am interested in suggestions on how to reach a
> compromise for the two use cases.
Before going this route, have you thought about your original solution
where CONTROL_MON groups sharing the same CLOSID?
[3] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com
May be it is less disruptive than changing the context switch code.. Thoughts?
thanks
Babu
>
> Also note that the soft-ABMC[2] proposal is based on swapping the RMID
> values in mongroups when monitors are assigned to them, which will
> result in the same slowdown as was encountered with re-parenting
> monitoring groups.
>
> Using pointers to rdtgroups from the task_struct been used internally at
> Google for a few years to support an alternate CLOSID management
> technique[3], which was replaced by mongroup rename. Usage of this
> technique during production revealed an issue where threads in an
> exiting process become unreachable via for_each_process_thread() before
> destruction, but can still switch in and out. Consequently, an rmdir
> operation can fail to remove references to an rdtgroup before it frees
> it, causing the pointer to freed memory to be dereferenced after the
> structure is freed. This would result in invalid CLOSID/RMID values
> being written into MSRs, causing an exception. The workaround for this
> is to clear a task's rdtgroup pointer when it exits.
>
> As a benefit, pointers to rdtgroups are architecture-independent,
> resulting in __resctrl_sched_in() and more of the task assignment code
> becoming portable, simplifying the effort of supporting MPAM. Using a
> single pointer allows the task's current monitor and control groups to
> be determined atomically.
>
> Similar changes have been used internally on a kernel supporting MPAM.
> Upon request, I can provide the required changes to the MPAM-resctrl
> interface based on James Morse's latest MPAM snapshot[4] for reference.
>
> Thanks!
> -Peter
>
> [1] https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
> [2] https://lore.kernel.org/lkml/CALPaoChhKJiMAueFtgCTc7ffO++S5DJCySmxqf9ZDmhR9RQapw@mail.gmail.com
> [3] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2
>
> Peter Newman (6):
> x86/resctrl: Move __resctrl_sched_in() out-of-line
> x86/resctrl: Add hook for releasing task_struct references
> x86/resctrl: Disallow mongroup rename on MPAM
> x86/resctrl: Use rdtgroup pointer to indicate task membership
> x86/resctrl: Abstract PQR_ASSOC from generic code
> x86/resctrl: Don't search tasklist in mongroup rename
>
> arch/x86/include/asm/resctrl.h | 93 --------
> arch/x86/kernel/cpu/resctrl/core.c | 14 +-
> arch/x86/kernel/cpu/resctrl/internal.h | 17 ++
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 279 +++++++++++++---------
> arch/x86/kernel/process_32.c | 2 +-
> arch/x86/kernel/process_64.c | 2 +-
> include/linux/resctrl.h | 38 +++
> include/linux/sched.h | 3 +-
> kernel/exit.c | 3 +
> 10 files changed, 239 insertions(+), 216 deletions(-)
>
>
> base-commit: 4cece764965020c22cff7665b18a012006359095
Powered by blists - more mailing lists