linux-kernel - Re: [PATCH v1 0/6] x86/resctrl: Avoid searching tasklist during mongrp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <acac43c0-d688-dcfe-51ce-b292f3c522d0@amd.com>
Date: Thu, 11 Apr 2024 17:34:19 -0500
From: "Moger, Babu" <bmoger@....com>
To: Peter Newman <peternewman@...gle.com>, Fenghua Yu <fenghua.yu@...el.com>,
 Reinette Chatre <reinette.chatre@...el.com>,
 James Morse <james.morse@....com>
Cc: Stephane Eranian <eranian@...gle.com>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira
 <bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
 Uros Bizjak <ubizjak@...il.com>, Mike Rapoport <rppt@...nel.org>,
 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
 Rick Edgecombe <rick.p.edgecombe@...el.com>, Xin Li <xin3.li@...el.com>,
 Babu Moger <babu.moger@....com>, Shaopeng Tan <tan.shaopeng@...itsu.com>,
 Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
 Jens Axboe <axboe@...nel.dk>, Christian Brauner <brauner@...nel.org>,
 Oleg Nesterov <oleg@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Tycho Andersen <tandersen@...flix.com>, Nicholas Piggin <npiggin@...il.com>,
 Beau Belgrave <beaub@...ux.microsoft.com>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 0/6] x86/resctrl: Avoid searching tasklist during
 mongrp_reparent

Hi Peter,

On 3/25/2024 12:27 PM, Peter Newman wrote:
> Hi Reinette,
>
> I've been working with users of the recently-added mongroup rename
> operation[1] who have observed the impact of back-to-back operations on
> latency-sensitive, thread pool-based services. Because changing a
> resctrl group's CLOSID (or RMID) requires all task_structs in the system
> to be inspected with the tasklist_lock read-locked, a series of move
> operations can block out thread creation for long periods of time, as
> creating threads needs to write-lock the tasklist_lock.
>
> To avoid this issue, this series replaces the CLOSID and RMID values
> cached in the task_struct with a pointer to the task's rdtgroup, through
> which the current CLOSID and RMID can be obtained indirectly during a
> context switch. Updating a group's ID then only requires the current
> task to be switched back in on all CPUs. On server hosts with very large
> thread counts, this is much less disruptive than making thread creation
> globally unavailable. However, this is less desirable on CPU-isolated,
> realtime workloads, so I am interested in suggestions on how to reach a
> compromise for the two use cases.

Before going this route, have you thought about your original solution 
where CONTROL_MON groups sharing the same CLOSID?

[3] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com

May be it is less disruptive than changing the context switch code.. Thoughts?

thanks
Babu

>
> Also note that the soft-ABMC[2] proposal is based on swapping the RMID
> values in mongroups when monitors are assigned to them, which will
> result in the same slowdown as was encountered with re-parenting
> monitoring groups.
>
> Using pointers to rdtgroups from the task_struct been used internally at
> Google for a few years to support an alternate CLOSID management
> technique[3], which was replaced by mongroup rename. Usage of this
> technique during production revealed an issue where threads in an
> exiting process become unreachable via for_each_process_thread() before
> destruction, but can still switch in and out. Consequently, an rmdir
> operation can fail to remove references to an rdtgroup before it frees
> it, causing the pointer to freed memory to be dereferenced after the
> structure is freed. This would result in invalid CLOSID/RMID values
> being written into MSRs, causing an exception. The workaround for this
> is to clear a task's rdtgroup pointer when it exits.
>
> As a benefit, pointers to rdtgroups are architecture-independent,
> resulting in __resctrl_sched_in() and more of the task assignment code
> becoming portable, simplifying the effort of supporting MPAM. Using a
> single pointer allows the task's current monitor and control groups to
> be determined atomically.
>
> Similar changes have been used internally on a kernel supporting MPAM.
> Upon request, I can provide the required changes to the MPAM-resctrl
> interface based on James Morse's latest MPAM snapshot[4] for reference.
>
> Thanks!
> -Peter
>
> [1] https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
> [2] https://lore.kernel.org/lkml/CALPaoChhKJiMAueFtgCTc7ffO++S5DJCySmxqf9ZDmhR9RQapw@mail.gmail.com
> [3] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2
>
> Peter Newman (6):
>    x86/resctrl: Move __resctrl_sched_in() out-of-line
>    x86/resctrl: Add hook for releasing task_struct references
>    x86/resctrl: Disallow mongroup rename on MPAM
>    x86/resctrl: Use rdtgroup pointer to indicate task membership
>    x86/resctrl: Abstract PQR_ASSOC from generic code
>    x86/resctrl: Don't search tasklist in mongroup rename
>
>   arch/x86/include/asm/resctrl.h            |  93 --------
>   arch/x86/kernel/cpu/resctrl/core.c        |  14 +-
>   arch/x86/kernel/cpu/resctrl/internal.h    |  17 ++
>   arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   4 +-
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 279 +++++++++++++---------
>   arch/x86/kernel/process_32.c              |   2 +-
>   arch/x86/kernel/process_64.c              |   2 +-
>   include/linux/resctrl.h                   |  38 +++
>   include/linux/sched.h                     |   3 +-
>   kernel/exit.c                             |   3 +
>   10 files changed, 239 insertions(+), 216 deletions(-)
>
>
> base-commit: 4cece764965020c22cff7665b18a012006359095