Message-ID: <93c3f9ac-0225-429a-807c-d11c649c819e@redhat.com>
Date: Fri, 7 Mar 2025 14:00:05 -0500
From: Waiman Long <llong@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Swapnil Sapkal <swapnil.sapkal@....com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>, Phil Auld <pauld@...hat.com>,
luca.abeni@...tannapisa.it, tommaso.cucinotta@...tannapisa.it,
Jon Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v2 0/8] Fix SCHED_DEADLINE bandwidth accounting during
suspend
On 3/6/25 9:10 AM, Juri Lelli wrote:
> Hello!
>
> Jon reported [1] a suspend regression on a Tegra board configured to
> boot with isolcpus and bisected it to commit 53916d5fd3c0
> ("sched/deadline: Check bandwidth overflow earlier for hotplug").
>
> Root cause analysis pointed out that we are currently failing to
> correctly clear and restore bandwidth accounting on root domains after
> changes that originate from partition_sched_domains(), as is the case
> for suspend operations on that board.
>
> This is v2 of the proposed approach to fix the issue (v1 is [2]).
> Compared to v1, the series implements the approach as follows:
>
> - 01: filter out DEADLINE special tasks
> - 02: preparatory wrappers to be able to grab sched_domains_mutex on
> UP (remove !SMP wrappers - Waiman)
> - 03: generalize unique visiting of root domains so that we can
> re-use the mechanism elsewhere
> - 04: the bulk of the approach, clean and rebuild after changes
> - 05: clean up a now redundant call
> - 06: remove partition_and_rebuild_sched_domains() (Waiman)
> - 07: stop exposing partition_sched_domains_locked (Waiman)
> - 08: move dl_rebuild_rd_accounting to cpuset.h
>
> Please test and review. The set is also available at
>
> git@...hub.com:jlelli/linux.git upstream/deadline/domains-suspend
>
> Best,
> Juri
>
> 1 - https://lore.kernel.org/lkml/ba51a43f-796d-4b79-808a-b8185905638a@nvidia.com/
> 2 - v1 https://lore.kernel.org/lkml/20250304084045.62554-1-juri.lelli@redhat.com
>
> Juri Lelli (8):
> sched/deadline: Ignore special tasks when rebuilding domains
> sched/topology: Wrappers for sched_domains_mutex
> sched/deadline: Generalize unique visiting of root domains
> sched/deadline: Rebuild root domain accounting after every update
> sched/topology: Remove redundant dl_clear_root_domain call
> cgroup/cpuset: Remove partition_and_rebuild_sched_domains
> sched/topology: Stop exposing partition_sched_domains_locked
> include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
>
> include/linux/cpuset.h         |  5 +++++
> include/linux/sched.h          |  2 ++
> include/linux/sched/deadline.h |  7 +++++++
> include/linux/sched/topology.h | 10 ---------
> kernel/cgroup/cpuset.c         | 27 +++++++++----------------
> kernel/sched/core.c            |  4 ++--
> kernel/sched/deadline.c        | 37 ++++++++++++++++++++--------------
> kernel/sched/debug.c           |  8 ++++----
> kernel/sched/rt.c              |  2 ++
> kernel/sched/sched.h           |  2 +-
> kernel/sched/topology.c        | 32 +++++++++++++----------------
> 11 files changed, 69 insertions(+), 67 deletions(-)
>
>
> base-commit: 48a5eed9ad584315c30ed35204510536235ce402
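As a rough illustration of what patches 01, 03 and 04 describe, here is a toy userspace model in C. It assumes heavily simplified, hypothetical types and helpers (struct root_domain with visit_gen/total_bw fields, struct task with a "special" flag, rd_visited(), rebuild_accounting(), sum_system_bw()); none of these are the kernel's actual definitions. A generation counter lets a walk over all CPUs touch each shared root domain only once (the existing dl_bw_visited() helper in kernel/sched/deadline.c is, as far as I know, built on the same idea). Accounting is first cleared and then rebuilt from the DEADLINE tasks, skipping special tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a scheduler root domain. */
struct root_domain {
	uint64_t visit_gen; /* last generation in which this rd was visited */
	uint64_t total_bw;  /* admitted DEADLINE bandwidth (arbitrary units) */
};

/* Hypothetical DEADLINE task: the CPU it runs on and its bandwidth. */
struct task {
	int cpu;
	uint64_t dl_bw;
	bool special; /* excluded from accounting (cf. patch 01) */
};

static uint64_t visit_generation;

/* Return true if rd was already seen in generation gen; else mark it seen. */
static bool rd_visited(struct root_domain *rd, uint64_t gen)
{
	if (rd->visit_gen == gen)
		return true;
	rd->visit_gen = gen;
	return false;
}

/* Clear each root domain once, then re-add every non-special task's bandwidth. */
static void rebuild_accounting(struct root_domain **cpu_rd, int nr_cpus,
			       const struct task *tasks, size_t nr_tasks)
{
	uint64_t gen = ++visit_generation;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!rd_visited(cpu_rd[cpu], gen))
			cpu_rd[cpu]->total_bw = 0;
	}

	for (size_t i = 0; i < nr_tasks; i++) {
		assert(tasks[i].cpu >= 0 && tasks[i].cpu < nr_cpus);
		if (tasks[i].special)
			continue;
		cpu_rd[tasks[i].cpu]->total_bw += tasks[i].dl_bw;
	}
}

/* Sum bandwidth over all root domains, counting each shared rd only once. */
static uint64_t sum_system_bw(struct root_domain **cpu_rd, int nr_cpus)
{
	uint64_t gen = ++visit_generation;
	uint64_t sum = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!rd_visited(cpu_rd[cpu], gen))
			sum += cpu_rd[cpu]->total_bw;
	}
	return sum;
}
```

With four CPUs split across two root domains (CPUs 0-1 and 2-3), a stale total_bw left over from an earlier configuration is discarded by rebuild_accounting(), and sum_system_bw() counts each root domain once rather than once per CPU that shares it.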
I have run my cpuset test and it completed successfully without any issues.
Tested-by: Waiman Long <longman@...hat.com>