Message-ID: <20250314162034.GA12958@amazon.com>
Date: Fri, 14 Mar 2025 16:20:34 +0000
From: Hagar Hemdan <hagarhem@...zon.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: Dietmar Eggemann <dietmar.eggemann@....com>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
<vschneid@...hat.com>, <linux-kernel@...r.kernel.org>,
<wuchi.zero@...il.com>, <abuehaze@...zon.com>, <hagarhem@...zon.com>
Subject: Re: [PATCH] /sched/core: Fix Unixbench spawn test regression
On Fri, Mar 14, 2025 at 05:06:50PM +0100, Vincent Guittot wrote:
> On Thu, 13 Mar 2025 at 10:21, Hagar Hemdan <hagarhem@...zon.com> wrote:
> >
> > On Wed, Mar 12, 2025 at 03:41:40PM +0100, Dietmar Eggemann wrote:
> > > On 11/03/2025 17:35, Vincent Guittot wrote:
> > > > On Mon, 10 Mar 2025 at 16:29, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> > > >>
> > > >> On 10/03/2025 14:59, Vincent Guittot wrote:
> > > >>> On Thu, 6 Mar 2025 at 17:26, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> > > >>>>
> > > >>>> Hagar reported a 30% drop in UnixBench spawn test with commit
> > > >>>> eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
> > > >>>> autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
> > > >>>> (aarch64) (single level MC sched domain) [1].
> > > >>>>
> > > >>>> There is an early bail from sched_move_task() if p->sched_task_group is
> > > >>>> equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
> > > >>>> pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
> > > >>>> (Ubuntu '22.04.5 LTS').
> > > >>>
> > > >>> Isn't this the same use case that was used by commit eff6c8ce8d4d to
> > > >>> show the benefit of adding the test if (group ==
> > > >>> tsk->sched_task_group)?
> > > >>> Adding Wuchi who added the condition
> > > >>
> > > >> IMHO, UnixBench spawn reports a performance number according to how many
> > > >> tasks could be spawned whereas, IIUC, commit eff6c8ce8d4d was reporting
> > > >> the time spent in sched_move_task().
> > > >
> > > > > But doesn't your patch revert the benefits shown in the figures of
> > > > > commit eff6c8ce8d4d? It skipped sched_move_task() in the do_exit()
> > > > > autogroup path and you add it back
> > >
> > > Yeah, we do need the PELT update in sched_change_group()
> > > (task_change_group_fair()) in the do_exit() path to get the 30% score
> > > back in 'UnixBench spawn', even if that means we need more time due to
> > > this in sched_move_task().
> > >
> > > I retested this and it turns out that 'group == tsk->sched_task_group'
> > > is only true when sched_move_task() is called from exit.
> > >
> > > So to get the score back for 'UnixBench spawn' we should rather revert
> > > commit eff6c8ce8d4d.
> > >
> > > The analysis in my patch still holds though.
> > >
> > > If you guys agree I can send the revert with my analysis in the
> > > patch-header.
> > Agree. The follow up commit fa614b4feb5a ("sched: Simplify sched_move_task()")
> > needs to be reverted as well.
>
> Why do you think it should be reverted as well ?
I meant the revert of eff6c8ce8d4d7 requires fa614b4feb5a to be
reverted first. Dietmar has already done this in his revert
https://lore.kernel.org/all/20250314151345.275739-1-dietmar.eggemann@arm.com/,
so it's all good now.