Message-ID: <c8bca664-76cf-52d7-bd73-795b467c460b@linux.vnet.ibm.com>
Date: Sat, 5 Aug 2023 21:07:18 +0530
From: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Swapnil Sapkal <Swapnil.Sapkal@....com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Aaron Lu <aaron.lu@...el.com>, x86@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms
On 8/4/23 1:42 AM, Mathieu Desnoyers wrote:
> On 8/3/23 01:53, Swapnil Sapkal wrote:
> [...]
>
> Those are interesting metrics. I still have no clue why it behaves that
> way though.
I was thinking this might be the case. Some workloads would benefit while
some would suffer, especially ones which favor latency over cache locality.
>
> More specifically: I also noticed that the number of migrations is
> heavily affected, and that select_task_rq behavior changes drastically.
> I'm unsure why though.
>
FWIU, load_balance() uses idle_cpu() to compute the number of idle CPUs in the
sched_domain. That may be getting confused by the 1ms-delay concept: the sched_domains
likely stay balanced because of it, and hence there are fewer migrations.
In select_task_rq_fair(), wake_affine_idle() will return prev_cpu since idle_cpu()
returns true more often, so the task gets woken on the same CPU as before instead of migrating.
On SMT systems there is a further gain from keeping the work on a single CPU of the core,
which lets the core run in ST mode (subject to utilization). Running in ST mode is faster
than running in SMT mode.
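
For reference, a condensed sketch of the two code paths I am referring to, paraphrased
from kernel/sched/fair.c around v6.5 (exact code differs slightly; note that
wake_affine_idle() actually goes through available_idle_cpu(), which wraps idle_cpu()):

/* update_sg_lb_stats(): per-group idle CPU accounting used by load_balance() */
if (!nr_running && idle_cpu(i))
        sgs->idle_cpus++;

/* wake_affine_idle(): prefer prev_cpu when it is (still reported as) idle */
static int wake_affine_idle(int this_cpu, int prev_cpu, int sync)
{
        /* Wakeup from an idle CPU: stay cache affine if possible */
        if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
                return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;

        if (sync && cpu_rq(this_cpu)->nr_running == 1)
                return this_cpu;

        /* Extending the reported idle window makes this succeed more often */
        if (available_idle_cpu(prev_cpu))
                return prev_cpu;

        return nr_cpumask_bits;       /* no affine preference */
}

So if idle_cpu() keeps returning true for ~1ms after a CPU goes idle, both the
group-level idle count and the wake-affine decision are biased toward leaving
the task where it was.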
-------------------------------------------------------------------------------------------
Ran hackbench with perf stat on an SMT system. It indicates slightly higher ST-mode cycles,
and IPC improves slightly, making it faster.
baseline 6.5-rc1:
hackbench -pipe (50 groups)
Time: 0.67 ( Average of 50 runs)
94,432,028,029 instructions # 0.52 insn per cycle
168,130,543,309 cycles (% of total cycles)
1,162,153,934 PM_RUN_CYC_ST_MODE ( 0.70% )
613,018,646 PM_RUN_CYC_SMT2_MODE ( 0.35% )
166,358,778,832 PM_RUN_CYC_SMT4_MODE (98.95% )
With the latest patch in this series applied:
https://lore.kernel.org/lkml/447f756c-9c79-f801-8257-a97cc8256efe@efficios.com/#t
hackbench -pipe (50 groups)
Time: 0.62 ( Average of 50 runs)
92,078,390,150 instructions # 0.55 insn per cycle
159,368,662,574 cycles
1,330,653,107 PM_RUN_CYC_ST_MODE ( 0.83% )
656,950,636 PM_RUN_CYC_SMT2_MODE ( 0.41% )
157,384,470,123 PM_RUN_CYC_SMT4_MODE (98.75% )
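Spelling out the comparison from the raw numbers above: total cycles drop from
~168.1e9 to ~159.4e9 (about 5%), the ST-mode share of total cycles rises from
~0.70% to ~0.83%, and insn per cycle goes from 0.52 to 0.55, which lines up with
the ~7% lower average wall time (0.67s -> 0.62s).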
>>
>> Can you share your build config just in case I am missing something.
>
> Build config attached.
>
> Thanks,
>
> Mathieu
>
>>
>>>
>>> And using it now brings the hackbench wall time at 28s :)
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>>
>>>>>>> struct task_struct *stop;
>>>>>>> unsigned long next_balance;
>>>>>>> struct mm_struct *prev_mm;
>>>>>
>>>
>> --
>> Thanks and regards,
>> Swapnil
>