Message-ID: <002701db60a2$ef760820$ce621860$@telus.net>
Date: Mon, 6 Jan 2025 17:24:44 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <linux-kernel@...r.kernel.org>,
<vincent.guittot@...aro.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF
On 2025.01.06 09:14 Peter Zijlstra wrote:
> On Mon, Jan 06, 2025 at 06:04:55PM +0100, Peter Zijlstra wrote:
>> On Mon, Jan 06, 2025 at 05:59:32PM +0100, Peter Zijlstra wrote:
>>> On Mon, Jan 06, 2025 at 07:01:34AM -0800, Doug Smythies wrote:
>>>
>>>>> What is the easiest 100% load you're seeing this with?
>>>>
>>>> Lately, and specifically to be able to tell others, I have been using:
>>>>
>>>> yes > /dev/null &
>>>>
>>>> On my Intel i5-10600K, with 6 cores and 2 threads per core (12 CPUs),
>>>> I run 12 of those workloads.
>>>
>>> On my headless ivb-ep (2 sockets, 10 cores each, 2 threads per core), I
>>> do:
>>>
>>> for ((i=0; i<40; i++)) ; do yes > /dev/null & done
>>> tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz --interval 1
>>>
>>> But so far, nada :-( I've tried with full preemption and voluntary,
>>> HZ=1000.
>>>
>>
>> And just as I send this, I see these happen:
>>
>> 100.00 3100 2793 40302 71 195.22
>> 100.00 3100 2618 40459 72 183.58
>> 100.00 3100 2993 46215 71 209.21
>> 100.00 3100 2789 40467 71 195.19
>> 99.92 3100 2798 40589 71 195.76
>> 100.00 3100 2793 40397 72 195.46
>> ...
>> 100.00 3100 2844 41906 71 199.43
>> 100.00 3100 2779 40468 71 194.51
>> 99.96 3100 2320 40933 71 163.23
>> 100.00 3100 3529 61823 72 245.70
>> 100.00 3100 2793 40493 72 195.45
>> 100.00 3100 2793 40462 72 195.56
>>
>> They look like funny little blips. Nowhere near as bad as you had
>> though.
>
> Anyway, given you've confirmed disabling DELAY_DEQUEUE fixes things,
> could you perhaps try the below hackery for me? It's a bit of a wild
> guess, but throw stuff at the wall, see what sticks, etc.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 84902936a620..fa4b9891f93a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3019,7 +3019,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
> } else {
>
> if (!is_migration_disabled(p)) {
> - if (task_on_rq_queued(p))
> + if (task_on_rq_queued(p) && !p->se.sched_delayed)
> rq = move_queued_task(rq, rf, p, dest_cpu);
>
> if (!pending->stop_pending) {
> @@ -3776,28 +3776,30 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
> */
> static int ttwu_runnable(struct task_struct *p, int wake_flags)
> {
> - struct rq_flags rf;
> - struct rq *rq;
> - int ret = 0;
> + CLASS(__task_rq_lock, rq_guard)(p);
> + struct rq *rq = rq_guard.rq;
>
> - rq = __task_rq_lock(p, &rf);
> - if (task_on_rq_queued(p)) {
> - update_rq_clock(rq);
> - if (p->se.sched_delayed)
> - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> - if (!task_on_cpu(rq, p)) {
> - /*
> - * When on_rq && !on_cpu the task is preempted, see if
> - * it should preempt the task that is current now.
> - */
> - wakeup_preempt(rq, p, wake_flags);
> + if (!task_on_rq_queued(p))
> + return 0;
> +
> + update_rq_clock(rq);
> + if (p->se.sched_delayed) {
> + int queue_flags = ENQUEUE_NOCLOCK | ENQUEUE_DELAYED;
> + if (!is_cpu_allowed(p, cpu_of(rq))) {
> + dequeue_task(rq, p, DEQUEUE_SLEEP | queue_flags);
> + return 0;
> }
> - ttwu_do_wakeup(p);
> - ret = 1;
> + enqueue_task(rq, p, queue_flags);
> }
> - __task_rq_unlock(rq, &rf);
> -
> - return ret;
> + if (!task_on_cpu(rq, p)) {
> + /*
> + * When on_rq && !on_cpu the task is preempted, see if
> + * it should preempt the task that is current now.
> + */
> + wakeup_preempt(rq, p, wake_flags);
> + }
> + ttwu_do_wakeup(p);
> + return 1;
> }
>
> #ifdef CONFIG_SMP
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 65fa64845d9f..b4c1f6c06c18 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1793,6 +1793,11 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
> raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
> }
>
> +DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct,
> + _T->rq = __task_rq_lock(_T->lock, &_T->rf),
> + __task_rq_unlock(_T->rq, &_T->rf),
> + struct rq *rq; struct rq_flags rf)
> +
> DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
> _T->rq = task_rq_lock(_T->lock, &_T->rf),
> task_rq_unlock(_T->rq, _T->lock, &_T->rf),
I tried the patch on top of kernel 6.13-rc6.
It did not fix the issue.
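(As an aside, the DELAY_DEQUEUE on/off comparison referred to above can be
repeated at runtime via the scheduler features interface; a sketch, assuming
CONFIG_SCHED_DEBUG=y and debugfs mounted at /sys/kernel/debug:)

```shell
# Sketch: toggle the DELAY_DEQUEUE scheduler feature at runtime for an
# A/B comparison. Assumes CONFIG_SCHED_DEBUG=y and debugfs at /sys/kernel/debug.
FEATURES=/sys/kernel/debug/sched/features

# Show current feature flags (a disabled feature is listed with a NO_ prefix):
sudo cat "$FEATURES"

# Disable DELAY_DEQUEUE:
echo NO_DELAY_DEQUEUE | sudo tee "$FEATURES" > /dev/null

# Re-enable it afterwards:
echo DELAY_DEQUEUE | sudo tee "$FEATURES" > /dev/null
```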
I used my patched version of turbostat, as per my previous email,
so that I could see which CPU was involved and the CPU migration time.
CPU migration times >= 10 milliseconds are listed.
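(The grep chain in the command that follows keeps rows whose usec field begins
with a non-zero digit; an equivalent numeric filter on the usec column can be
written with awk. A sketch, fed a few rows in the same column layout as the
output further down; the 5003-usec row is made up for contrast:)

```shell
# Sketch: keep only turbostat rows whose usec (1st) column is >= 10000 (10 ms).
# The first two data rows are copied from the output below; the 5003 row is a
# hypothetical short interval added to show it being filtered out.
awk 'NR > 1 && $1 + 0 >= 10000' <<'EOF'
usec Time_Of_Day_Seconds CPU Busy% IRQ
16599 1736210307.324843 11 99.76 1004
5003 1736210308.330843 11 99.76 1002
6003601 1736210314.329844 11 99.76 1018
EOF
# → prints only the 16599 and 6003601 rows
```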
Results:
doug@s19:~$ date
Mon Jan 6 04:37:58 PM PST 2025
doug@s19:~$ sudo ~/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --show Busy%,IRQ,Time_Of_Day_Seconds,CPU,usec --interval 1 | grep -v \- | grep -e "^[1-9]" -e "^ [1-9]" -e "^  [1-9]"
usec Time_Of_Day_Seconds CPU Busy% IRQ
16599 1736210307.324843 11 99.76 1004
6003601 1736210314.329844 11 99.76 1018
1164604 1736210330.509843 11 99.76 1003
6003604 1736210347.524844 11 99.76 1005
23602 1736210369.570843 11 99.76 1003
161680 1736210384.748843 7 99.76 1002
5750600 1736210398.507843 11 99.76 1005
6003607 1736210478.587844 11 99.76 1002
210645 1736210479.799843 3 99.76 7017
22602 1736210495.838843 11 99.76 1002
6003390 1736210520.861844 11 99.76 1002
108627 1736210534.984843 10 99.76 1002
23604 1736210570.047843 11 99.76 1003
6004604 1736210600.076843 11 99.76 1003
1895606 1736210606.977843 11 99.76 1002
3110603 1736210745.226843 11 99.76 1003
6003606 1736210765.244844 11 99.76 1002
6003605 1736210785.262843 11 99.76 1002
401642 1736210847.732843 9 99.76 1002
6003604 1736210891.781843 11 99.76 1003
6003607 1736210914.802844 11 99.76 1002
6003605 1736210945.831843 11 99.76 1002
5579609 1736210968.428848 11 99.76 1002
6003600 1736210975.433844 11 99.76 6585
93623 1736210985.537843 10 99.76 1003
5005605 1736210994.547843 11 99.76 1003
2654601 1736211029.244843 11 99.76 1004
17604 1736211057.290843 11 99.76 1003
23598 1736211077.334843 11 99.76 1006
114671 1736211079.451843 2 99.76 1003
6003603 1736211105.475843 11 99.76 1002
^Cdoug@s19:~$ date
Mon Jan 6 04:52:18 PM PST 2025
doug@s19:~$ uname -a
Linux s19 6.13.0-rc6-peterz #1320 SMP PREEMPT_DYNAMIC Mon Jan 6 16:25:39 PST 2025 x86_64 x86_64 x86_64 GNU/Linux