Message-ID: <002701db60a2$ef760820$ce621860$@telus.net>
Date: Mon, 6 Jan 2025 17:24:44 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>
Cc: <linux-kernel@...r.kernel.org>,
	<vincent.guittot@...aro.org>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF

On 2025.01.06 09:14 Peter Zijlstra wrote:
> On Mon, Jan 06, 2025 at 06:04:55PM +0100, Peter Zijlstra wrote:
>> On Mon, Jan 06, 2025 at 05:59:32PM +0100, Peter Zijlstra wrote:
>>> On Mon, Jan 06, 2025 at 07:01:34AM -0800, Doug Smythies wrote:
>>>
>>>>> What is the easiest 100% load you're seeing this with?
>>>>
>>>> Lately, and specifically to be able to tell others, I have been using:
>>>>
>>>> yes > /dev/null &
>>>>
>>>> On my Intel i5-10600K, with 6 cores and 2 threads per core, 12 CPUs,
>>>> I run 12 of those workloads.
>>>
>>> On my headless ivb-ep, 2 sockets, 10 cores each and 2 threads per core, I
>>> do:
>>>
>>> for ((i=0; i<40; i++)) ; do yes > /dev/null & done
>>> tools/power/x86/turbostat/turbostat --quiet --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,TSC_MHz --interval 1
>>>
>>> But so far, nada :-( I've tried with full preemption and voluntary,
>>> HZ=1000.
>>>
>>
>> And just as I send this, I see these happen:
>>
>> 100.00  3100    2793    40302   71      195.22
>> 100.00  3100    2618    40459   72      183.58
>> 100.00  3100    2993    46215   71      209.21
>> 100.00  3100    2789    40467   71      195.19
>> 99.92   3100    2798    40589   71      195.76
>> 100.00  3100    2793    40397   72      195.46
>> ...
>> 100.00  3100    2844    41906   71      199.43
>> 100.00  3100    2779    40468   71      194.51
>> 99.96   3100    2320    40933   71      163.23
>> 100.00  3100    3529    61823   72      245.70
>> 100.00  3100    2793    40493   72      195.45
>> 100.00  3100    2793    40462   72      195.56
>>
>> They look like funny little blips. Nowhere near as bad as what you had,
>> though.
>
> Anyway, given you've confirmed disabling DELAY_DEQUEUE fixes things,
> could you perhaps try the below hackery for me? It's a bit of a wild
> guess, but throw stuff at wall, see what sticks etc..
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 84902936a620..fa4b9891f93a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3019,7 +3019,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>       } else {
>
>               if (!is_migration_disabled(p)) {
> -                     if (task_on_rq_queued(p))
> +                     if (task_on_rq_queued(p) && !p->se.sched_delayed)
>                               rq = move_queued_task(rq, rf, p, dest_cpu);
>
>                       if (!pending->stop_pending) {
> @@ -3776,28 +3776,30 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
>   */
>  static int ttwu_runnable(struct task_struct *p, int wake_flags)
>  {
> -     struct rq_flags rf;
> -     struct rq *rq;
> -     int ret = 0;
> +     CLASS(__task_rq_lock, rq_guard)(p);
> +     struct rq *rq = rq_guard.rq;
>
> -     rq = __task_rq_lock(p, &rf);
> -     if (task_on_rq_queued(p)) {
> -             update_rq_clock(rq);
> -             if (p->se.sched_delayed)
> -                     enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> -             if (!task_on_cpu(rq, p)) {
> -                     /*
> -                      * When on_rq && !on_cpu the task is preempted, see if
> -                      * it should preempt the task that is current now.
> -                      */
> -                     wakeup_preempt(rq, p, wake_flags);
> +     if (!task_on_rq_queued(p))
> +             return 0;
> +
> +     update_rq_clock(rq);
> +     if (p->se.sched_delayed) {
> +             int queue_flags = ENQUEUE_NOCLOCK | ENQUEUE_DELAYED;
> +             if (!is_cpu_allowed(p, cpu_of(rq))) {
> +                     dequeue_task(rq, p, DEQUEUE_SLEEP | queue_flags);
> +                     return 0;
>               }
> -             ttwu_do_wakeup(p);
> -             ret = 1;
> +             enqueue_task(rq, p, queue_flags);
>       }
> -     __task_rq_unlock(rq, &rf);
> -
> -     return ret;
> +     if (!task_on_cpu(rq, p)) {
> +             /*
> +              * When on_rq && !on_cpu the task is preempted, see if
> +              * it should preempt the task that is current now.
> +              */
> +             wakeup_preempt(rq, p, wake_flags);
> +     }
> +     ttwu_do_wakeup(p);
> +     return 1;
>  }
>
>  #ifdef CONFIG_SMP
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 65fa64845d9f..b4c1f6c06c18 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1793,6 +1793,11 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
>       raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
>  }
>
> +DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct,
> +                 _T->rq = __task_rq_lock(_T->lock, &_T->rf),
> +                 __task_rq_unlock(_T->rq, &_T->rf),
> +                 struct rq *rq; struct rq_flags rf)
> +
>  DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
>                   _T->rq = task_rq_lock(_T->lock, &_T->rf),
>                   task_rq_unlock(_T->rq, _T->lock, &_T->rf),

I tried the patch on top of kernel 6.13-rc6.
It did not fix the issue.
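For reference, the load itself is trivial to script. A sketch, assuming
POSIX sh plus coreutils ($(nproc) yields 12 on this i5-10600K):

```shell
# Spawn one 'yes' busy loop per CPU, then tear them down once the
# measurement is done.
ncpus=$(nproc)
pids=""
i=0
while [ "$i" -lt "$ncpus" ]; do
    yes > /dev/null &
    pids="$pids $!"
    i=$((i + 1))
done
echo "spawned $ncpus busy loops"
# ... run turbostat here while the load is up ...
kill $pids
wait 2>/dev/null || true   # reap; the killed loops exit non-zero
```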

I used my patched version of turbostat, as per my previous email,
so that I could see which CPU was involved and the CPU migration time.
Only CPU migration times >= 10 milliseconds are listed.
Results:

doug@s19:~$ date
Mon Jan  6 04:37:58 PM PST 2025
doug@s19:~$ sudo ~/kernel/linux/tools/power/x86/turbostat/turbostat --quiet --show Busy%,IRQ,Time_Of_Day_Seconds,CPU,usec --interval 1 | grep -v \- | grep -e "^[1-9]" -e "^ [1-9]" -e "^  [1-9]"
usec    Time_Of_Day_Seconds     CPU     Busy%   IRQ
  16599 1736210307.324843       11      99.76   1004
6003601 1736210314.329844       11      99.76   1018
1164604 1736210330.509843       11      99.76   1003
6003604 1736210347.524844       11      99.76   1005
  23602 1736210369.570843       11      99.76   1003
 161680 1736210384.748843       7       99.76   1002
5750600 1736210398.507843       11      99.76   1005
6003607 1736210478.587844       11      99.76   1002
 210645 1736210479.799843       3       99.76   7017
  22602 1736210495.838843       11      99.76   1002
6003390 1736210520.861844       11      99.76   1002
 108627 1736210534.984843       10      99.76   1002
  23604 1736210570.047843       11      99.76   1003
6004604 1736210600.076843       11      99.76   1003
1895606 1736210606.977843       11      99.76   1002
3110603 1736210745.226843       11      99.76   1003
6003606 1736210765.244844       11      99.76   1002
6003605 1736210785.262843       11      99.76   1002
 401642 1736210847.732843       9       99.76   1002
6003604 1736210891.781843       11      99.76   1003
6003607 1736210914.802844       11      99.76   1002
6003605 1736210945.831843       11      99.76   1002
5579609 1736210968.428848       11      99.76   1002
6003600 1736210975.433844       11      99.76   6585
  93623 1736210985.537843       10      99.76   1003
5005605 1736210994.547843       11      99.76   1003
2654601 1736211029.244843       11      99.76   1004
  17604 1736211057.290843       11      99.76   1003
  23598 1736211077.334843       11      99.76   1006
 114671 1736211079.451843       2       99.76   1003
6003603 1736211105.475843       11      99.76   1002
^Cdoug@s19:~$ date
Mon Jan  6 04:52:18 PM PST 2025
doug@s19:~$ uname -a
Linux s19 6.13.0-rc6-peterz #1320 SMP PREEMPT_DYNAMIC Mon Jan  6 16:25:39 PST 2025 x86_64 x86_64 x86_64 GNU/Linux
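The grep chain works because the usec column is right-aligned seven wide, so
a leading digit anywhere in columns 1-3 means a value of at least 10000 us,
i.e. 10 ms. The same threshold written out explicitly in awk, with two sample
rows lifted from the log above plus one invented below-threshold row for
contrast:

```shell
# Keep only rows whose first (usec) field is numeric and >= 10 ms.
filter() { awk '$1 ~ /^[0-9]+$/ && $1 + 0 >= 10000'; }

printf '%s\n' \
    '  16599 1736210307.324843       11      99.76   1004' \
    '   9000 1736210307.999999       11      99.76   1004' \
    '6003601 1736210314.329844       11      99.76   1018' | filter
# -> prints the 16599 and 6003601 rows only
```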




