Message-ID: <6b2487403dafb09fbdfb0075123fc3fda8ab7636.camel@gmx.de>
Date: Sat, 20 Apr 2024 07:57:21 +0200
From: Mike Galbraith <efault@....de>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org
Cc: "kprateek.nayak" <kprateek.nayak@....com>, tglx@...utronix.de
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue
(removes apparently busted bytedance.com address and retries xmit)
Greetings!
With this version, the better CPU distribution (for tbench synchronous
net-blasting) closed the CFS vs EEVDF throughput deficit. I verified
that both by rolling the previous version forward and by back-porting
this one to 6.1, where I've got CFS and EEVDF handy to re-compare, now
with both dequeue delay patch versions.
As usual, there will be winners and losers, but (modulo dead buglet) it
looks kinda promising to me.
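For anyone wanting to repeat the A/B runs below: the DELAY_DEQUEUE vs
NO_DELAY_DEQUEUE labels are presumably the sched feature flag of the same
name being flipped. A minimal sketch of flipping it from userspace, assuming
debugfs is mounted at /sys/kernel/debug and the feature file lives at the
newer /sys/kernel/debug/sched/features location (older trees use
/sys/kernel/debug/sched_features) - not part of the patch set, just an
illustration:

/*
 * Sketch only: write a feature name to the sched features debugfs file to
 * enable it, or the NO_ prefixed name to disable it.  Paths are assumptions
 * as noted above; adjust for your tree.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *feat = argc > 1 ? argv[1] : "NO_DELAY_DEQUEUE";
	FILE *f = fopen("/sys/kernel/debug/sched/features", "w");

	if (!f) {
		perror("sched/features");
		return 1;
	}
	fprintf(f, "%s\n", feat);
	return fclose(f) ? 1 : 0;
}

Plain old echo of the same strings into that file from a shell does the
same job, of course.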
Distribution of a single pinned buddy pair, measured in master:
DELAY_DEQUEUE
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
tbench:(2) | 6277.099 ms | 1597104 | avg: 0.003 ms | max: 0.129 ms | sum: 4334.723 ms |
tbench_srv:(2) | 5724.971 ms | 1682629 | avg: 0.001 ms | max: 0.083 ms | sum: 2076.616 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: | 12021.128 ms | 3280275 | | 1.729 ms | 6425.483 ms |
----------------------------------------------------------------------------------------------------------
client/server CPU distribution ~52%/48%
NO_DELAY_DEQUEUE
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
tbench:(2) | 6724.774 ms | 1546761 | avg: 0.002 ms | max: 0.409 ms | sum: 2443.549 ms |
tbench_srv:(2) | 5275.329 ms | 1571688 | avg: 0.002 ms | max: 0.086 ms | sum: 2734.151 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: | 12019.641 ms | 3119000 | | 9996.367 ms | 15187.833 ms |
----------------------------------------------------------------------------------------------------------
client/server CPU distribution ~56%/44%
Note the switches and delay sum: for tbench, they translate directly to
throughput. The other shoe lands with async CPU-hog net-blasters; for
those, scheduler cycles tend to be wasted cycles.
-Mike