Message-ID: <6b2487403dafb09fbdfb0075123fc3fda8ab7636.camel@gmx.de>
Date: Sat, 20 Apr 2024 07:57:21 +0200
From: Mike Galbraith <efault@....de>
To: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com, 
 juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com,  rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com,  vschneid@...hat.com,
 linux-kernel@...r.kernel.org
Cc: "kprateek.nayak" <kprateek.nayak@....com>, tglx@...utronix.de
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

(removes apparently busted bytedance.com address and retries xmit)

Greetings!

With this version, the better CPU distribution (for tbench synchronous
net-blasting) closed the CFS vs EEVDF throughput deficit.  I verified
this both by rolling the previous version forward and by back-porting
to 6.1, where I've got CFS and EEVDF to re-compare, now with both
dequeue delay patch versions.

As usual, there will be winners and losers, but (modulo dead buglet) it
looks kinda promising to me.
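For reference, the two result sets below differ only in the scheduler
feature flag; assuming the stock sched_features debugfs interface from
mainline kernels built with CONFIG_SCHED_DEBUG (paths and flag name as
in the patch series, not something new here), toggling looks like:

```shell
# Flip the delayed-dequeue scheduler feature at runtime (root required).
echo DELAY_DEQUEUE    > /sys/kernel/debug/sched/features   # enable
echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features   # disable

# Verify which variant is currently active.
grep -o 'NO_DELAY_DEQUEUE\|DELAY_DEQUEUE' /sys/kernel/debug/sched/features
```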

Distribution of single pinned buddy pair measured in master:
DELAY_DEQUEUE
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  tbench:(2)            |   6277.099 ms |  1597104 | avg:   0.003 ms | max:   0.129 ms | sum: 4334.723 ms |
  tbench_srv:(2)        |   5724.971 ms |  1682629 | avg:   0.001 ms | max:   0.083 ms | sum: 2076.616 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |  12021.128 ms |  3280275 |                 |        1.729 ms |      6425.483 ms |
 ----------------------------------------------------------------------------------------------------------
client/server CPU distribution ~52%/48%

NO_DELAY_DEQUEUE
 ----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 ----------------------------------------------------------------------------------------------------------
  tbench:(2)            |   6724.774 ms |  1546761 | avg:   0.002 ms | max:   0.409 ms | sum: 2443.549 ms |
  tbench_srv:(2)        |   5275.329 ms |  1571688 | avg:   0.002 ms | max:   0.086 ms | sum: 2734.151 ms |
 ----------------------------------------------------------------------------------------------------------
  TOTAL:                |  12019.641 ms |  3119000 |                 |     9996.367 ms |     15187.833 ms |
 ----------------------------------------------------------------------------------------------------------
client/server CPU distribution ~56%/44%
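The quoted client/server CPU distribution percentages follow directly
from the Runtime columns in the two summaries above; a quick arithmetic
check (runtimes copied from the tables, task names as shown there):

```python
# Per-task runtimes (ms) copied from the two summaries above.
delay_dequeue    = {"tbench": 6277.099, "tbench_srv": 5724.971}
no_delay_dequeue = {"tbench": 6724.774, "tbench_srv": 5275.329}

def distribution(runtimes):
    """Return each task's share of the combined runtime, in percent."""
    total = sum(runtimes.values())
    return {task: 100.0 * ms / total for task, ms in runtimes.items()}

print(distribution(delay_dequeue))     # ~52% / ~48%
print(distribution(no_delay_dequeue))  # ~56% / ~44%
```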

Note the switches and delay sums.  For tbench, they translate directly
to throughput.  The other shoe lands with async CPU-hog net-blasters;
for those, scheduler cycles tend to be wasted cycles.

	-Mike
