Message-ID: <20250623102732.3447837-1-gary.yang@cixtech.com>
Date: Mon, 23 Jun 2025 18:27:18 +0800
From: Gary Yang <gary.yang@...tech.com>
To: gary.yang@...tech.com,
	peterz@...radead.org
Cc: linux-kernel@...r.kernel.org
Subject: [sched/eevdf] llama-bench performance drop

Problem: The llama-bench test runs an AI model on the CPU. It creates many
threads, so it is a CPU-bound workload. It outputs three scores. The 1st score
is primarily influenced by CPU frequency, the 2nd score is primarily influenced
by memory or the L1/L2 caches, and the 3rd score is influenced by both CPU
frequency and memory.

When llama-bench is run on an ARM A720 with kernel 6.1, it outputs these three scores:
root# taskset -c 0,5,6,7,8,9,10,11 llama-bench -m DeepSeek-R1-Distill-Qwen-7B-Q4_0.gguf -pg 128,128 -t 8
| model         |     size | params | backend | threads |       test |          t/s |
| ------------- |--------: |------: | ------- | ------: | ---------: | -----------: |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |      pp512 | 58.67 ± 3.08 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |      tg128 |  9.32 ± 0.22 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |pp128+tg128 | 15.10 ± 1.08 |

build: 14d627f4 (5288)

When llama-bench is run on the same ARM A720 with kernel 6.6.89, it outputs these three scores:
root# taskset -c 0,5,6,7,8,9,10,11 llama-bench -m DeepSeek-R1-Distill-Qwen-7B-Q4_0.gguf -pg 128,128 -t 8
| model         |     size | params | backend | threads |        test |          t/s |
| ------------- |--------: |------: | ------- | ------: | ----------: | -----------: |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       pp512 | 49.89 ± 3.83 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       tg128 |  2.66 ± 1.98 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 | pp128+tg128 |  1.92 ± 0.45 |

build: 14d627f4 (5288)
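
To compare the two runs, the relative change per test can be computed from the mean t/s values in the tables above. The following is a minimal Python sketch. The function name relative_change_percent and the dictionary layout are illustrative and not part of llama-bench, and the calculation ignores the ± spread that llama-bench reports.

```python
# Mean t/s values copied from the two tables above (the ± spread is ignored).
scores_6_1 = {"pp512": 58.67, "tg128": 9.32, "pp128+tg128": 15.10}
scores_6_6_89 = {"pp512": 49.89, "tg128": 2.66, "pp128+tg128": 1.92}


def relative_change_percent(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Hypothetical helper structure: test name -> percent change from 6.1 to 6.6.89.
changes = {
    test: relative_change_percent(scores_6_1[test], scores_6_6_89[test])
    for test in scores_6_1
}

for test, pct in changes.items():
    print(f"{test:>12}: {pct:+.1f}%")
```

With the numbers above this gives roughly -15% for pp512, -71% for tg128, and -87% for pp128+tg128. Note that the tg128 result on kernel 6.6.89 has a large reported spread (± 1.98 on a mean of 2.66), so the middle figure is only a rough indication.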

We find that the 2nd and 3rd scores are both lower than on kernel 6.1. While analyzing
this issue, we noted that kernel 6.6 introduces the EEVDF scheduler in place of the CFS
scheduler used in kernel 6.1. After we reverted the EEVDF patches listed below, the two
scores improved to nearly the values obtained on kernel 6.1.

9ef5bc6e07a5 Revert "sched/fair: Commit to EEVDF"
a21eaad7417a Revert "sched/eevdf: Curb wakeup-preemption"
2cf7e10af999 Revert "sched/eevdf: Also update slice on placement"
a19837e0f27b Revert "sched/eevdf: Fix avg_vruntime()"
eae55a336cf3 Revert "sched/eevdf: Fix min_deadline heap integrity"
ba3c4b6b5aa9 Revert "sched/eevdf: Fix pick_eevdf()"
37561f3cdba5 Revert "sched/eevdf: Fix heap corruption more"
9a80e5bf2bb5 Revert "sched/eevdf: Fix vruntime adjustment on reweight"
df483ee656d5 Revert "sched/eevdf: Always update V if se->on_rq when reweighting"
587fe3a23160 Revert "sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr"
65f847ba8cc3 Revert "sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()"

Has anyone encountered a similar issue? What would you suggest we try?
