linux-kernel - [sched/eevdf] llama-bench performance drop

Message-ID: <20250623102732.3447837-1-gary.yang@cixtech.com>
Date: Mon, 23 Jun 2025 18:27:18 +0800
From: Gary Yang <gary.yang@...tech.com>
To: gary.yang@...tech.com,
	peterz@...radead.org
Cc: linux-kernel@...r.kernel.org
Subject: [sched/eevdf] llama-bench performance drop

Problem: The llama-bench test runs an AI model on the CPU. It creates a
lot of threads, so it is a CPU-bound workload. It outputs three scores.
The 1st score is primarily influenced by CPU frequency, the 2nd score is
primarily influenced by memory or the L1/L2 caches, and the 3rd score is
influenced by both CPU frequency and memory.

When llama-bench is run on an ARM A720 with kernel 6.1, it outputs these three scores:
root# taskset -c 0,5,6,7,8,9,10,11 llama-bench -m DeepSeek-R1-Distill-Qwen-7B-Q4_0.gguf \
      -pg 128,128 -t 8
| model         |     size | params | backend | threads |        test |          t/s |
| ------------- | -------: | -----: | ------- | ------: | ----------: | -----------: |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       pp512 | 58.67 ± 3.08 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       tg128 |  9.32 ± 0.22 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 | pp128+tg128 | 15.10 ± 1.08 |

build: 14d627f4 (5288)

When llama-bench is run on the same ARM A720 with kernel 6.6.89, it outputs these three scores:
root# taskset -c 0,5,6,7,8,9,10,11 llama-bench -m DeepSeek-R1-Distill-Qwen-7B-Q4_0.gguf \
      -pg 128,128 -t 8
| model         |     size | params | backend | threads |        test |          t/s |
| ------------- | -------: | -----: | ------- | ------: | ----------: | -----------: |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       pp512 | 49.89 ± 3.83 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 |       tg128 |  2.66 ± 1.98 |
| qwen2 7B Q4_0 | 4.12 GiB | 7.62 B | CPU     |       8 | pp128+tg128 |  1.92 ± 0.45 |

build: 14d627f4 (5288)
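
To put the two runs side by side, the relative change per test can be
computed from the mean t/s values in the tables above. This is only a rough
sketch: it ignores the reported ± spread, and the variable and function names
are made up for illustration.

```python
# Mean t/s values copied from the two llama-bench tables above.
kernel_6_1 = {"pp512": 58.67, "tg128": 9.32, "pp128+tg128": 15.10}
kernel_6_6_89 = {"pp512": 49.89, "tg128": 2.66, "pp128+tg128": 1.92}


def relative_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Percentage change per test when moving from kernel 6.1 to kernel 6.6.89.
changes = {
    test: relative_change(kernel_6_1[test], kernel_6_6_89[test])
    for test in kernel_6_1
}

for test, pct in changes.items():
    print(f"{test:>12}: {pct:+.1f}%")
```

On these means, tg128 and pp128+tg128 fall by roughly 71% and 87%
respectively, while pp512 falls by about 15%.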

We find that the 2nd and 3rd scores are both lower than on kernel 6.1. While
analyzing this issue, we noted a new feature in kernel 6.6: it introduces the
EEVDF scheduler instead of the CFS scheduler used in kernel 6.1. After we
reverted the EEVDF patches below, the two scores improved and came close to
those from kernel 6.1.

9ef5bc6e07a5 Revert "sched/fair: Commit to EEVDF"
a21eaad7417a Revert "sched/eevdf: Curb wakeup-preemption"
2cf7e10af999 Revert "sched/eevdf: Also update slice on placement"
a19837e0f27b Revert "sched/eevdf: Fix avg_vruntime()"
eae55a336cf3 Revert "sched/eevdf: Fix min_deadline heap integrity"
ba3c4b6b5aa9 Revert "sched/eevdf: Fix pick_eevdf()"
37561f3cdba5 Revert "sched/eevdf: Fix heap corruption more"
9a80e5bf2bb5 Revert "sched/eevdf: Fix vruntime adjustment on reweight"
df483ee656d5 Revert "sched/eevdf: Always update V if se->on_rq when reweighting"
587fe3a23160 Revert "sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr"
65f847ba8cc3 Revert "sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()"

Has anyone encountered a similar issue? Do you have any suggestions for us?