Message-ID: <aJsoMnkoYYpNzBNu@opensource>
Date: Tue, 12 Aug 2025 11:40:34 +0000
From: Subbaraya Sundeep <sbhatta@...vell.com>
To: <mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
        <vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
        <rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
        <vschneid@...hat.com>, <tj@...nel.org>, <jiangshanlai@...il.com>
CC: <linux-kernel@...r.kernel.org>
Subject: Query regarding work scheduling

Hi,

One of our customers reported that after upgrading their kernel from 6.1 to 6.6, they
see more delay in their applications' shutdown time.
In simple terms: dataplane applications run with SR-IOV VFs attached to them, and the
apps send a number of mailbox messages to the kernel PF driver (the PF receives an mbox
interrupt). In the interrupt handler the work is queued, and the messages are processed
in the work handler. I measured the latencies (time between work being queued and work
execution starting) on 6.1 and 6.16; the observations are below.


6.1 mainline
------------
Total samples: 4647
Min latency: 0.001 ms
Max latency: 0.195 ms
Total latency: 7.797 ms

Latency Histogram (bucket size = 0.01 ms):
0.00 - 0.01 ms: 4644
0.01 - 0.02 ms: 1
0.03 - 0.04 ms: 1
0.19 - 0.20 ms: 1

==================

6.16 mainline
-------------
Total samples: 4647
Min latency: 0.000 ms
Max latency: 4.880 ms
Total latency: 158.813 ms

Latency Histogram (bucket size = 0.01 ms):
0.00 - 0.01 ms: 4573
0.03 - 0.04 ms: 1
0.19 - 0.20 ms: 1
0.70 - 0.71 ms: 1
0.72 - 0.73 ms: 1
0.92 - 0.93 ms: 3
0.93 - 0.94 ms: 1
0.95 - 0.96 ms: 2
0.97 - 0.98 ms: 2
0.98 - 0.99 ms: 6
0.99 - 1.00 ms: 8
1.00 - 1.01 ms: 14
1.08 - 1.09 ms: 1
1.41 - 1.42 ms: 1
1.79 - 1.80 ms: 1
1.80 - 1.81 ms: 1
1.81 - 1.82 ms: 1
1.92 - 1.93 ms: 1
1.99 - 2.00 ms: 1
2.34 - 2.35 ms: 1
2.61 - 2.62 ms: 1
2.99 - 3.00 ms: 1
3.14 - 3.15 ms: 1
3.62 - 3.63 ms: 1
3.70 - 3.71 ms: 1
3.71 - 3.72 ms: 1
3.75 - 3.76 ms: 1
3.87 - 3.88 ms: 4
3.90 - 3.91 ms: 1
3.91 - 3.92 ms: 2
3.92 - 3.93 ms: 2
3.94 - 3.95 ms: 2
3.95 - 3.96 ms: 1
3.98 - 3.99 ms: 2
3.99 - 4.00 ms: 3
4.87 - 4.88 ms: 2

==================
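For reference, the histograms above can be reproduced with a short script like the
one below. The timestamp pairs here are made-up illustrative values, not the real
trace data; in practice the (queue time, execution start time) pairs would come from
something like the workqueue:workqueue_queue_work and
workqueue:workqueue_execute_start tracepoints.

```python
# Sketch: bin (queue_ts, exec_start_ts) pairs into a latency histogram,
# matching the 0.01 ms bucket size used above.
from collections import Counter

def latency_histogram(pairs, bucket_ms=0.01):
    """pairs: iterable of (queue_ts, exec_start_ts) in seconds."""
    lat_ms = [(start - queued) * 1000.0 for queued, start in pairs]
    buckets = Counter(int(l / bucket_ms) for l in lat_ms)
    stats = {
        "samples": len(lat_ms),
        "min_ms": min(lat_ms),
        "max_ms": max(lat_ms),
        "total_ms": sum(lat_ms),
    }
    return stats, buckets

# Illustrative timestamps (seconds): three works queued and started
# with roughly 0.005, 0.012 and 0.195 ms of delay.
pairs = [(10.000000, 10.000005), (10.100000, 10.100012), (10.200000, 10.200195)]
stats, buckets = latency_histogram(pairs)
print(stats)
for b in sorted(buckets):
    print(f"{b * 0.01:.2f} - {(b + 1) * 0.01:.2f} ms: {buckets[b]}")
```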

As seen from the histograms above, latency is higher with the 6.16 kernel.
The above was tested on a uniprocessor system. SMP is fine, since work is
scheduled across all cores. Please let me know what is going on, and
provide some pointers if I am missing something basic here.
I changed only the kernel images. The application and rootfs are the same in
both cases (to ensure there are no additional daemons or load in the 6.16 case).
Let me know if there is any knob in 6.16 to get the same
behavior as 6.1.

Scheduler features of 6.16 are as below:
# cat /sys/kernel/debug/sched/features
PLACE_LAG PLACE_DEADLINE_INITIAL PLACE_REL_DEADLINE RUN_TO_PARITY PREEMPT_SHORT
NO_NEXT_BUDDY PICK_BUDDY CACHE_HOT_BUDDY DELAY_DEQUEUE DELAY_ZERO
WAKEUP_PREEMPTION NO_HRTICK NO_HRTICK_DL NONTASK_CAPACITY TTWU_QUEUE
SIS_UTIL NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI NO_RT_RUNTIME_SHARE NO_LB_MIN
ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS UTIL_EST NO_LATENCY_WARN
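As a side note on reading that list: each token names a scheduler feature, and a
"NO_" prefix means the feature is currently off; a feature is toggled by writing
"FEATURE" or "NO_FEATURE" back into the same debugfs file. A small sketch of
parsing the output into enabled/disabled sets (the truncated sample string below
is just for illustration):

```python
# Sketch: parse the /sys/kernel/debug/sched/features format.
# Each whitespace-separated token is a feature name; a "NO_" prefix
# means the feature is currently disabled. Toggling is done by writing
# the name back, e.g.:
#   echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features
def parse_sched_features(text):
    enabled, disabled = set(), set()
    for tok in text.split():
        if tok.startswith("NO_"):
            disabled.add(tok[len("NO_"):])
        else:
            enabled.add(tok)
    return enabled, disabled

# Partial sample taken from the output above.
sample = ("PLACE_LAG PLACE_DEADLINE_INITIAL RUN_TO_PARITY PREEMPT_SHORT "
          "NO_NEXT_BUDDY PICK_BUDDY DELAY_DEQUEUE DELAY_ZERO NO_HRTICK")
enabled, disabled = parse_sched_features(sample)
print("enabled: ", sorted(enabled))
print("disabled:", sorted(disabled))
```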

Thanks,
Sundeep
