linux-kernel - Re: Query regarding work scheduling

Message-ID: <aJ1eElydTbZfBq5X@opensource>
Date: Thu, 14 Aug 2025 03:54:58 +0000
From: Subbaraya Sundeep <sbhatta@...vell.com>
To: Tejun Heo <tj@...nel.org>
CC: <mingo@...hat.com>, <peterz@...radead.org>, <juri.lelli@...hat.com>,
        <vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
        <rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
        <vschneid@...hat.com>, <jiangshanlai@...il.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: Query regarding work scheduling

Hi Tejun,

On 2025-08-12 at 18:52:32, Tejun Heo (tj@...nel.org) wrote:
> Hello,
> 
> On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote:
> > Hi,
> > 
> > One of our customers reported that after upgrading their kernel from 6.1 to 6.6, their
> > applications take longer to shut down.
> > To put it simply, dataplane applications run with SR-IOV VFs attached to them, and the
> > apps send a number of mailbox messages to the kernel PF driver (the PF receives an mbox interrupt).
> > The interrupt handler queues work, and the messages are processed in the work handler.
> > I measured the latencies (time between the work being queued and the work starting to execute)
> > on 6.1 and 6.16; the observations are below.
> > 
> > 
> > 6.1 mainline
> > ------------
> > Total samples: 4647
> > Min latency: 0.001 ms
> > Max latency: 0.195 ms
> > Total latency: 7.797 ms
> > 
> > Latency Histogram (bucket size = 0.01 ms):
> > 0.00 - 0.01 ms: 4644
> > 0.01 - 0.02 ms: 1
> > 0.03 - 0.04 ms: 1
> > 0.19 - 0.20 ms: 1
> > 
> > ==================
> > 
> > 6.16 mainline
> > -------------
> > Total samples: 4647
> > Min latency: 0.000 ms
> > Max latency: 4.880 ms
> > Total latency: 158.813 ms
> 
> Difficult to tell where the latencies are coming from. Maybe you can use
> something like https://github.com/josefbacik/systing to look further into
> it? All the scheduling events are tracked by default and you should be able
> to add tracepoints and other events relatively easily. You can also set
> trigger conditions so that trace around a high latency event can be captured
> reliably.

Thanks for the reply. I am using a minimal busybox setup to avoid the overhead of other apps
or daemons running in the background and taking CPU time in between.
I will try building and running systing. The 6.16 histogram shows that the overall latency is
not caused by one high-latency event; many small latencies add up to a large total.
I suspect this has something to do with EEVDF scheduling, since EEVDF replaced CFS as the
default in 6.6 and this behavior is seen from 6.6 onward (please note I may be completely wrong).
Are there any methods or options to bring back the CFS scheduling behavior, for example
via the knobs in /sys/kernel/debug/sched/features, as a quick check?

Thanks,
Sundeep
> 
> Thanks.
> 
> -- 
> tejun