Message-ID: <20250814094831.GT4067720@noisy.programming.kicks-ass.net>
Date: Thu, 14 Aug 2025 11:48:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Subbaraya Sundeep <sbhatta@...vell.com>
Cc: Tejun Heo <tj@...nel.org>, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, jiangshanlai@...il.com,
linux-kernel@...r.kernel.org
Subject: Re: Query regarding work scheduling
On Thu, Aug 14, 2025 at 03:54:58AM +0000, Subbaraya Sundeep wrote:
> Hi Tejun,
>
> On 2025-08-12 at 18:52:32, Tejun Heo (tj@...nel.org) wrote:
> > Hello,
> >
> > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote:
> > > Hi,
> > >
> > > One of our customers reported that when their kernel was upgraded from 6.1 to 6.6 they
> > > started seeing more delay in their applications' shutdown time.
> > > To put it in simple terms, dataplane applications run with SRIOV VFs attached to them and
> > > the apps send a number of mailbox messages to the kernel PF driver (the PF receives an mbox
> > > interrupt). In the interrupt handler, work is queued and the messages are then processed in
> > > the work handler (a rough sketch of this pattern follows the numbers below).
> > > I measured the latencies (time between work being queued and work execution starting) on 6.1
> > > and 6.16, and below are the observations:
> > >
> > >
> > > 6.1 mainline
> > > ------------
> > > Total samples: 4647
> > > Min latency: 0.001 ms
> > > Max latency: 0.195 ms
> > > Total latency: 7.797 ms
> > >
> > > Latency Histogram (bucket size = 0.01 ms):
> > > 0.00 - 0.01 ms: 4644
> > > 0.01 - 0.02 ms: 1
> > > 0.03 - 0.04 ms: 1
> > > 0.19 - 0.20 ms: 1
> > >
> > > ==================
> > >
> > > 6.16 mainline
> > > -------------
> > > Total samples: 4647
> > > Min latency: 0.000 ms
> > > Max latency: 4.880 ms
> > > Total latency: 158.813 ms
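
The pattern being described is roughly the usual one sketched below. The
names (pf_dev, pf_mbox_intr, pf_mbox_work_handler) are made up for
illustration and are not the actual Marvell PF driver code:

	#include <linux/interrupt.h>
	#include <linux/workqueue.h>

	/* Illustrative sketch only: the mbox IRQ just queues work, the
	 * messages are processed later in the work handler.  The latency
	 * numbers above run from queue_work() until
	 * pf_mbox_work_handler() actually starts executing. */
	struct pf_dev {
		struct work_struct mbox_work;
		/* ... mailbox state ... */
	};

	static void pf_mbox_work_handler(struct work_struct *w)
	{
		struct pf_dev *pf = container_of(w, struct pf_dev, mbox_work);

		/* drain and process pf's mailbox messages from the VFs here */
	}

	static irqreturn_t pf_mbox_intr(int irq, void *data)
	{
		struct pf_dev *pf = data;

		/* INIT_WORK(&pf->mbox_work, pf_mbox_work_handler) is assumed
		 * to have been done at probe time */
		queue_work(system_wq, &pf->mbox_work);
		return IRQ_HANDLED;
	}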
> >
> > Difficult to tell where the latencies are coming from. Maybe you can use
> > something like https://github.com/josefbacik/systing to look further into
> > it? All the scheduling events are tracked by default and you should be able
> > to add tracepoints and other events relatively easily. You can also set
> Thanks for the reply. I am using a simple busybox setup to avoid the overhead of any other
> apps or daemons running in the background and taking CPU time in between.
Well, something is running. So there must be competing runnable tasks.
> I suspect this has something to do with EEVDF scheduling, since this behavior is
> seen from 6.6 onwards (please note I may be completely wrong).
EEVDF is stricter in a sense than CFS was; it looks like the workqueue
thread just ran out of cycles and is made to wait.
> Are there any methods or options with which I can bring back the CFS scheduling behavior,
> maybe with the knobs in /sys/kernel/debug/sched/features, as a quick check?
We have a lot of knobs; but not one that says: do-what-I-want.
If you push a ton of work into a workqueue and have competing runnable
tasks, why do you think it isn't reasonable to have the competing tasks
run some of the time?
You can maybe push the slice length up a bit -- it was fixed to the
small side of the CFS dynamic slice. But who knows what your workload is
doing.
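
If you want to experiment with that: the system-wide knob on EEVDF kernels
is /sys/kernel/debug/sched/base_slice_ns, and newer kernels (v6.12+, if I
remember right) also let a task ask for its own slice via sched_setattr()'s
sched_runtime field. A minimal sketch under that assumption -- the 5ms
value is purely illustrative, and struct sched_attr is spelled out by hand
as in the sched_setattr(2) man page:

	#define _GNU_SOURCE
	#include <sched.h>		/* SCHED_OTHER */
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	struct sched_attr {
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		/* used by SCHED_DEADLINE; on kernels with custom-slice
		 * support, sched_runtime doubles as the fair-class slice */
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
	};

	int main(void)
	{
		struct sched_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.sched_policy = SCHED_OTHER;
		attr.sched_runtime = 5ULL * 1000 * 1000;	/* 5ms slice, in ns */

		/* pid 0 == the calling thread */
		if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
			perror("sched_setattr");
			return 1;
		}
		return 0;
	}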