Message-ID: <20251031151449.GA555491@kernel-ep2>
Date: Fri, 31 Oct 2025 20:44:49 +0530
From: Subbaraya Sundeep <sbhatta@...vell.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Tejun Heo <tj@...nel.org>, <mingo@...hat.com>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <dietmar.eggemann@....com>,
<rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
<vschneid@...hat.com>, <jiangshanlai@...il.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: Query regarding work scheduling
Hi Peter,
On 2025-08-19 at 16:48:18, Subbaraya Sundeep (sbhatta@...vell.com) wrote:
> Hi Peter,
>
> On 2025-08-14 at 09:48:31, Peter Zijlstra (peterz@...radead.org) wrote:
> > On Thu, Aug 14, 2025 at 03:54:58AM +0000, Subbaraya Sundeep wrote:
> > > Hi Tejun,
> > >
> > > On 2025-08-12 at 18:52:32, Tejun Heo (tj@...nel.org) wrote:
> > > > Hello,
> > > >
> > > > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote:
> > > > > Hi,
> > > > >
> > > > > One of our customers reported that after their kernel was upgraded from 6.1
> > > > > to 6.6 they see more delay in their applications' shutdown time.
> > > > > To put it in simple terms, dataplane applications run with SRIOV VFs attached
> > > > > to them and the apps send a number of mailbox messages to the kernel PF driver
> > > > > (the PF receives an mbox interrupt).
> > > > > In the interrupt handler, work is queued and the messages are processed in the
> > > > > work handler.
> > > > > I calculated the latencies (time between work being queued and work execution
> > > > > start) on 6.1 and 6.16, and below are the observations:
> > > > >
> > > > >
> > > > > 6.1 mainline
> > > > > ------------
> > > > > Total samples: 4647
> > > > > Min latency: 0.001 ms
> > > > > Max latency: 0.195 ms
> > > > > Total latency: 7.797 ms
> > > > >
> > > > > Latency Histogram (bucket size = 0.01 ms):
> > > > > 0.00 - 0.01 ms: 4644
> > > > > 0.01 - 0.02 ms: 1
> > > > > 0.03 - 0.04 ms: 1
> > > > > 0.19 - 0.20 ms: 1
> > > > >
> > > > > ==================
> > > > >
> > > > > 6.16 mainline
> > > > > -------------
> > > > > Total samples: 4647
> > > > > Min latency: 0.000 ms
> > > > > Max latency: 4.880 ms
> > > > > Total latency: 158.813 ms
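> > > > >
> > > > > (For context, the latency above is the delta between queue_work() in the
> > > > > IRQ path and the start of the work handler. A minimal sketch of one way to
> > > > > instrument this in the PF driver -- hypothetical names, not the exact code
> > > > > we use:)
> > > > >
> > > > > 	/* sketch: measure queue -> execute delay for the mbox work */
> > > > > 	#include <linux/interrupt.h>
> > > > > 	#include <linux/ktime.h>
> > > > > 	#include <linux/printk.h>
> > > > > 	#include <linux/workqueue.h>
> > > > >
> > > > > 	struct mbox_ctx {
> > > > > 		struct work_struct work;
> > > > > 		ktime_t queued_at;		/* stamped in the IRQ handler */
> > > > > 	};
> > > > >
> > > > > 	static irqreturn_t mbox_irq_handler(int irq, void *data)
> > > > > 	{
> > > > > 		struct mbox_ctx *ctx = data;
> > > > >
> > > > > 		ctx->queued_at = ktime_get();
> > > > > 		queue_work(system_wq, &ctx->work);
> > > > > 		return IRQ_HANDLED;
> > > > > 	}
> > > > >
> > > > > 	static void mbox_work_handler(struct work_struct *w)
> > > > > 	{
> > > > > 		struct mbox_ctx *ctx = container_of(w, struct mbox_ctx, work);
> > > > > 		s64 delta_us = ktime_to_us(ktime_sub(ktime_get(), ctx->queued_at));
> > > > >
> > > > > 		pr_debug("mbox work latency: %lld us\n", delta_us);
> > > > > 		/* ... process mailbox messages and send the response ... */
> > > > > 	}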
> > > >
> > > > Difficult to tell where the latencies are coming from. Maybe you can use
> > > > something like https://github.com/josefbacik/systing to look further into
> > > > it? All the scheduling events are tracked by default and you should be able
> > > > to add tracepoints and other events relatively easily. You can also set
> >
> > > Thanks for the reply. I am using a simple busybox setup to avoid the overhead of any
> > > other apps or daemons running in the background and taking CPU time in between.
> >
> > Well, something is running. So there must be competing runnable tasks.
> >
> > > I suspect this has something to do with EEVDF scheduling since this behavior is
> > > seen from 6.6 onwards (please note I may be completely wrong).
> >
> > EEVDF is stricter in a sense than CFS was; it looks like the workqueue
> > thread just ran out of cycles and is made to wait.
> >
> I am a complete beginner in this area. If a work function has been executed
> thousands of times by a kworker, will the kworker be made to wait a little
> longer after some invocations, since it has already taken too much CPU time?
> Or does the accounting start only from the moment the kworker becomes
> runnable from the sleep state? Sorry if I am not making any sense, but I want
> to understand which of the below is happening:
> 1. kworker sleeping -> waking up and running a function -> sleeping
>
> The above can happen n number of times and the scheduler always favors the
> kworker, picking it when it becomes runnable since it is a newly runnable task.
> Or
> The scheduler knows that the CPU has executed the kworker thread a lot
> (the runtime of each invocation is tracked) and so starts delaying the
> kworker's execution.
>
> > > Are there any methods or options with which I can bring back CFS scheduling behavior
> > > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check?
> >
> > We have a lot of knobs; but not one that says: do-what-I-want.
> >
> > If you push a ton of work into a workqueue and have competing runnable
> > tasks, why do you think it isn't reasonable to have the competing tasks
> > run some of the time?
> >
> > You can maybe push the slice length up a bit -- it was fixed to the
> > small side of the CFS dynamic slice. But who knows what your workload is
> > doing.
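> (For reference, I believe the slice can be pushed up via the base_slice_ns
> knob under /sys/kernel/debug/sched/ on 6.6+ kernels; a minimal userspace
> sketch of doing that, value in nanoseconds:)
>
> 	/* sketch: raise the EEVDF base slice (assumes debugfs is mounted and
> 	 * /sys/kernel/debug/sched/base_slice_ns exists, i.e. 6.6+ with EEVDF) */
> 	#include <fcntl.h>
> 	#include <stdio.h>
> 	#include <string.h>
> 	#include <unistd.h>
>
> 	int main(void)
> 	{
> 		const char *knob = "/sys/kernel/debug/sched/base_slice_ns";
> 		const char *val = "6000000\n";	/* e.g. 6 ms */
> 		int fd = open(knob, O_WRONLY);
>
> 		if (fd < 0) {
> 			perror(knob);
> 			return 1;
> 		}
> 		if (write(fd, val, strlen(val)) < 0)
> 			perror("write");
> 		close(fd);
> 		return 0;
> 	}
>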
> The workload is like this:
> 1. userspace writes a message in the hw shared mbox region
> 2. triggers an interrupt to the PF
> 3. the PF receives the interrupt, queues work to process the message and sends a response
> 4. userspace polls for the response in a while(1) loop
>
> So on a single-CPU system the userspace while(1) code and the kernel workqueue function
> are competing, whereas the userspace while(1) code actually depends on the workqueue
> function executing in the kernel.
> I am doing more experiments and will update you. Thanks for the reply
> and your time.
Sorry for the long delay. I suggested that the customer use sleep instead of a busy
loop in their application; with that change the latencies are gone and they are okay
with it.
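
Roughly, the change was from a tight while(1) poll to something like the
sketch below (illustrative only, not the actual application code;
mbox_response_ready() is a hypothetical helper that checks the shared
mbox region for the PF's response):

	/* illustrative poll loop: yield the CPU between checks instead of
	 * spinning, so the kworker handling the mailbox gets to run */
	#include <stdbool.h>
	#include <unistd.h>

	/* hypothetical helper: returns true once the PF has written a
	 * response into the shared mailbox region */
	extern bool mbox_response_ready(void);

	static void wait_for_response(void)
	{
		while (!mbox_response_ready())
			usleep(50);	/* was: a pure busy-wait while(1) */
	}
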
Thanks,
Sundeep