Message-ID: <aJuNcM-BfznsVDWl@slm.duckdns.org>
Date: Tue, 12 Aug 2025 08:52:32 -1000
From: Tejun Heo <tj@...nel.org>
To: Subbaraya Sundeep <sbhatta@...vell.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, jiangshanlai@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: Query regarding work scheduling

Hello,

On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote:
> Hi,
> 
> One of our customers reported that after their kernel was upgraded from 6.1 to 6.6,
> they see more delay in their applications' shutdown time.
> To put it in simple terms, dataplane applications run with SR-IOV VFs attached to
> them, and the apps send a number of mailbox messages to the kernel PF driver (the PF
> receives an mbox interrupt). In the interrupt handler, work is queued, and the
> messages are processed in the work handler.
> I calculated the latencies (time between work being queued and work execution
> starting) on 6.1 and 6.16, and below are the observations:
> 
> 
> 6.1 mainline
> ------------
> Total samples: 4647
> Min latency: 0.001 ms
> Max latency: 0.195 ms
> Total latency: 7.797 ms
> 
> Latency Histogram (bucket size = 0.01 ms):
> 0.00 - 0.01 ms: 4644
> 0.01 - 0.02 ms: 1
> 0.03 - 0.04 ms: 1
> 0.19 - 0.20 ms: 1
> 
> ==================
> 
> 6.16 mainline
> -------------
> Total samples: 4647
> Min latency: 0.000 ms
> Max latency: 4.880 ms
> Total latency: 158.813 ms

It's difficult to tell where the latencies are coming from. Maybe you can use
something like https://github.com/josefbacik/systing to look further into
it? All the scheduling events are tracked by default, and you should be able
to add tracepoints and other events relatively easily. You can also set
trigger conditions so that a trace around a high-latency event can be
captured reliably.
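
For reference, the queue-to-execution-start latency described above can be derived
from the kernel's workqueue tracepoints (workqueue_queue_work and
workqueue_execute_start), pairing events by the work struct pointer. A minimal
post-processing sketch — the exact trace line layout and field names here are an
assumption for illustration, not the tooling the poster actually used:

```python
import re
from collections import defaultdict

# Assumed ftrace-style line layout (illustrative only):
#   kworker-1 [000] 100.000000: workqueue_queue_work: work struct=0xabc function=foo
EVENT_RE = re.compile(
    r"(?P<ts>\d+\.\d+): (?P<event>workqueue_queue_work|workqueue_execute_start):"
    r" work struct=(?P<work>\S+)"
)

def latencies_ms(trace_lines):
    """Pair queue/execute-start events by work pointer; return latencies in ms."""
    queued = {}   # work struct pointer -> timestamp (seconds) when queued
    deltas = []
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = float(m.group("ts"))
        work = m.group("work")
        if m.group("event") == "workqueue_queue_work":
            queued[work] = ts
        elif work in queued:
            deltas.append((ts - queued.pop(work)) * 1000.0)
    return deltas

def histogram(deltas, bucket_ms=0.01):
    """Bucket latencies into 0.01 ms bins, like the histograms quoted above."""
    buckets = defaultdict(int)
    for d in deltas:
        buckets[int(d / bucket_ms)] += 1
    return dict(buckets)
```

A 0.5 ms gap between the two events for the same work struct would land in the
0.50-0.51 ms bucket.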

Thanks.

-- 
tejun
