Message-ID: <aJuNcM-BfznsVDWl@slm.duckdns.org>
Date: Tue, 12 Aug 2025 08:52:32 -1000
From: Tejun Heo <tj@...nel.org>
To: Subbaraya Sundeep <sbhatta@...vell.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, jiangshanlai@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: Query regarding work scheduling

Hello,

On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote:
> Hi,
> 
> One of our customers reported that after their kernel was upgraded from 6.1 to 6.6,
> they see more delay in their applications' shutdown time.
> To put it in simple terms, dataplane applications run with SR-IOV VFs attached to
> them, and the apps send a number of mailbox messages to the kernel PF driver (the PF
> receives an mbox interrupt). In the interrupt handler, work is queued, and the
> messages are processed in the work handler.
> I calculated the latencies (time between work being queued and work execution
> starting) on 6.1 and 6.16, and below are the observations:
> 
> 
> 6.1 mainline
> ------------
> Total samples: 4647
> Min latency: 0.001 ms
> Max latency: 0.195 ms
> Total latency: 7.797 ms
> 
> Latency Histogram (bucket size = 0.01 ms):
> 0.00 - 0.01 ms: 4644
> 0.01 - 0.02 ms: 1
> 0.03 - 0.04 ms: 1
> 0.19 - 0.20 ms: 1
> 
> ==================
> 
> 6.16 mainline
> -------------
> Total samples: 4647
> Min latency: 0.000 ms
> Max latency: 4.880 ms
> Total latency: 158.813 ms

It's difficult to tell where the latencies are coming from. Maybe you can use
something like https://github.com/josefbacik/systing to look further into
it? All the scheduling events are tracked by default, and you should be able
to add tracepoints and other events relatively easily. You can also set
trigger conditions so that a trace around a high-latency event can be
captured reliably.
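
For reference, the queue-to-execution-start latency described above can be derived
from the kernel's workqueue tracepoints (workqueue_queue_work and
workqueue_execute_start), pairing events by the work struct pointer. A minimal
post-processing sketch — the exact trace line layout and field names here are an
assumption for illustration, not the tooling the poster actually used:

```python
import re
from collections import defaultdict

# Assumed ftrace-style line layout (illustrative only):
#   kworker-1 [000] 100.000000: workqueue_queue_work: work struct=0xabc function=foo
EVENT_RE = re.compile(
    r"(?P<ts>\d+\.\d+): (?P<event>workqueue_queue_work|workqueue_execute_start):"
    r" work struct=(?P<work>\S+)"
)

def latencies_ms(trace_lines):
    """Pair queue/execute-start events by work pointer; return latencies in ms."""
    queued = {}   # work struct pointer -> timestamp (seconds) when queued
    deltas = []
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = float(m.group("ts"))
        work = m.group("work")
        if m.group("event") == "workqueue_queue_work":
            queued[work] = ts
        elif work in queued:
            deltas.append((ts - queued.pop(work)) * 1000.0)
    return deltas

def histogram(deltas, bucket_ms=0.01):
    """Bucket latencies into 0.01 ms bins, like the histograms quoted above."""
    buckets = defaultdict(int)
    for d in deltas:
        buckets[int(d / bucket_ms)] += 1
    return dict(buckets)
```

A 0.5 ms gap between the two events for the same work struct would land in the
0.50-0.51 ms bucket.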

Thanks.

-- 
tejun
