Message-ID: <aIyDfs1Dh0OGJEgM@jlelli-thinkpadt14gen4.remote.csb>
Date: Fri, 1 Aug 2025 11:06:06 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: David Haufe <dhaufe@...plextrading.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
 Interrupts on isolcpu/nohz_full cores, performance regression

Hi,

On 31/07/25 12:48, David Haufe wrote:
> Kernel 6.16 shows the issue. kernel/sched/fair.c calls dl_server_start(),
> and neither before nor after that call is there any check of whether the
> core is isolcpu/nohz_full and running a single process. The same
> function_graph trace is generated. The code is the same in the tip tree's
> sched/core branch.
> 
> On Thu, Jul 31, 2025 at 2:02 AM Juri Lelli <juri.lelli@...hat.com> wrote:
> 
> > Hello,
> >
> > Thanks for the report.
> >
> > On 30/07/25 11:51, David Haufe wrote:
> > > [1.] Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
> > > Interrupts on isolcpu/nohz_full cores, performance regression
> > > [2.] The code for dl_server_timer is causing new IPI/Function Call
> > > Interrupts to fire on isolcpu/nohz_full cores which previously had no
> > > interrupts. When there is a single, SCHED_OTHER process running on an
> > > isolcpu/nohz_full core, dl_server_timer executes on a housekeeping
> > > core. This ultimately invokes add_nr_running() and
> > > sched_update_tick_dependency() and finally tick_nohz_dep_set_cpu().
> > > Setting the single process running on an isolcpu/nohz_full core to
> > > SCHED_FIFO (RT priority) prevents this new interrupt, as it is no longer
> > > treated as a fair-class task. Having to use RT priority should be
> > > unnecessary, and this is a regression relative to prior kernels. The
> > > kernel function_graph trace below shows core 0 (housekeeping) sending
> > > the IPI to core 19 (nohz_full, isolcpu, rcu_nocb_poll), which is running
> > > a single SCHED_OTHER process. I believe others have observed this too:
> > >
> > > https://community.clearlinux.org/t/sysjitter-worse-in-kernel-6-12-than-6-6/10206
> >
> > Would you be able to check if the following branch, containing multiple
> > fixes for dl-server, is still affected by the regression?

Apologies, I forgot to share the actual branch. :-/

Could you please test with

https://github.com/jlelli/linux/commits/upstream/fix-dlserver-1/

Among various other fixes, 219a63335b67 ("sched/deadline: Don't count
nr_running twice for dl_server proxy tasks") makes sure we don't count
fair tasks twice, so I am wondering whether it affects the CPU's ability
to enter nohz_full mode (i.e., to stop the tick).
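
To make the hypothesis concrete, here is a deliberately simplified toy model in C. It is not kernel code: every name (toy_rq, toy_can_stop_tick() and so on) is hypothetical, and the real tick-stop decision checks more conditions than a single counter. The model only mirrors the shape of the add_nr_running() -> sched_update_tick_dependency() -> tick_nohz_dep_set_cpu() chain from the report, under the assumption that, with only fair tasks present, the tick can stop when at most one entity is counted as runnable.

```c
/*
 * Toy model (NOT kernel code, all names hypothetical): shows how
 * counting one fair task twice could keep a nohz_full CPU's tick
 * dependency set.
 */
#include <assert.h>
#include <stdbool.h>

struct toy_rq {
	unsigned int nr_running; /* runnable entities counted on this CPU */
	bool tick_dep_set;       /* stands in for the scheduler tick dependency */
};

/* Simplified assumption: with only fair tasks, tick may stop if <= 1 runnable. */
static bool toy_can_stop_tick(const struct toy_rq *rq)
{
	return rq->nr_running <= 1;
}

/* Plays the role of sched_update_tick_dependency() in the toy model. */
static void toy_update_tick_dependency(struct toy_rq *rq)
{
	rq->tick_dep_set = !toy_can_stop_tick(rq);
}

static void toy_add_nr_running(struct toy_rq *rq, unsigned int count)
{
	rq->nr_running += count;
	toy_update_tick_dependency(rq);
}

static void toy_sub_nr_running(struct toy_rq *rq, unsigned int count)
{
	assert(rq->nr_running >= count); /* guard against underflow */
	rq->nr_running -= count;
	toy_update_tick_dependency(rq);
}

/*
 * Enqueue a single fair task, then start the hypothetical dl_server
 * proxy. If the proxy also bumps nr_running, the same task is counted
 * twice and the model refuses to stop the tick.
 */
static struct toy_rq toy_single_fair_task(bool server_counts_in_nr_running)
{
	struct toy_rq rq = { 0, false };

	toy_add_nr_running(&rq, 1);         /* the fair task itself */
	if (server_counts_in_nr_running)
		toy_add_nr_running(&rq, 1); /* proxy: double count */
	return rq;
}
```

In this model, the double count alone is enough to set the tick dependency for a lone SCHED_OTHER task, and dropping the extra count clears it again. Whether that is what actually happens on your kernel is exactly what testing the branch would tell us.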

Thanks,
Juri
