Message-ID: <CAKJHwtOdiFTenF=zCL7_8c148Qs37r53k9uAKURLjq1JFJGeXg@mail.gmail.com>
Date: Fri, 1 Aug 2025 10:28:54 -0500
From: David Haufe <dhaufe@...plextrading.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
Interrupts on isolcpu/nohz_full cores, performance regression
I am sorry, but we cannot get this branch to boot on our hardware.
From reading the code on the branch, I do not think it will address
the issue.

I believe the issue is more fundamental. In kernel/sched/fair.c,
enqueue_task_fair() calls dl_server_start() when the single
fair/SCHED_OTHER task is added to the isolcpu/nohz_full core. The
check there only tests whether there are one or more fair tasks; if
so, it kicks off dl_server_start() and the housekeeping timer in
start_dl_timer(). Once this timer is running, dl_server_timer() is
invoked continuously. It calls __enqueue_dl_entity() and then
inc_dl_tasks(), which increments dl_rq->dl_nr_running and invokes
add_nr_running(). That path eventually reaches sched_can_stop_tick(),
but rq->dl.dl_nr_running is now nonzero, so the function always
returns false.

Something needs to be done to prevent this timer from running in the
first place, or perhaps to add a check for a single fair/SCHED_OTHER
task running on an isolcpu/nohz_full core, so that the deadline code
does not need to run for that core.
On Fri, Aug 1, 2025 at 4:06 AM Juri Lelli <juri.lelli@...hat.com> wrote:
>
> Hi,
>
> On 31/07/25 12:48, David Haufe wrote:
> > Kernel 6.16 shows the issue. /kernel/sched/fair.c calls dl_server_start()
> > and there is no assessment prior to that point or later of the
> > isolcpu/nohz_full+single-process condition of the core. Same function_graph
> > trace generated. Code is the same at tip+sched/core.
> >
> > On Thu, Jul 31, 2025 at 2:02 AM Juri Lelli <juri.lelli@...hat.com> wrote:
> >
> > > Hello,
> > >
> > > Thanks for the report.
> > >
> > > On 30/07/25 11:51, David Haufe wrote:
> > > > [1.] Kernel 6.14.11 dl_server_timer(...) causing IPI/Function Call
> > > > Interrupts on isolcpu/nohz_full cores, performance regression
> > > > [2.] The code for dl_server_timer is causing new IPI/Function Call
> > > > Interrupts to fire on isolcpu/nohz_full cores which previously had no
> > > > interrupts. When there is a single, SCHED_OTHER process running on an
> > > > isolcpu/nohz_full core, dl_server_timer executes on a housekeeping
> > > > core. This ultimately invokes add_nr_running() and
> > > > sched_update_tick_dependency() and finally tick_nohz_dep_set_cpu().
> > > > Setting the single process running on an isolcpu/nohz_full core to
> > > > FIFO (rt priority) prevents this new interrupt, as it is not seen as a
> > > > fair schedule process anymore. Having to use rt priority is
> > > > unnecessary and a regression to prior kernels. Kernel function_graph
> > > > trace below showing core 0 (housekeeping) sending the IPI to core 19
> > > > (nohz_full, isolcpu, rcu_nocb_poll) which is running a single
> > > > SCHED_OTHER process. I believe this has been observed by others.
> > > >
> > > > https://community.clearlinux.org/t/sysjitter-worse-in-kernel-6-12-than-6-6/10206
> > >
> > > Would you be able to check if the following branch, containing multiple
> > > fixes for dl-server, is still affected by the regression?
>
> Apologies, I forgot to share the actual branch. :-/
>
> Could you please test with
>
> https://github.com/jlelli/linux/commits/upstream/fix-dlserver-1/
>
> Among various other fixes, 219a63335b67 ("sched/deadline: Don't count
> nr_running twice for dl_server proxy tasks") is making sure we don't
> count fair tasks twice, so I am wondering if it can have an effect on
> entering nohz_full.
>
> Thanks,
> Juri
>