Message-ID: <CAMOZA0+K73YbPqq_vTS2sMkbV-0Fh5GSCt3ABfReV3DYk1CO2g@mail.gmail.com>
Date: Tue, 18 Nov 2025 11:09:10 +0100
From: Luigi Rizzo <lrizzo@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Marc Zyngier <maz@...nel.org>, Luigi Rizzo <rizzo.unipi@...il.com>,
Paolo Abeni <pabeni@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
Sean Christopherson <seanjc@...gle.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH v2 3/8] genirq: soft_moderation: implement fixed moderation
On Tue, Nov 18, 2025 at 9:34 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Tue, Nov 18 2025 at 00:59, Luigi Rizzo wrote:
> > On Tue, Nov 18, 2025 at 12:16 AM Thomas Gleixner <tglx@...utronix.de> wrote:
> >> There are a couple of other fundamental questions to answer upfront:
> >>
> >> 1) Is this throttle everything on a CPU the proper approach?
> >>
> >> To me this does not make sense. The CPU hogging network adapter or
> >> disk drive has no business to delay low frequency interrupts,
> >> which might be important, just because.
> >
> > while there is some shared fate, a low frequency source (with interrupts
> > more than the adaptive_delay apart) on the same CPU as a high frequency
> > source, will rarely if ever see any additional delay:
> > the first interrupt from a source is always served right away,
> > there is a high chance that the timer fires and the source
> > is re-enabled before the next interrupt from the low frequency source.
>
> I understand that from a practical point of view it might not make a real
> difference, but when you look at it conceptually, then the interrupt
> which causes the system to slow down is the one you want to switch over
> into polling mode. All others are harmless as they do not contribute to
> the overall problem in a significant enough way.
(I appreciate the time you are dedicating to this thread)
Fully agree. The tradeoff is that the rate accounting state
(#interrupts in the last interval, a timestamp, mod_ns, sleep_ns)
would now have to go into the irqdesc, and the extra layer
of per-CPU aggregation is still needed to avoid touching
the shared state too often.
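To make the tradeoff concrete, here is a rough userspace sketch of per-source accounting with a per-CPU aggregation layer in front of the shared state. It is not code from the series: the struct names, account_irq() and the flush interval are hypothetical, and only the field names come from the list above (mod_ns and sleep_ns are carried but not interpreted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-source state, as it might sit in the irqdesc. */
struct src_rate_state {
	uint32_t intr_count;	/* #interrupts in the last interval */
	uint64_t last_ts_ns;	/* timestamp of the last interrupt */
	uint64_t mod_ns;	/* carried only; semantics as in the series */
	uint64_t sleep_ns;	/* carried only; semantics as in the series */
};

/* Hypothetical per-CPU aggregation layer: batches updates locally. */
struct cpu_aggr {
	uint32_t pending;	/* interrupts not yet pushed to shared state */
	uint64_t last_flush_ns;	/* last time shared state was touched */
};

/* Stand-in for the state shared across CPUs. */
struct shared_state {
	uint64_t total_intr;
};

/*
 * Account one interrupt. The shared state is written at most once per
 * flush_interval_ns per CPU. Returns true if it was written.
 */
static bool account_irq(struct src_rate_state *s, struct cpu_aggr *c,
			struct shared_state *g, uint64_t now_ns,
			uint64_t flush_interval_ns)
{
	assert(s && c && g);

	s->intr_count++;
	s->last_ts_ns = now_ns;
	c->pending++;

	if (now_ns - c->last_flush_ns < flush_interval_ns)
		return false;	/* stay on CPU-local data */

	g->total_intr += c->pending;
	c->pending = 0;
	c->last_flush_ns = now_ns;
	return true;
}
```

The point of the sketch is only that the per-source counters do not remove the need for the per-CPU batch: without it, every interrupt would write the shared state.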
I also want to reiterate that "polling mode" is not the core contribution
of this patch series. There is limited polling only when timer_rounds>0,
which is not how I envision using it, and which will go away because,
as you showed, it does not handle the teardown path correctly.
> As a side effect of that approach the posted MSI integration then mostly
> falls into place especially when combined with immediate masking.
> Immediate masking is not a problem at all because in reality the high
> frequency interrupt will be masked immediately on the next event (a few
> microseconds later) anyway.
This again has pros and cons. The posted MSI feature
helps only when there are N>1 high rate sources
hitting the same CPU, and in that (real) case having to
mask N sources one by one, rather than just not rearming
the posted_msi interrupt, means an N-fold increase in
the interrupt rate for a given moderation delay.
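A back-of-the-envelope version of that N-fold argument, as a small C helper. The function name and the example numbers are illustrative only, and it assumes every source is busy enough to fire at least once per moderation window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/*
 * Upper bound on interrupts per second seen by one CPU.
 *
 * per_source_masking == false: the single posted_msi notification is
 *   simply not rearmed, so at most one interrupt per moderation window.
 * per_source_masking == true: each of the n_sources is masked on its
 *   own first event, so up to n_sources interrupts per window.
 */
static uint64_t max_irq_rate_per_sec(uint64_t mod_delay_ns,
				     unsigned int n_sources,
				     bool per_source_masking)
{
	uint64_t per_window;

	assert(mod_delay_ns > 0);

	per_window = per_source_masking ? n_sources : 1;
	return per_window * (NSEC_PER_SEC_SKETCH / mod_delay_ns);
}
```

For example, with a 25us moderation delay and 4 busy sources this gives a bound of 40000 interrupts/s when not rearming posted_msi, against 160000 interrupts/s when masking each source individually.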
Note that even under load, actual moderation delays
are typically as low as 20-40us, which is practically
unnoticeable to low rate devices (moderation does
not affect timers or system interrupts, and one
always has the option to move the latency sensitive,
low rate source to a different CPU, where it would also
benefit from being away from the jitter induced by the
heavy hitters).
cheers
luigi