linux-kernel - Re: [PATCH v2 3/8] genirq: soft_moderation: implement fixed moderation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87pl9fmhe5.ffs@tglx>
Date: Tue, 18 Nov 2025 17:31:46 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Luigi Rizzo <lrizzo@...gle.com>
Cc: Marc Zyngier <maz@...nel.org>, Luigi Rizzo <rizzo.unipi@...il.com>,
 Paolo Abeni <pabeni@...hat.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Sean Christopherson <seanjc@...gle.com>,
 linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, Bjorn Helgaas
 <bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH v2 3/8] genirq: soft_moderation: implement fixed moderation

On Tue, Nov 18 2025 at 11:09, Luigi Rizzo wrote:
> On Tue, Nov 18, 2025 at 9:34 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> I understand that from a practical point of view it might not make a real
>> difference, but when you look at it conceptually, then the interrupt
>> which causes the system to slow down is the one you want to switch over
>> into polling mode. All others are harmless as they do not contribute to
>> the overall problem in a significant enough way.
>
> (I appreciate the time you are dedicating to this thread)

It's hopefully saving me time and nerves later :)

> Fully agree. The tradeoff is that the rate accounting state
> (#interrupts in the last interval, a timestamp, mod_ns, sleep_ns)
> now would have to go into the irqdesc, and the extra layer
> of per-CPU aggregation is still needed to avoid hitting too often on
> the shared state.
>
> I also want to reiterate that "polling mode" is not the core contribution
> of this patch series. There is limited polling only when timer_rounds>0,
> which is not what I envision to use, and will go away because
> as you showed it does not handle correctly the teardown path.

Ok.

>>  As a side effect of that approach the posted MSI integration then mostly
>> falls into place especially when combined with immediate masking.
>> Immediate masking is not a problem at all because in reality the high
>> frequency interrupt will be masked immediately on the next event (a few
>> microseconds later) anyway.
>
> This again has pros and cons. The posted MSI feature
> helps only when there are N>1  high rate sources
> hitting the same CPU, and in that (real) case having to
> mask N sources one by one, rather than just not rearming
> the posted_msi interrupt, means an N-fold increase in
> the interrupt rate for a given moderation delay.

That's kinda true for the per interrupt accounting, but if you look at
it from a per CPU accounting perspective then you still can handle them
individually and mask them when they arrive within or trigger a delay
window. That means:

   1) One of the PIR aggregated device interrupts triggers the delay

   2) If there are other bits active in that same posted handler
      invocation, then they will be flipped over too

   3) Devices which have not been covered by the initial switch, will
      either be not affected at all or can raise exactly _one_ interrupt
      which flips them over.

This scheme makes me way more comfortable because:

   1) The state is always consistent.

      If masked then the PIR rearm is not causing any harm because the
      device can't send an interrupt anymore.

      On unmask there is no way to screw up vs. rearming and eventually
      losing an interrupt.

      I haven't analyzed your scheme in detail, but the PIR mechanism is
      fickle and I have very little confidence that the way you
      implemented it is correct under all circumstances.

   2) It requires exactly zero lines of x86-intelism code.

   3) It keeps the mechanism exactly the same for PIR and non-PIR mode
      as you need the 'mask' on mode transition anyway to solve the
      disable/enable_irq() state inconsistency.

Please focus on making this simple and correct in the first place.

If there is still a really compelling reason to treat PIR differently,
then this can be done as a separate step once the initial dust has
settled.

Thanks,

        tglx