linux-kernel - Re: [PATCH v2 3/8] genirq: soft_moderation: implement fixed moderation

Message-ID: <87bjl06yij.ffs@tglx>
Date: Tue, 18 Nov 2025 00:16:20 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Luigi Rizzo <lrizzo@...gle.com>, Marc Zyngier <maz@...nel.org>, Luigi
 Rizzo <rizzo.unipi@...il.com>, Paolo Abeni <pabeni@...hat.com>, Andrew
 Morton <akpm@...ux-foundation.org>, Sean Christopherson
 <seanjc@...gle.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, Bjorn Helgaas
 <bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>, Luigi Rizzo
 <lrizzo@...gle.com>
Subject: Re: [PATCH v2 3/8] genirq: soft_moderation: implement fixed moderation

On Mon, Nov 17 2025 at 20:30, Thomas Gleixner wrote:
> On Sun, Nov 16 2025 at 18:28, Luigi Rizzo wrote:
>> +	ms->rounds_left--;
>> +
>> +	if (ms->rounds_left > 0) {
>> +		/* Timer still alive, just call the handlers. */
>> +		list_for_each_entry_safe(desc, next, &ms->descs, mod.ms_node) {
>> +			ms->irq_count += irq_mod_info.count_timer_calls;

I missed this gem before. How is this supposed to calculate an interrupt
rate when count_timer_calls is disabled?

Yet another magic knob to tweak something which works by chance and not
by design.
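
To make the question concrete, here is a minimal user-space sketch of the
quoted accounting. It is not the patch code: the struct, the function name
and the descriptor count parameter are hypothetical simplifications, and
only the "irq_count += count_timer_calls" step is taken from the quoted
hunk. With the knob at 0, a polled round adds nothing to the count, so a
rate derived from irq_count sees no events from polled rounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU state in the patch. */
struct mod_state_sketch {
	unsigned long irq_count;	/* events fed into the rate estimate */
	unsigned int rounds_left;	/* remaining timer rounds */
};

/*
 * One timer round over 'ndescs' polled descriptors. Mirrors the quoted
 * "irq_count += count_timer_calls" line: every polled descriptor adds
 * the knob value, whether or not its handler did any work.
 * Returns the updated event count.
 */
unsigned long poll_round_sketch(struct mod_state_sketch *ms, size_t ndescs,
				unsigned int count_timer_calls)
{
	size_t i;

	assert(ms != NULL);

	if (ms->rounds_left == 0)
		return ms->irq_count;

	ms->rounds_left--;

	if (ms->rounds_left > 0) {
		/* Timer still alive: account each polled descriptor. */
		for (i = 0; i < ndescs; i++)
			ms->irq_count += count_timer_calls;
	}

	return ms->irq_count;
}
```

Starting from the same state, a round over four descriptors with the knob
at 1 adds four events, and with the knob at 0 it adds none.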

TBH, this whole thing should be put into the 'ugly code museum' for
educational purposes and deterrence. It wants to be rewritten from
scratch with a proper design and a structured understandable approach.

This 'polish the Google PoC hackery to death' approach will go nowhere.
It's just a ginormous waste of time. Not that I care about the time you
waste with that, but I pretty much care about mine.

That said, start over from scratch and take the feedback into account so
you can address the substantial problems I pointed out (CPU hotplug,
concurrency, lifetime management, state consistency, affinity changes)
in the design and not after the fact.

First of all please find some other wording than moderation. That's just
a randomly diced word without real meaning. Pick something which
describes what this infrastructure actually does: Adaptive polling, no?

There are a couple of other fundamental questions to answer upfront:

   1) Is this 'throttle everything on a CPU' the proper approach?

      To me this does not make sense. The CPU hogging network adapter or
      disk drive has no business delaying low frequency interrupts,
      which might be important, just because.

      Making this a per interrupt line property makes it trivial to
      solve a few other issues, like the integration into that posted
      MSI muck.

      It also reduces the number of descriptors to be polled in the
      timer interrupt.

   2) Shouldn't the interrupt source be masked at the device level once
      an interrupt is switched into polling mode?

      Masking it at the device level (without touching disabled state)
      is definitely a sensible thing to do. It keeps state consistent
      and again allows trivial integration of that posted MSI stuff
      without insane hacks all over the place.

   3) Does a pure interrupt rate based scheme make sense?

      Definitely not in the way it's implemented. Why?

      Simply because once you have switched to polling mode there is no
      real information anymore as you fail to take the return value of
      the handler into account. So unless your magic knob is 0 every
      polled interrupt is accounted for whether it actually handles an
      interrupt or not.

      But if your magic knob is 0 then this purely relies on irqtime
      accounting, which includes the timer interrupt as an accumulative
      measure.

      IOW, "works" by some definition of works after adding enough magic
      knobs to make it "work" under certain circumstances. "Works for
      Google" is not a good argument.

      That's unmaintainable and unusable. No amount of magic command
      line examples will fix that because the state space of your knobs
      is way too big to be useful and comprehensible.
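
As an illustration of points 1) to 3), here is a minimal user-space sketch
of per interrupt line polling state which masks the source when the line
enters polling mode and counts a poll as an event only when the handler
reports that it did work. It is not a proposed implementation. Every name
in it (polled_line, poll_result, line_enter_polling, line_poll_once,
demo_handler) is hypothetical, and the masking is modelled as a plain flag
rather than a real device or irqchip operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a handler return value. */
enum poll_result { POLL_NONE = 0, POLL_HANDLED = 1 };

/* Hypothetical per interrupt line polling state. */
struct polled_line {
	enum poll_result (*handler)(void *data);
	void *data;
	bool masked_at_device;	/* source masked while polling */
	bool polling;		/* line is in polling mode */
	unsigned long polls;	/* handler invocations from the timer */
	unsigned long handled;	/* invocations which did real work */
};

/* Switch one line into polling mode and mask it at the source. */
void line_enter_polling(struct polled_line *line)
{
	assert(line != NULL);
	line->masked_at_device = true;
	line->polling = true;
}

/* Switch the line back to interrupt mode and unmask it. */
void line_leave_polling(struct polled_line *line)
{
	assert(line != NULL);
	line->polling = false;
	line->masked_at_device = false;
}

/*
 * Poll one line from the timer. Only a handler which reports that it
 * handled something is counted as an event, so empty polls do not
 * inflate the rate. Returns true if the handler did work.
 */
bool line_poll_once(struct polled_line *line)
{
	assert(line != NULL && line->polling);

	line->polls++;
	if (line->handler(line->data) == POLL_HANDLED) {
		line->handled++;
		return true;
	}
	return false;
}

/* Demo handler: 'data' points to a count of pending events. */
enum poll_result demo_handler(void *data)
{
	unsigned int *pending = data;

	if (*pending == 0)
		return POLL_NONE;
	(*pending)--;
	return POLL_HANDLED;
}
```

With two pending events, three polls of the demo line report work twice
and nothing on the third, so handled ends up at 2 while polls is 3. A
decision to stay in or leave polling mode could then be based on handled
rather than on the raw number of timer rounds.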

Add all the questions which pop up when you really sit down and do a
proper from scratch design of this.

Thanks,

        tglx