[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875xbee47c.ffs@tglx>
Date: Thu, 13 Nov 2025 09:17:59 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Luigi Rizzo <lrizzo@...gle.com>, Marc Zyngier <maz@...nel.org>, Luigi
Rizzo <rizzo.unipi@...il.com>, Paolo Abeni <pabeni@...hat.com>, Andrew
Morton <akpm@...ux-foundation.org>, Sean Christopherson
<seanjc@...gle.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, Bjorn Helgaas
<bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>, Luigi Rizzo
<lrizzo@...gle.com>
Subject: Re: [PATCH 1/6] genirq: platform wide interrupt moderation:
Documentation, Kconfig, irq_desc
On Wed, Nov 12 2025 at 19:24, Luigi Rizzo wrote:
> Platform wide software interrupt moderation ("soft_moderation" in this
> patch series) specifically addresses a limitation of platforms from many
"in this patch series" becomes meaningless when this is actually merged
into git.
> vendors whose I/O performance drops significantly when the total rate of
> MSI-X interrupts is too high (e.g 1-3M intr/s depending on the platform).
>
> Conventional interrupt moderation operates separately on each source,
> hence the configuration should target the worst case. On large servers
> with hundreds of interrupt sources, keeping the total rate bounded would
> require delays of 100-200us; and adaptive moderation would have to reach
> those delays with as little as 10K intr/s per source. These values are
> unacceptable for RPC or transactional workloads.
>
> To address this problem, this code measures efficiently the total and
> per-CPU interrupt rates, so that individual moderation delays can be
> adjusted based on actual global and local load. This way, the system
> controls both global interrupt rates and individual CPU load, and
> tunes delays so they are normally 0 or very small except during actual
> local/global overload.
>
> Configuration is easy and robust. System administrators specify the
> maximum targets (moderation delay; interrupt rate; and fraction of time
> spent in hardirq), and per-CPU control loops adjust actual delays to try
> and keep metrics within the bounds.
>
> There is no need for exact targets, because the system is adaptive; the
> defaults delay_us=100, target_irq_rate=1000000, hardirq_frac=70 intr/s,
> are good almost everywhere.
>
> The system does not rely on any special hardware feature except from
> MSI-X Pending Bit Array (PBA), a mandatory component of MSI-X
>
> Boot defaults are set via module parameters (/sys/module/irq_moderation
> and /sys/module/${DRIVER}) or at runtime via /proc/irq/moderation, which
> is also used to export statistics. Moderation on individual irq can be
> turned on/off via /proc/irq/NN/moderation .
>
> The system does not rely on any special hardware feature except from
> MSI-X Pending Bit Array (PBA), a mandatory component of MSI-X
You said that 8 lines above already, but I can't see where the PBA
dependency actually is.
> This initial patch adds Documentation, Kconfig option, two fields in
git grep 'This patch' Documentation/process/
> struct irq_desc, and prototypes in include/linux/interrupt.h
Please do not describe trivial implementation details in the change log.
> No functional impact.
>
> Enabling the option will just extend struct irq_desc with two fields.
>
> CONFIG_SOFT_IRQ_MODERATION=y
> ---
Clearly lacks a 'Signed-off-by' as all the other patches do.
The patch submission rules are well documented.
https://www.kernel.org/doc/html/latest/process/submitting-patches.html
and please also read and follow:
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> +#ifdef CONFIG_IRQ_SOFT_MODERATION
> +
> +void irq_moderation_percpu_init(void);
> +void irq_moderation_init_fields(struct irq_desc *desc);
> +/* add/remove /proc/irq/NN/soft_moderation */
> +void irq_moderation_procfs_entry(struct irq_desc *desc, umode_t umode);
> +
> +#else /* empty stubs to avoid conditional compilation */
The canonical format is:
#else /* CONFIG_IRQ_SOFT_MODERATION */
...
#endif /* !CONFIG_IRQ_SOFT_MODERATION */
But that aside, what's the point of adding these prototypes and stubs
without an implementation?
> /*
> * We want to know which function is an entrypoint of a hardirq or a softirq.
> */
> diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
> index fd091c35d5721..4eb05bc456abe 100644
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -112,6 +112,11 @@ struct irq_desc {
> #endif
> struct mutex request_mutex;
> int parent_irq;
> +#ifdef CONFIG_IRQ_SOFT_MODERATION
> + /* mode: 0: off, 1: disable_irq_nosync() */
> + u8 moderation_mode; /* written in procfs, read on irq */
> + struct list_head ms_node; /* per-CPU list of moderated irq_desc */
Don't use tail comments and document the members in the exisiting struct
documentation. It's not that hard to keep something consistent.
Again. What's the point of adding this without usage?
You can also avoid the #ifdeffery by doing:
#ifdef CONFIG_IRQ_SOFT_MODERATION
struct irq_desc_mod {
....
....
};
#else
struct irq_desc_mod { };
#endif
Powered by blists - more mailing lists