[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h5us744v.ffs@tglx>
Date: Mon, 17 Nov 2025 22:14:56 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Luigi Rizzo <lrizzo@...gle.com>, Marc Zyngier <maz@...nel.org>, Luigi
Rizzo <rizzo.unipi@...il.com>, Paolo Abeni <pabeni@...hat.com>, Andrew
Morton <akpm@...ux-foundation.org>, Sean Christopherson
<seanjc@...gle.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, Bjorn Helgaas
<bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>, Luigi Rizzo
<lrizzo@...gle.com>
Subject: Re: [PATCH v2 5/8] x86/irq: soft_moderation: add support for
posted_msi (intel)
On Sun, Nov 16 2025 at 18:28, Luigi Rizzo wrote:
> On recent Intel CPUs, kernels compiled with CONFIG_X86_POSTED_MSI=y,
> and the boot option "intremap=posted_msi", all MSI interrupts
> that hit a CPU issue a single POSTED_MSI interrupt processed by
> sysvec_posted_msi_notification() instead of having separate interrupts.
>
> This change adds soft moderation hooks to the above handler.
'This change' is not any better than 'This patch'.
> Soft moderation on posted_msi does not require per-source enable,
> irq_moderation.delay_us > 0 suffices.
>
> To test it, run a kernel with the above options and enable moderation by
> setting delay_us > 0. The column "from_msi" in /proc/irq/soft_moderation
> will show a non-zero value.
Impressive. But it would be more impressive and useful to actualy have
an explanation why this is required and what the benefits are.
> CFLAGS_head32.o := -fno-stack-protector
> CFLAGS_head64.o := -fno-stack-protector
> -CFLAGS_irq.o := -I $(src)/../include/asm/trace
> +CFLAGS_irq.o := -I $(src)/../include/asm/trace -I $(srctree)/kernel/irq
Not going to happen. Architecture code has no business to tap into
generic core infrastructure.
> --- a/kernel/irq/irq_moderation.c
> +++ b/kernel/irq/irq_moderation.c
> @@ -11,6 +11,13 @@
> #include "internals.h"
> #include "irq_moderation.h"
>
> +#ifdef CONFIG_X86
Architecture specific muck has absolutely no place in generic code.
> +#include <asm/apic.h>
> +#include <asm/irq_remapping.h>
> +#else
> +static inline bool posted_msi_supported(void) { return false; }
> +#endif
> +
> /*
> * Platform-wide software interrupt moderation.
> *
> @@ -32,6 +39,10 @@
> * moderation on that CPU/irq. If so, calls disable_irq_nosync() and starts
> * an hrtimer with appropriate delay.
> *
> + * - Intel only: using "intremap=posted_msi", all the above is done in
> + * sysvec_posted_msi_notification(). In this case all host device interrupts
> + * are subject to moderation.
> + *
> * - the timer callback calls enable_irq() for all disabled interrupts on that
> * CPU. That in turn will generate interrupts if there are pending events.
> *
> @@ -230,6 +241,17 @@ static enum hrtimer_restart timer_cb(struct hrtimer *timer)
>
> ms->rounds_left--;
>
> +#ifdef CONFIG_X86_POSTED_MSI
> + if (ms->kick_posted_msi) {
> + if (ms->rounds_left == 0)
> + ms->kick_posted_msi = false;
> + /* Next call will be from timer, count it conditionally. */
> + ms->dont_count = !irq_mod_info.count_timer_calls;
TBH, this is one of the worst hacks I've seen in a long time. It's
admittedly creative, but that's unmaintainable and the worst precedent
to open the flood gates for random architecture hacks in generic core
code.
If you want this to work for that posted MSI muck, then this needs
proper integration of those posted vectors into the interrupt core and a
sane mechanism to do the accounting.
> @@ -476,6 +500,18 @@ static ssize_t mode_write(struct file *f, const char __user *buf, size_t count,
> ret = kstrtobool(cmd, &enable);
> if (!ret)
> ret = set_moderation_mode(desc, enable);
> + if (ret) {
> + /* extra helpers for prodkernel */
> + if (cmd[count - 1] == '\n')
> + cmd[count - 1] = '\0';
> + ret = 0;
> + if (!strcmp(cmd, "managed"))
> + irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
> + else if (!strcmp(cmd, "unmanaged"))
> + irqd_clear(&desc->irq_data, IRQD_AFFINITY_MANAGED);
What the heck? Can you spare me the exposure to random google internal
garbage? My eyes are bleeding already enough.
Thanks,
tglx
Powered by blists - more mailing lists