lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h5us744v.ffs@tglx>
Date: Mon, 17 Nov 2025 22:14:56 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Luigi Rizzo <lrizzo@...gle.com>, Marc Zyngier <maz@...nel.org>, Luigi
 Rizzo <rizzo.unipi@...il.com>, Paolo Abeni <pabeni@...hat.com>, Andrew
 Morton <akpm@...ux-foundation.org>, Sean Christopherson
 <seanjc@...gle.com>, Jacob Pan <jacob.jun.pan@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, Bjorn Helgaas
 <bhelgaas@...gle.com>, Willem de Bruijn <willemb@...gle.com>, Luigi Rizzo
 <lrizzo@...gle.com>
Subject: Re: [PATCH v2 5/8] x86/irq: soft_moderation: add support for
 posted_msi (intel)

On Sun, Nov 16 2025 at 18:28, Luigi Rizzo wrote:
> On recent Intel CPUs, kernels compiled with CONFIG_X86_POSTED_MSI=y,
> and the boot option "intremap=posted_msi", all MSI interrupts
> that hit a CPU issue a single POSTED_MSI interrupt processed by
> sysvec_posted_msi_notification() instead of having separate interrupts.
>
> This change adds soft moderation hooks to the above handler.

'This change' is not any better than 'This patch'.

> Soft moderation on posted_msi does not require per-source enable,
> irq_moderation.delay_us > 0 suffices.
>
> To test it, run a kernel with the above options and enable moderation by
> setting delay_us > 0.  The column "from_msi" in /proc/irq/soft_moderation
> will show a non-zero value.

Impressive. But it would be more impressive and useful to actualy have
an explanation why this is required and what the benefits are.

>  CFLAGS_head32.o := -fno-stack-protector
>  CFLAGS_head64.o := -fno-stack-protector
> -CFLAGS_irq.o := -I $(src)/../include/asm/trace
> +CFLAGS_irq.o := -I $(src)/../include/asm/trace -I $(srctree)/kernel/irq

Not going to happen. Architecture code has no business to tap into
generic core infrastructure.

> --- a/kernel/irq/irq_moderation.c
> +++ b/kernel/irq/irq_moderation.c
> @@ -11,6 +11,13 @@
>  #include "internals.h"
>  #include "irq_moderation.h"

>  
> +#ifdef CONFIG_X86

Architecture specific muck has absolutely no place in generic code.

> +#include <asm/apic.h>
> +#include <asm/irq_remapping.h>
> +#else
> +static inline bool posted_msi_supported(void) { return false; }
> +#endif
> +
>  /*
>   * Platform-wide software interrupt moderation.
>   *
> @@ -32,6 +39,10 @@
>   *   moderation on that CPU/irq. If so, calls disable_irq_nosync() and starts
>   *   an hrtimer with appropriate delay.
>   *
> + * - Intel only: using "intremap=posted_msi", all the above is done in
> + *   sysvec_posted_msi_notification(). In this case all host device interrupts
> + *   are subject to moderation.
> + *
>   * - the timer callback calls enable_irq() for all disabled interrupts on that
>   *   CPU. That in turn will generate interrupts if there are pending events.
>   *
> @@ -230,6 +241,17 @@ static enum hrtimer_restart timer_cb(struct hrtimer *timer)
>  
>  	ms->rounds_left--;
>  
> +#ifdef CONFIG_X86_POSTED_MSI
> +	if (ms->kick_posted_msi) {
> +		if (ms->rounds_left == 0)
> +			ms->kick_posted_msi = false;
> +		/* Next call will be from timer, count it conditionally. */
> +		ms->dont_count = !irq_mod_info.count_timer_calls;

TBH, this is one of the worst hacks I've seen in a long time. It's
admittedly creative, but that's unmaintainable and the worst precedent
to open the flood gates for random architecture hacks in generic core
code.

If you want this to work for that posted MSI muck, then this needs
proper integration of those posted vectors into the interrupt core and a
sane mechanism to do the accounting.

> @@ -476,6 +500,18 @@ static ssize_t mode_write(struct file *f, const char __user *buf, size_t count,
>  	ret = kstrtobool(cmd, &enable);
>  	if (!ret)
>  		ret = set_moderation_mode(desc, enable);
> +	if (ret) {
> +		/* extra helpers for prodkernel */
> +		if (cmd[count - 1] == '\n')
> +			cmd[count - 1] = '\0';
> +		ret = 0;
> +		if (!strcmp(cmd, "managed"))
> +			irqd_set(&desc->irq_data, IRQD_AFFINITY_MANAGED);
> +		else if (!strcmp(cmd, "unmanaged"))
> +			irqd_clear(&desc->irq_data, IRQD_AFFINITY_MANAGED);

What the heck? Can you spare me the exposure to random google internal
garbage? My eyes are bleeding already enough.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ