[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875xgdnsmr.ffs@tglx>
Date: Mon, 30 Jun 2025 17:42:04 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Wladislav Wiebe <wladislav.wiebe@...ia.com>, anna-maria@...utronix.de,
frederic@...nel.org, mingo@...nel.org
Cc: akpm@...ux-foundation.org, bigeasy@...utronix.de, peterz@...radead.org,
linux-kernel@...r.kernel.org, wladislav.wiebe@...ia.com
Subject: Re: [PATCH] irq: add support for warning on long-running IRQ handlers
On Mon, Jun 30 2025 at 14:46, Wladislav Wiebe wrote:
The subsystem prefix is 'genirq:' See
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-subject
> Introduce a new option CONFIG_IRQ_LATENCY_WARN that enables warnings when
> IRQ handlers take an unusually long time to execute.
>
> When triggered, the warning includes the CPU, IRQ number, handler address,
> name, and execution duration, for example:
>
> [CPU0] latency on IRQ[787:bad_irq_handler+0x1/0x34 [bad_irq]], took: 5 jiffies (~50 ms)
>
> To keep runtime overhead minimal, this implementation uses a jiffies-based
> timing mechanism. While coarse, it is sufficient to detect problematic IRQs.
Define sufficient. That really depends on your use case. For a real-time
system a hard interrupt handler running longer than a few microseconds
can be problematic.
So instead of adding some single purpose mechanism, can we please add
something flexible which can be used for a wide range of scenarios.
> +#ifdef CONFIG_IRQ_LATENCY_WARN
> +static inline void warn_on_irq_latency(struct irqaction *action, unsigned int irq,
> + unsigned long jiffies_start)
latency is the wrong term here. This is about the runtime, duration of
the handler.
> +{
> + unsigned long delta = jiffies - jiffies_start;
> +
> + /*
> + * Warn about long IRQ handler latency only if jiffies are reliable.
What means jiffies are reliable?
> + * The reporting condition hits only when there are at least two CPUs
> + * with active ticks.
> + * Jiffies updates are stalled on this CPU until MAX_STALLED_JIFFIES
> + * reaches and a force update happens on another CPU with active ticks.
> + */
> + if (unlikely(delta >= (MAX_STALLED_JIFFIES + CONFIG_IRQ_LATENCY_WARN_THRESHOLD))) {
> + pr_warn_ratelimited("[CPU%d] latency on IRQ[%u:%pS], took: %lu jiffies (~%u ms)\n",
> + smp_processor_id(), irq, action->handler,
> + delta, jiffies_to_msecs(delta));
> + }
> +}
> +#else
> +static inline void warn_on_irq_latency(struct irqaction *action, unsigned int irq,
> + unsigned long jiffies_start) { }
> +#endif
I'm absolutely not fond of this #ifdeffery and yet more Kconfig
knobs. Something like this should just work:
DEFINE_STATIC_KEY_FALSE(handler_duration_check);
if (static_branch_unlikely(&handler_duration_check))
ts_start = local_clock();
res = action->handler(irq, action->dev_id);
if (static_branch_unlikely(&handler_duration_check))
check_handler_duration(ts_start);
Then have a command-line option which allows the user to set a warning
threshold > 0 in microseconds. When the option is evaluated the
threshold is stored in a __ro_after_init variable and the static branch
is enabled.
Hmm?
Thanks,
tglx
Powered by blists - more mailing lists