linux-kernel - Re: [PATCH] hrtimer: CPU and entry_time is added to a warning message in hrtimer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87sfhyrev2.ffs@tglx>
Date:   Thu, 01 Dec 2022 22:30:41 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Rei Yamamoto <yamamoto.rei@...fujitsu.com>
Cc:     geert+renesas@...der.be, linux-kernel@...r.kernel.org,
        yamamoto.rei@...fujitsu.com
Subject: Re: [PATCH] hrtimer: CPU and entry_time is added to a warning
 message in hrtimer_interrupt()

Rei!

On Fri, Jun 24 2022 at 16:00, Rei Yamamoto wrote:
> A warning message in hrtimer_interrupt() is output up to 5 times
> by default, and CPU and entry_time are also shown.

This describes to some extent _what_ the patch is doing, but not the
why.

> These changes are helpful that the function spending a lot of time is clear
> by using ftrace:

That's a constructed case. There are multiple reasons why this can
happen, not just because a single hrtimer callback misbehaves.

> @@ -2038,6 +2039,15 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= SYSCTL_ONE,
>  		.extra2		= SYSCTL_INT_MAX,
>  	},
> +#endif
> +#ifdef CONFIG_HIGH_RES_TIMERS
> +	{
> +		.procname       = "hrtimer_interrupt_warnings",
> +		.data           = &sysctl_hrtimer_interrupt_warnings,
> +		.maxlen         = sizeof(int),
> +		.mode           = 0644,
> +		.proc_handler   = proc_dointvec,
> +	},

So this adds a new sysctl, but the changelog does not tell anything
about it. Aside of the dubious value of this sysctl, this lacks the
required documentation for new sysctls.

> +	/*
> +	 * If a message is output many times, the delayed funciton
> +	 * may be identified by resetting sysctl_hrtimer_interrupt_warnings
> +	 * and enabling ftrace.

What has the reset of sysctl_hrtimer_interrupt_warnings to do with
ftrace and how is that reset helpful to identify the root cause?

Also repeating the printk 5 times does not add any value at all. The
runaway detection already has logic to supress spurious events and if
the problem persists then it can be observed by ftrace without any of
these changes.

I assume - because you did not tell so - that you try to have a
correlation between ftrace and dmesg via the entry timestamp output,
right?

That's just a half thought out debug bandaid, really.

You can provide a way better mechanism by adding a tracepoint right at
the pr_warn_once(), which emits information for correlation right into
the trace.

That allows you to stop the trace once the tracepoint is emitted instead
of having to do all of this including the correlation manually.

Hmm?

Thanks,

        tglx