linux-kernel - Re: [PATCH] hrtimer: CPU and entry_time is added to a warning message in hrtimer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <20221202095544.487487-1-yamamoto.rei@jp.fujitsu.com>
Date:   Fri,  2 Dec 2022 18:55:44 +0900
From:   Rei Yamamoto <yamamoto.rei@...fujitsu.com>
To:     tglx@...utronix.de
Cc:     geert+renesas@...der.be, linux-kernel@...r.kernel.org,
        yamamoto.rei@...fujitsu.com
Subject: Re: [PATCH] hrtimer: CPU and entry_time is added to a warning message in hrtimer_interrupt()

Hi Thomas Gleixner,

Thanks for your reply.

On  Thu, 01 Dec 2022 22:30:41, Thomas Gleixner wrote:
> On Fri, Jun 24 2022 at 16:00, Rei Yamamoto wrote:
>> A warning message in hrtimer_interrupt() is output up to 5 times
>> by default, and CPU and entry_time are also shown.
> 
> This describes to some extent _what_ the patch is doing, but not the
> why.
> 
>> These changes are helpful that the function spending a lot of time is clear
>> by using ftrace:
> 
> That's a constructed case. There are multiple reasons why this can
> happen, not just because a single hrtimer callback misbehaves.
> 
>> @@ -2038,6 +2039,15 @@ static struct ctl_table kern_table[] = {
>>  		.extra1		= SYSCTL_ONE,
>>  		.extra2		= SYSCTL_INT_MAX,
>>  	},
>> +#endif
>> +#ifdef CONFIG_HIGH_RES_TIMERS
>> +	{
>> +		.procname       = "hrtimer_interrupt_warnings",
>> +		.data           = &sysctl_hrtimer_interrupt_warnings,
>> +		.maxlen         = sizeof(int),
>> +		.mode           = 0644,
>> +		.proc_handler   = proc_dointvec,
>> +	},
> 
> So this adds a new sysctl, but the changelog does not tell anything
> about it. Aside of the dubious value of this sysctl, this lacks the
> required documentation for new sysctls.
> 
>> +	/*
>> +	 * If a message is output many times, the delayed funciton
>> +	 * may be identified by resetting sysctl_hrtimer_interrupt_warnings
>> +	 * and enabling ftrace.
> 
> What has the reset of sysctl_hrtimer_interrupt_warnings to do with
> ftrace and how is that reset helpful to identify the root cause?
>
> Also repeating the printk 5 times does not add any value at all. The
> runaway detection already has logic to supress spurious events and if
> the problem persists then it can be observed by ftrace without any of
> these changes.

If the system performance degradation exists and this message is output
some times at the same time, my idea was that the reset of
sysctl_hrtimer_interrupt_warnings helps to check whether hrtimer event
is related to the degradation.
However, as you said, it can be observed by ftrace.
Therefore, I withdraw an opinion to add hrtimer_interrupt_warnings as
the sysctl parameter.

> I assume - because you did not tell so - that you try to have a
> correlation between ftrace and dmesg via the entry timestamp output,
> right?

Yes, that's right.

> That's just a half thought out debug bandaid, really.
> 
> You can provide a way better mechanism by adding a tracepoint right at
> the pr_warn_once(), which emits information for correlation right into
> the trace.
> 
> That allows you to stop the trace once the tracepoint is emitted instead
> of having to do all of this including the correlation manually.

OK, I will recreate the patch with following your idea.

Thanks.
Rei