linux-kernel - Re: Warning/trace in kernel/smp.c:815 smp_call_function_many


Message-ID: <87jz9ff2pc.ffs@tglx>
Date: Mon, 24 Feb 2025 22:33:03 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: paulmck@...nel.org, Christian Heusel <christian@...sel.eu>
Cc: Rik van Riel <riel@...riel.com>, Neeraj Upadhyay
 <neeraj.upadhyay@...nel.org>, Ingo Molnar <mingo@...nel.org>, Zqiang
 <qiang.zhang1211@...il.com>, Thorsten Blum <thorsten.blum@...ux.dev>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: Warning/trace in kernel/smp.c:815 smp_call_function_many_cond

On Mon, Feb 24 2025 at 06:57, Paul E. McKenney wrote:

> On Mon, Feb 24, 2025 at 02:57:40PM +0100, Christian Heusel wrote:
>> Hello everyone,
>> 
>> I have noticed the following new warning in my dmesg output. I think I
>> first saw it when upgrading to v6.14-rc1.
>> 
>> So far I have been unsuccessful in bisecting it, therefore it would be
>> nice to get some input whether this is something serious or how I could
>> debug it further. I have also attached a full dmesg for more context.
>> 
>>     ------------[ cut here ]------------
>>     WARNING: CPU: 3 PID: 0 at kernel/smp.c:815 smp_call_function_many_cond+0x46b/0x4c0
>
> This happens when something invokes one of the smp_call_function()
> APIs not in task context, that is, if it is called from NMI, hard IRQ,
> or soft IRQ contexts.
>
> Which it is in this case, due to clocksource_watchdog() being invoked
> from a timer handler.  This only matters if the clocksource is being
> marked unstable, which is what you are seeing.
>
> One possible fix is to move the call to cs->mark_unstable(cs) from
> __clocksource_unstable() to clocksource_watchdog_work().  This would
> require marking the clocksource so that clocksource_watchdog_work()
> could find it.
>
> But is there a better way?

That's fine and should be trivial to do. This business is asynchronous
anyway, so it does not matter much when the unstable call is a bit
delayed.

Thanks,

        tglx
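
The deferral discussed above can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not the actual kernel patch:
CS_MARK_UNSTABLE_PENDING, the model_*() functions, demo_mark_unstable(), and
the in_task_context and work_scheduled variables are hypothetical names
introduced here. Only CLOCK_SOURCE_UNSTABLE and the ->mark_unstable() callback
correspond to names from the kernel. The timer-side function only records that
the callback is owed; the work-side function runs it later, where calling into
the smp_call_function() family would be allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the flag values are arbitrary for this sketch. */
#define CLOCK_SOURCE_UNSTABLE     0x1u
/* Hypothetical marker so the work function can find the clocksource. */
#define CS_MARK_UNSTABLE_PENDING  0x2u

struct clocksource {
	const char *name;
	unsigned int flags;
	void (*mark_unstable)(struct clocksource *cs);
	int mark_calls;			/* bookkeeping for the model */
};

static bool in_task_context;	/* models "not in NMI, hard IRQ or soft IRQ" */
static bool work_scheduled;	/* models the watchdog work being queued */

/* Stands in for a callback that may end up in smp_call_function*(). */
static void demo_mark_unstable(struct clocksource *cs)
{
	assert(in_task_context);	/* the condition the warning checks for */
	cs->mark_calls++;
}

/* Timer-handler side: set flags and defer, do not call the callback. */
static void model_clocksource_unstable(struct clocksource *cs)
{
	cs->flags |= CLOCK_SOURCE_UNSTABLE | CS_MARK_UNSTABLE_PENDING;
	work_scheduled = true;	/* the kernel would queue the watchdog work */
}

/* Work side: runs in task context, so the callback is safe to invoke. */
static void model_watchdog_work(struct clocksource **list, size_t n)
{
	in_task_context = true;
	for (size_t i = 0; i < n; i++) {
		struct clocksource *cs = list[i];

		if (!(cs->flags & CS_MARK_UNSTABLE_PENDING))
			continue;
		cs->flags &= ~CS_MARK_UNSTABLE_PENDING;
		if (cs->mark_unstable)
			cs->mark_unstable(cs);
	}
	in_task_context = false;
	work_scheduled = false;
}
```

In this model the clocksource is flagged unstable immediately, while the
callback runs slightly later from the work function. That matches the point
made in the reply that the handling is asynchronous anyway, so a short delay
should not matter.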