linux-kernel - Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot


Message-ID: <eaef4f28-5531-f8b6-1c29-7a225715012f@igalia.com>
Date: Mon, 17 Mar 2025 12:03:02 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To: Borislav Petkov <bp@...en8.de>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, tglx@...utronix.de,
 mingo@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
 kernel@...ccoli.net, kernel-dev@...lia.com
Subject: Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after
 boot

Hi Boris! Thanks for the attention, responses below.

On 17/03/2025 11:40, Borislav Petkov wrote:
> On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote:
>> Right now, we can force the TSC to be marked as unstable through
> 
> Who's "we"?

We as in we, the Linux users. I can change it to something like "Right
now, TSC can be marked as unstable" - let me know your preference =)

> 
>> boot parameter. There are debug / test cases though in which would
> 
> Which are those test cases?
>

For example, my team and I recently debugged a problem with
TSC+sched_clock: after the TSC is marked as unstable by the watchdog,
sched_clock continues to use it BUT the suspend/resume TSC routines stop
being executed; for more details, please check [1]. But the thing is:
during this debugging we tried forcing the TSC unstable, and did that by
changing the command-line.

Problem: with that, the tracing code sets its clock to global at boot
time. We suspected that the issue was related to the local trace clock,
so we couldn't mark the TSC unstable and still let the trace code run
with the local clock, as it would if the TSC were marked as unstable by
the watchdog late at runtime.
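
To make the difference visible, one could check which trace clock is
selected on a running system. Below is a minimal sketch in Python, not
part of the patch; it assumes tracefs is mounted at /sys/kernel/tracing
and that the trace_clock file lists the available clocks with the active
one in square brackets. The helper names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed locations: tracefs mounted at /sys/kernel/tracing (older setups
# may expose it under /sys/kernel/debug/tracing instead).
TRACE_CLOCK = Path("/sys/kernel/tracing/trace_clock")
CURRENT_CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def selected_trace_clock(text):
    """Return the clock name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_or_none(path):
    """Read a small sysfs/tracefs file, returning None if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_or_none(TRACE_CLOCK)
    print("trace clock:", selected_trace_clock(raw) if raw else "unavailable")
    print("clocksource:", read_or_none(CURRENT_CLOCKSOURCE) or "unavailable")
```

Running it once after booting with the TSC forced unstable on the
command-line, and once after the TSC is marked unstable at runtime,
should show whether the trace clock differs between the two scenarios.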

That was one case (easily worked around with trace_clock=), but in the
end, I thought it would be way better to just have this switch in
debugfs, to be able to reproduce real-life cases of TSC instability
while the system runs. Hope that better explains my reasoning for adding
this debugfs entry.

>> be preferable to simulate the clocksource watchdog behavior, i.e.,
>> marking TSC as unstable during the system run. Some paths might
>> change, for example: the tracing clock is auto switched to global
>> if TSC is marked as unstable on boot, but it could remain local if
>> TSC gets marked as unstable after tracing initialization.
>>
>> Hence, the proposal here is to have a simple debugfs file that
>> gets TSC marked as unstable when written.
> 
> What happens if someone marks the TSC as unstable and comes reporting to us
> that her/his machine is kaputt? And we go on a wild goose chase ...
> 

The same that happens today if someone marks it as unstable via
command-line, right? You will see that in the logs, and could simply
reply that the users marked it as unstable themselves, so... no bug at
all!!
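
As a rough illustration of the "see that on logs" part, here is a small
Python sketch that filters kernel log lines for the TSC-unstable
message. It assumes the kernel prints a line containing "Marking TSC
unstable" together with the reason; the function name is hypothetical,
and the exact wording should be checked against the running kernel.

```python
def tsc_unstable_events(log_lines):
    """Return the log lines that report the TSC being marked unstable."""
    # Assumption: the kernel message contains this phrase, followed by
    # the reason given when the TSC was marked unstable.
    needle = "Marking TSC unstable"
    return [line.rstrip("\n") for line in log_lines if needle in line]
```

It could be fed the output of dmesg split into lines, for example via
subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines(),
which may need root depending on the dmesg_restrict setting.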

But let's think the other way around: what if some user marks the TSC
unstable via debugfs, later at runtime, and with that unveils a real
bug like [1], which we can then fix? That would be a win heheh

Cheers,

Guilherme

[1]
https://web.git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=d90c9de9de2f1712df56de6e4f7d6982d358cabe