Message-ID: <eaef4f28-5531-f8b6-1c29-7a225715012f@igalia.com>
Date: Mon, 17 Mar 2025 12:03:02 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To: Borislav Petkov <bp@...en8.de>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, tglx@...utronix.de,
 mingo@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
 kernel@...ccoli.net, kernel-dev@...lia.com
Subject: Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after
 boot

Hi Boris! Thanks for the attention, responses below.

On 17/03/2025 11:40, Borislav Petkov wrote:
> On Wed, Feb 26, 2025 at 10:27:13AM -0300, Guilherme G. Piccoli wrote:
>> Right now, we can force the TSC to be marked as unstable through
> 
> Who's "we"?

We as in we, the Linux users. I can change it to something like "Right now,
TSC can be marked as unstable" - let me know your preference =)


> 
>> boot parameter. There are debug / test cases though in which would
> 
> Which are those test cases?
>

For example, my team and I recently debugged a problem with
TSC+sched_clock: after the TSC is marked as unstable by the watchdog,
sched_clock continues to use it BUT the suspend/resume TSC routines stop
being executed; for more details, please check [1]. The thing is:
during this debugging we tried forcing the TSC unstable, and did that by
changing the command line.

Problem: with that, the tracing code sets its clock to global at boot time.
We suspected the issue was related to the local trace clock, so
we couldn't mark the TSC unstable and still let the trace code run with the
local clock, as it would if the TSC were marked as unstable by the watchdog
late at runtime.

That was one case (easily worked around with trace_clock=), but in the
end, I thought it would be much better to just have this switch in
debugfs, to be able to reproduce real-life cases of TSC instability
while the system runs. I hope that better explains my reasoning for adding
this debugfs entry.
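
To make the boot-time vs. runtime difference visible, a small check like
the one below could be run before and after the TSC gets marked unstable.
It is only an illustrative sketch, not part of the patch: the helper names
are made up, and it assumes tracefs is mounted at /sys/kernel/tracing and
that trace_clock lists the available clocks with the active one in
[brackets].

```python
from pathlib import Path
from typing import Optional

# Assumed locations: tracefs mounted at /sys/kernel/tracing, standard sysfs
# clocksource directory. Adjust if your system mounts them elsewhere.
TRACE_CLOCK = Path("/sys/kernel/tracing/trace_clock")
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def selected_trace_clock(contents: str) -> Optional[str]:
    """Return the clock name wrapped in [brackets], or None if none is."""
    for token in contents.split():
        if len(token) > 2 and token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_or_none(path: Path) -> Optional[str]:
    """Read a small sysfs/tracefs file; None if missing or not permitted."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_or_none(TRACE_CLOCK)
    clock = selected_trace_clock(raw) if raw is not None else None
    print("trace clock:", clock)
    print("clocksource:", read_or_none(CLOCKSOURCE))
```

Run once right after boot and once after the TSC has been marked unstable;
if the trace clock still reports "local" in the second run, that is the
late-instability scenario described above.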


>> be preferable to simulate the clocksource watchdog behavior, i.e.,
>> marking TSC as unstable during the system run. Some paths might
>> change, for example: the tracing clock is auto switched to global
>> if TSC is marked as unstable on boot, but it could remain local if
>> TSC gets marked as unstable after tracing initialization.
>>
>> Hence, the proposal here is to have a simple debugfs file that
>> gets TSC marked as unstable when written.
> 
> What happens if someone marks the TSC as unstable and comes reporting to us
> that her/his machine is kaputt? And we go on a wild goose chase ...
> 

The same that happens today if someone marks it as unstable via the
command line, right? You will see that in the logs, and could simply reply
that the user marked it as unstable themselves, so... no bug at all!

But let's think the other way around: what if some user marks the TSC
unstable via debugfs, later at runtime, and with that unveils a real
bug like [1], which we can then fix? That would be a win heheh

Cheers,


Guilherme


[1]
https://web.git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=d90c9de9de2f1712df56de6e4f7d6982d358cabe
