linux-kernel - Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable after boot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87iko213qo.ffs@tglx>
Date: Fri, 21 Mar 2025 22:19:11 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: "Guilherme G. Piccoli" <gpiccoli@...lia.com>, "H. Peter Anvin"
 <hpa@...or.com>, bp@...en8.de
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, mingo@...hat.com,
 dave.hansen@...ux.intel.com, kernel@...ccoli.net, kernel-dev@...lia.com
Subject: Re: [PATCH] x86/tsc: Add debugfs entry to mark TSC as unstable
 after boot

On Fri, Mar 21 2025 at 16:26, Guilherme G. Piccoli wrote:
> On 17/03/2025 15:42, H. Peter Anvin wrote:
>> To be honest I don't think this belongs in debugfs; rather it belongs in sysfs.

No.

>> Debugfs should not be necessarily in serious production systems – it
>> is way too large of an attack surface, which is a very good reason
>> why it is its own filesystem – but if this is a real issue on
>> hardware then it may be needed.

There is ZERO reason to do that on a production system.

If the in kernel detection does not work, then switching it over after
someone detected the problem five hours after the fact does not help at
all.

The admin can force the TSC to be removed from timekeeping, which is the
really crucial part, already today by changing the clocksource via sysfs.

> In other words, we have 2 options in my understanding:
>
> (a) Drop it;
>
> (b) Re-implement using sysfs entry instead of debugfs;

Neither (a) nor (b) nor the proposed implementation.

There is actually a good reason why a debug/validation mechanism of some
sort makes sense, i.e. testing:

  1) The hardware, which exposed these issues frequently is starting to
     get into museum or junkyard state, which reduces the test base
     significantly.

  2) Modern hardware, which exposes this issue in large fleets
     occasionally due to aging and misdirected neutrons, is not really a
     good testbed either.

So we have no real test coverage for something, which can be crucial in
the actual failure case.

Sure, it could be argued that this can be implemented in qemu, but that's
fundamentally the wrong approach.

This is something which must be easily available to developers and CI
and not require to have a special setup with debug nonsense enabled in
some external tool.

The proposed implementation is just an ad hoc band aid as well. Why?

  1) It has zero relation to the actual failure detection code paths.

  2) It covers only a small part of the problem space. On all modern
     systems, which have TSC_ADJUST the clocksource watchdog is disabled
     and just asynchronously invoking TSC unstable is a hack which only
     tests the unstable logic.

So I rather want to see a more complete solution, which

  1) lets the clocksource watchdog logic fail the test

  2) lets the TSC sync (including TSC_ADJUST) logic on CPU hotplug fail

  3) tweaks the TSC_ADJUST register and validates that the detection and
     mitigation logic on systems w/o clocksource watchdog works
     correctly.

Ideally that's a kunit test for CI integration plus a debugfs interface
for developers, which comes with a related selftest.

Thanks,

        tglx