linux-kernel - Re: [PATCH v7] sched/clock: Avoid false sharing for sched_clock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3b69443c-f82c-4b73-8384-2a039d730213@intel.com>
Date: Wed, 28 Jan 2026 10:19:21 +0800
From: "Guo, Wangyang" <wangyang.guo@...el.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>
Cc: linux-kernel@...r.kernel.org, Benjamin Lei <benjamin.lei@...el.com>,
 Tim Chen <tim.c.chen@...ux.intel.com>, Tianyou Li <tianyou.li@...el.com>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH v7] sched/clock: Avoid false sharing for
 sched_clock_irqtime

On 1/27/2026 7:04 PM, Shrikanth Hegde wrote:
> Hi Wangyang, Prateek.
> 
> On 1/27/26 12:55 PM, Wangyang Guo wrote:
>> Read-mostly sched_clock_irqtime may share the same cacheline with
>> frequently updated nohz struct. Make it as static_key to avoid
>> false sharing issue.
>>
>> The only user of disable_sched_clock_irqtime()
>> is tsc_.*mark_unstable() which may be invoked under atomic context
>> and require a workqueue to disable static_key. But both of them
>> calls clear_sched_clock_stable() just before doing
>> disable_sched_clock_irqtime(). We can reuse
>> "sched_clock_work" to also disable sched_clock_irqtime().
>>
>> One additional case need to handle is if the tsc is marked unstable
>> before late_initcall() phase, sched_clock_work will not be invoked
>> and sched_clock_irqtime will stay enabled although clock is unstable:
>>    tsc_init()
>>      enable_sched_clock_irqtime() # irqtime accounting is enabled here
>>      ...
>>      if (unsynchronized_tsc()) # true
>>        mark_tsc_unstable()
>>          clear_sched_clock_stable()
>>            __sched_clock_stable_early = 0;
>>            ...
>>            if (static_key_count(&sched_clock_running.key) == 2)
>>              # Only happens at sched_clock_init_late()
>>              __clear_sched_clock_stable(); # Never executed
>>    ...
>>
>>    # late_initcall() phase
>>    sched_clock_init_late()
>>      if (__sched_clock_stable_early) # Already false
>>        __set_sched_clock_stable(); # sched_clock is never marked stable
>>    # TSC unstable, but sched_clock_work won't run to disable irqtime
>>
>> So we need to disable_sched_clock_irqtime() in sched_clock_init_late()
>> if clock is unstable.
>>
> 
> Do you this as a valid case? have you tested with CONFIG_PARAVIRT?
> 
> Lets say you have a non native sched clock such as kvm_sched_clock_read.
> tsc_init -> sets enable_sched_clock_irqtime()
>           ->mark_tsc_unstable -> if using_native_sched_clock -> 
> clear_sched_clock_stable
> 
> In this case, since clear_sched_clock_stable won't be called you may not 
> disable the
> sched clock irqtime since __sched_clock_stable_early is reset only in 
> clear_sched_clock_stable

For hypervisor, I see this path may call clear_sched_clock_stable when 
clock is unstable at init:

   kvm_init_platform() ->
   kvmclock_init() -> kvm_sched_clock_init(stable):
     if (!stable) clear_sched_clock_stable()
     paravirt_set_sched_clock(kvm_sched_clock_read)
> 
> Bigger concern(maybe) is clock source marked as stable still, though 
> called mark_tsc_unstable in
> non native sched clock?
> 
> Disclaimer: (just curious, seeing this x86 code for first time, so may 
> not know all paths)
> 

Yes, when clock mark unstable through tsc_.*mark_unstable() with 
non-native_sched_clock, clear_sched_clock_stable won't be called, thus 
sched_clock_irqtime still keep enabled.

Maybe the dedicated workqueue for sched_clock_irqtime is still needed 
considering this case.