linux-kernel - Re: [PATCH v2] tick/broadcast: Do not set oneshot_mask except was

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZFDxph8YDPjwvbej@lothringen>
Date:   Tue, 2 May 2023 13:19:02 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Victor Hassan <victor@...winnertech.com>, fweisbec@...il.com,
        mingo@...nel.org, jindong.yue@....com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] tick/broadcast: Do not set oneshot_mask except
 was_periodic was true

On Fri, Apr 21, 2023 at 11:32:15PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 19 2023 at 15:36, Frederic Weisbecker wrote:
> This path is taken during the switch from periodic to oneshot mode. The
> way how this works is:
> 
> boot()
>   setup_periodic()
>     setup_periodic_broadcast()
> 
>   // From here on everything depends on the periodic broadcasting
> 
>   highres_clocksource_becomes_available()
>     tick_clock_notify() <- Set's the .check_clocks bit on all CPUs
> 
> Now the first CPU which observes that bit switches to oneshot mode, but
> the other CPUs might be waiting for the periodic broadcast at that
> point. So the periodic to oneshot transition does:
> 
>   		cpumask_copy(tmpmask, tick_broadcast_mask);
> 		/* Remove the local CPU as it is obviously not idle */
>   		cpumask_clear_cpu(cpu, tmpmask);
> 		cpumask_or(tick_broadcast_oneshot_mask, tick_broadcast_oneshot_mask, tmpmask);
> 
> I.e. it makes sure that _ALL_ not yet converted CPUs will get woken up
> by the new oneshot broadcast handler. 
> 
> Now when the other CPUs will observe the check_clock bit after that they
> need to clear their bit in the oneshot mask while switching themself
> from periodic to oneshot one otherwise the next tick_broadcast_enter()
> would do nothing. That's all serialized by broadcast lock, so no race.
> 
> But that has nothing to do with switching the underlying clockevent
> device. At that point all CPUs are already in oneshot mode and
> tick_broadcast_oneshot_mask is correct.
> 
> So that will take the other code path:
> 
>     if (bc->event_handler == tick_handle_oneshot_broadcast) {
>        // not taken because the new device is not yet set up
>        return;
>     }
> 
>     if (from_periodic) {
>        // not taken because the switchover already happened
>        // Here is where the cpumask magic happens
>     }
>

I see, I guess I got lost somewhere into the tree of the possible
callchains :)

tick_broadcast_setup_oneshot()
	tick_broadcast_switch_to_oneshot
		tick_install_broadcast_device
			tick_check_new_device
				clockevents_notify_released
					clockevents_register_device (new device)
				clockevents_register_device (new device)
		tick_switch_to_oneshot
			tick_init_highres
				 hrtimer_switch_to_hres
					hrtimer_run_queues (timer softirq)
			tick_nohz_switch_to_nohz
				tick_check_oneshot_change (test and clear check_clock)
					hrtimer_run_queues (timer softirq))
	tick_device_uses_broadcast
		tick_setup_device
			tick_install_replacement
				clockevents_replace
					__clockevents_unbind
						clockevents_unbind
							unbind_device_store (sysfs)
							clockevents_unbind_device (driver)
			tick_check_new_device
				clockevents_notify_released
					clockevents_register_device (new device)
				clockevents_register_device (new device)
	tick_broadcast_control
		tick_broadcast_enable (cpuidle driver register, cpu up, ...)
		tick_broadcast_disable (cpuidle driver unregister, ...)
		tick_broadcast_force (amd apic bug setup)


Ok I get the check_clock game. But then, why do we need to reprogram
again the broadcast device to fire in one jiffy if the caller is
tick_nohz_switch_to_nohz() (that is the (bc->event_handler ==
tick_handle_oneshot_broadcast) branch)? In that case the broadcast device
should have been programmed already by the CPU that first switched the
current broadcast device, right?

> > For the case where the other CPUs have already installed their
> > tick devices and if that function is called with from_periodic=true,
> > the other CPUs will notice the oneshot change on their next call to
> > tick_broadcast_enter() thanks to the lock, right? So the tick broadcast
> > will keep firing until all CPUs have been through idle once and called
> > tick_broadcast_exit(), right? Because only them can clear themselves
> > from tick_broadcast_oneshot_mask, am I understanding this correctly?
> 
> No. See above. It's about the check_clock bit handling on the other
> CPUs.
> 
> It seems I failed miserably to explain that coherently with the tons of
> comments added. Hrmpf :(

Don't pay too much attention, confusion is my vehicle to explore any code
that I'm not used to. But yes I must confess the
(bc->event_handler == tick_handle_oneshot_broadcast) may deserve a comment
remaining where we come from (ie: low-res hrtimer softirq).

Thanks.