Date:   Wed, 10 May 2023 12:32:53 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        John Stultz <jstultz@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Eric Dumazet <edumazet@...gle.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Arjan van de Ven <arjan@...radead.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Rik van Riel <riel@...riel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Siewior <bigeasy@...utronix.de>,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Lukasz Luba <lukasz.luba@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [PATCH v6 19/21] timer: Implement the hierarchical pull model

On Wed, May 10, 2023 at 09:28:15AM +0200, Anna-Maria Behnsen wrote:
> +static u64 tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
> +				   unsigned long jif)
> +{
> +	struct timer_events tevt;
> +	struct tmigr_walk data;
> +	struct tmigr_cpu *tmc;
> +	u64 next = KTIME_MAX;
> +
> +	tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> +
> +	raw_spin_lock_irq(&tmc->lock);
> +	/*
> +	 * Remote CPU is offline or no longer idle or other cpu handles cpu
> +	 * timers already or next event was already expired - return!
> +	 */
> +	if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
> +	    now < tmc->cpuevt.nextevt.expires) {
> +		raw_spin_unlock_irq(&tmc->lock);
> +		return next;
> +	}
> +
> +	tmc->remote = 1;
> +
> +	/* Drop the lock to allow the remote CPU to exit idle */
> +	raw_spin_unlock_irq(&tmc->lock);
> +
> +	if (cpu != smp_processor_id())
> +		timer_expire_remote(cpu);
> +
> +	/*
> +	 * Pretend that there is no timer pending if the cpu is offline.
> +	 * Possible pending timers will be migrated later to an active cpu.
> +	 */
> +	if (cpu_is_offline(smp_processor_id())) {
> +		raw_spin_lock_irq(&tmc->lock);
> +		tevt.local = tevt.global = KTIME_MAX;
> +	} else {
> +		/*
> +		 * Lock ordering needs to be preserved - timer_base locks
> +		 * before tmigr related locks. During fetching the next
> +		 * timer interrupt, also tmc->lock needs to be
> +		 * held. Otherwise there is a possible race window against
> +		 * the CPU itself when it comes out of idle, updates the
> +		 * first timer and goes back to idle.
> +		 */
> +		timer_lock_remote_bases(cpu);

So the return value of timer_lock_remote_bases() is ignored here.

In the !PREEMPT_RT case, I suppose it's impossible for the target CPU
to be offline here: you checked tmc->online above, and between that
check and the call to timer_lock_remote_bases() the path runs with BH
disabled, which prevents stop_machine from running and setting the CPU
offline.

However, in PREEMPT_RT, ksoftirqd (or timersd) is preemptible, so it
seems that this could happen in theory, and that could create a locking
imbalance.
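
Roughly this window (assuming timer_lock_remote_bases() bails out
without taking the base locks when the remote CPU is offline, which is
where the imbalance would come from):

    tmigr_handle_remote_cpu()             CPU hotplug
    -------------------------             -----------
    tmc->online check passes
    raw_spin_unlock_irq(&tmc->lock)
    <handler preempted>
                                          remote CPU goes offline
    timer_lock_remote_bases(cpu)
        -> return value ignored
    fetch_next_timer_interrupt_remote()
        -> runs against bases that were
           never locked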

My suggestion would be to lock the bases unconditionally; you already
checked !tmc->online before. The remote CPU may have gone down since
then, because the tmc lock has been dropped, but that should be rare
enough that you don't care about optimizing it with a lockless check.
So you can just lock the bases, lock the tmc, and check tmc->online
again. If the CPU is offline, you can simply skip the
tmigr_new_timer_up call and the propagation.
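
Something like this (completely untested sketch, modulo the irq state
bookkeeping; timer_unlock_remote_bases() as the counterpart that drops
the base locks is my assumption from the naming):

	/* Lock ordering: timer_base locks before tmigr related locks */
	timer_lock_remote_bases(cpu);
	raw_spin_lock(&tmc->lock);

	/*
	 * Re-check under both locks: the remote CPU may have gone
	 * offline while tmc->lock was dropped above.
	 */
	if (!tmc->online || !tmc->idle) {
		timer_unlock_remote_bases(cpu);
		goto unlock;
	}

	/* next event of cpu */
	fetch_next_timer_interrupt_remote(jif, now, &tevt, cpu);
	timer_unlock_remote_bases(cpu);

That way the offline case simply falls through to the existing unlock
path and skips the propagation, just like the !tmc->idle case already
does.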

Thanks.

> +		raw_spin_lock(&tmc->lock);
> +
> +	/* next event of cpu */
> +		fetch_next_timer_interrupt_remote(jif, now, &tevt, cpu);
> +	}
> +
> +	/*
> +	 * Nothing more to do when CPU came out of idle in the meantime - needs
> +	 * to be checked when holding the tmc lock to prevent race.
> +	 */
> +	if (!tmc->idle)
> +		goto unlock;
> +
> +	data.evt = &tmc->cpuevt;
> +	data.nextexp = tevt.global;
> +	data.groupstate.state = atomic_read(&tmc->tmgroup->migr_state);
> +	data.remote = true;
> +	tmc->cpuevt.ignore = 0;
> +
> +	walk_groups(&tmigr_new_timer_up, &data, tmc);
> +
> +	next = data.nextexp;
> +
> +unlock:
> +	tmc->remote = 0;
> +	raw_spin_unlock_irq(&tmc->lock);
> +
> +	return next;
> +}
