linux-kernel - Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87bk90i2ja.fsf@somnus>
Date: Thu, 01 Feb 2024 17:15:37 +0100
From: Anna-Maria Behnsen <anna-maria@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
 Eric Dumazet <edumazet@...gle.com>, "Rafael J . Wysocki"
 <rafael.j.wysocki@...el.com>, Arjan van de Ven <arjan@...radead.org>,
 "Paul E . McKenney" <paulmck@...nel.org>, Rik van Riel <riel@...riel.com>,
 Steven Rostedt <rostedt@...dmis.org>, Sebastian Siewior
 <bigeasy@...utronix.de>, Giovanni Gherdovich <ggherdovich@...e.cz>, Lukasz
 Luba <lukasz.luba@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
 Srinivas Pandruvada <srinivas.pandruvada@...el.com>, K Prateek Nayak
 <kprateek.nayak@....com>
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Frederic Weisbecker <frederic@...nel.org> writes:

> Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
>> +static void tmigr_connect_child_parent(struct tmigr_group *child,
>> +				       struct tmigr_group *parent)
>> +{
>> +	union tmigr_state childstate;
>> +
>> +	raw_spin_lock_irq(&child->lock);
>> +	raw_spin_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING);
>> +
>> +	child->parent = parent;
>> +	child->childmask = BIT(parent->num_children++);
>> +
>> +	raw_spin_unlock(&parent->lock);
>> +	raw_spin_unlock_irq(&child->lock);
>> +
>> +	/*
>> +	 * To prevent inconsistent states, active children need to be active in
>> +	 * the new parent as well. Inactive children are already marked inactive
>> +	 * in the parent group.
>> +	 */
>> +	childstate.state = atomic_read(&child->migr_state);
>> +	if (childstate.migrator != TMIGR_NONE) {
>
> Is it possible here to connect a running online child (not one that we just
> created) to a new parent?

connect_child_parent() is only executed for the just created ones. So,
yes in theory this would be possible, but it doesn't happen as
tmigr_setup_groups() takes care to make it right (hopefully :)). When a
LVL0 group has some space left, only the connection between tmc and the
LVL0 group is done in tmigr_setup_groups(). If there is no space left in
LVL0 group, then a new group is created and depending on the levels
which has to be created only executed for the new ones.

> If not, is it possible that a newly created child is
> not TMIGR_NONE?

Yes. See tmigr_cpu_online(). When new groups have to be created starting
from LVL0, then they are not active - so TMIGR_NONE is set. Activating
the new online CPU is done afterwards.

But if it is required to add also a new level at the top, then it is
mandatory to propagate the active state of the already existing child to
the new parent. The connect_child_parent() is then also executed for the
formerly top level group (child) to the newly created group (parent).

>
>> +		struct tmigr_walk data;
>> +
>> +		data.childmask = child->childmask;
>> +
>> +		/*
>> +		 * There is only one new level per time. When connecting the
>> +		 * child and the parent and set the child active when the parent
>> +		 * is inactive, the parent needs to be the uppermost
>> +		 * level. Otherwise there went something wrong!
>> +		 */
>> +		WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent);
>> +	}
>> +}
> [...]
>> +static int tmigr_cpu_online(unsigned int cpu)
>> +{
>> +	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
>> +	int ret;
>> +
>> +	/* First online attempt? Initialize CPU data */
>> +	if (!tmc->tmgroup) {
>> +		raw_spin_lock_init(&tmc->lock);
>> +
>> +		ret = tmigr_add_cpu(cpu);
>> +		if (ret < 0)
>> +			return ret;
>> +
>> +		if (tmc->childmask == 0)
>> +			return -EINVAL;
>> +
>> +		timerqueue_init(&tmc->cpuevt.nextevt);
>> +		tmc->cpuevt.nextevt.expires = KTIME_MAX;
>> +		tmc->cpuevt.ignore = true;
>> +		tmc->cpuevt.cpu = cpu;
>> +
>> +		tmc->remote = false;
>> +		WRITE_ONCE(tmc->wakeup, KTIME_MAX);
>> +	}
>> +	raw_spin_lock_irq(&tmc->lock);
>> +	tmc->idle = timer_base_is_idle();
>> +	if (!tmc->idle)
>> +		__tmigr_cpu_activate(tmc);
>
> Heh, I was about to say that it's impossible that timer_base_is_idle()
> at this stage but actually if we run in nohz_full...
>
> It happens so that nohz_full is deactivated until rcutree_online_cpu()
> which calls tick_dep_clear() but it's a pure coincidence that might
> disappear one day. So yes, let's keep it that way.

I instrumented the code (with NOHZ FULL and NOHZ_IDLE) to make sure the
timer migration hierarchy state 'idle' is in sync with the timer base
'idle'. And this was one part where it was possible that it runs out of
sync as I remember correctly. But if I understood you correctly, this
shouldn't happen at the moment?

Thanks,

	Anna-Maria