linux-kernel - Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sf2c6xjc.fsf@somnus>
Date: Thu, 01 Feb 2024 15:59:03 +0100
From: Anna-Maria Behnsen <anna-maria@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
 Eric Dumazet <edumazet@...gle.com>, "Rafael J . Wysocki"
 <rafael.j.wysocki@...el.com>, Arjan van de Ven <arjan@...radead.org>,
 "Paul E . McKenney" <paulmck@...nel.org>, Rik van Riel <riel@...riel.com>,
 Steven Rostedt <rostedt@...dmis.org>, Sebastian Siewior
 <bigeasy@...utronix.de>, Giovanni Gherdovich <ggherdovich@...e.cz>, Lukasz
 Luba <lukasz.luba@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
 Srinivas Pandruvada <srinivas.pandruvada@...el.com>, K Prateek Nayak
 <kprateek.nayak@....com>
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Frederic Weisbecker <frederic@...nel.org> writes:

> Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
>> +/*
>> + * Returns true, if there is nothing to be propagated to the next level
>> + *
>> + * @data->firstexp is set to expiry of first gobal event of the (top level of
>> + * the) hierarchy, but only when hierarchy is completely idle.
>> + *
>> + * This is the only place where the group event expiry value is set.
>> + */
>> +static
>> +bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
>> +			 struct tmigr_walk *data, union tmigr_state childstate,
>> +			 union tmigr_state groupstate)
>> +{
>> +	struct tmigr_event *evt, *first_childevt;
>> +	bool walk_done, remote = data->remote;
>> +	bool leftmost_change = false;
>> +	u64 nextexp;
>> +
>> +	if (child) {
>> +		raw_spin_lock(&child->lock);
>> +		raw_spin_lock_nested(&group->lock, SINGLE_DEPTH_NESTING);
>> +
>> +		if (childstate.active) {
>
> Since you're going to do the atomic_read(&group->migr_state)
> within the group->lock, you may as well do the atomic_read(&child->migr_state)
> within the child->lock. It won't hurt and simplifies the picture
> in the mind.

Already changed it this way.

> Then you can add the following comment to outline the ordering
> expectations:
>
> /*
>  * Observing child->migr_state.active means that:
>  *
>  * 1) Either the child is effectively active, then it's fine to stop here
>  *
>  * 2) Or we are racing with a CPU going inactive and this childstate is actually
>  *    not active anymore but tmigr_inactive_up() hasn't yet called tmigr_update_event()
>  *    on it. It's fine to stop here because that pending call will take care
>  *    of the rest of the propagation.
>  *
>  * 3) In any case it's impossible to observe childstate.active when a racing
>  *    CPU made it inactive and also called tmigr_update_event() on it. The
>  *    group->lock enforces ordering such that ->migr_state changes
>  *    in tmigr_inactive_up() are released by group->lock UNLOCK on the
>  *    subsequent call to tmigr_update_event() and then acquired by
>  *    child->lock LOCK in tmigr_new_timer() -> tmigr_update_event().
>  */

I'll add the comment! Thanks

>> +			walk_done = true;
>> +			goto unlock;
>> +		}
>> +
>> +		first_childevt = tmigr_next_groupevt(child);
>> +		nextexp = child->next_expiry;
>> +		evt = &child->groupevt;
>> +	} else {
>> +		nextexp = data->nextexp;
>> +
>> +		first_childevt = evt = data->evt;
>> +
>> +		/*
>> +		 * Walking the hierarchy is required in any case when a
>> +		 * remote expiry was done before. This ensures to not lose
>> +		 * already queued events in non active groups (see section
>> +		 * "Required event and timerqueue update after a remote
>> +		 * expiry" in the documentation at the top).
>> +		 *
>> +		 * The two call sites which are executed without a remote expiry
>> +		 * before, are not prevented from propagating changes through
>> +		 * the hierarchy by the return:
>> +		 *  - When entering this path by tmigr_new_timer(), @evt->ignore
>> +		 *    is never set.
>> +		 *  - tmigr_inactive_up() takes care of the propagation by
>> +		 *    itself and ignores the return value. But an immediate
>> +		 *    return is required because nothing has to be done in this
>> +		 *    level as the event could be ignored.
>> +		 */
>> +		if (evt->ignore && !remote)
>> +			return true;
>> +
>> +		raw_spin_lock(&group->lock);
>> +	}
>> +
>> +	if (nextexp == KTIME_MAX) {
>> +		evt->ignore = true;
>> +
>> +		/*
>> +		 * When the next child event could be ignored (nextexp is
>> +		 * KTIME_MAX) and there was no remote timer handling before or
>> +		 * the group is already active, there is no need to walk the
>> +		 * hierarchy even if there is a parent group.
>> +		 *
>> +		 * The other way round: even if the event could be ignored, but
>> +		 * if a remote timer handling was executed before and the group
>> +		 * is not active, walking the hierarchy is required to not miss
>> +		 * an enqueued timer in the non active group. The enqueued timer
>> +		 * of the group needs to be propagated to a higher level to
>> +		 * ensure it is handled.
>> +		 */
>> +		if (!remote || groupstate.active) {
>
> Same here, fetching group->migr_state.active from within the lock simplifies
> the mind mapping.

Sure. Already changed it.

Thanks,
	Anna-Maria