[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sf2c6xjc.fsf@somnus>
Date: Thu, 01 Feb 2024 15:59:03 +0100
From: Anna-Maria Behnsen <anna-maria@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Eric Dumazet <edumazet@...gle.com>, "Rafael J . Wysocki"
<rafael.j.wysocki@...el.com>, Arjan van de Ven <arjan@...radead.org>,
"Paul E . McKenney" <paulmck@...nel.org>, Rik van Riel <riel@...riel.com>,
Steven Rostedt <rostedt@...dmis.org>, Sebastian Siewior
<bigeasy@...utronix.de>, Giovanni Gherdovich <ggherdovich@...e.cz>, Lukasz
Luba <lukasz.luba@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
Srinivas Pandruvada <srinivas.pandruvada@...el.com>, K Prateek Nayak
<kprateek.nayak@....com>
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model
Frederic Weisbecker <frederic@...nel.org> writes:
> Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
>> +/*
>> + * Returns true, if there is nothing to be propagated to the next level
>> + *
>> + * @data->firstexp is set to expiry of first gobal event of the (top level of
>> + * the) hierarchy, but only when hierarchy is completely idle.
>> + *
>> + * This is the only place where the group event expiry value is set.
>> + */
>> +static
>> +bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
>> + struct tmigr_walk *data, union tmigr_state childstate,
>> + union tmigr_state groupstate)
>> +{
>> + struct tmigr_event *evt, *first_childevt;
>> + bool walk_done, remote = data->remote;
>> + bool leftmost_change = false;
>> + u64 nextexp;
>> +
>> + if (child) {
>> + raw_spin_lock(&child->lock);
>> + raw_spin_lock_nested(&group->lock, SINGLE_DEPTH_NESTING);
>> +
>> + if (childstate.active) {
>
> Since you're going to do the atomic_read(&group->migr_state)
> within the group->lock, you may as well do the atomic_read(&child->migr_state)
> within the child->lock. It won't hurt and simplifies the picture
> in the mind.
Already changed it this way.
> Then you can add the following comment to outline the ordering
> expectations:
>
> /*
> * Observing child->migr_state.active means that:
> *
> * 1) Either the child is effectively active, then it's fine to stop here
> *
> * 2) Or we are racing with a CPU going inactive and this childstate is actually
> * not active anymore but tmigr_inactive_up() hasn't yet called tmigr_update_event()
> * on it. It's fine to stop here because that pending call will take care
> * of the rest of the propagation.
> *
> * 3) In any case it's impossible to observe childstate.active when a racing
> * CPU made it inactive and also called tmigr_update_event() on it. The
> * group->lock enforces ordering such that ->migr_state changes
> * in tmigr_inactive_up() are released by group->lock UNLOCK on the
> * subsequent call to tmigr_update_event() and then acquired by
> * child->lock LOCK in tmigr_new_timer() -> tmigr_update_event().
> */
I'll add the comment! Thanks
>> + walk_done = true;
>> + goto unlock;
>> + }
>> +
>> + first_childevt = tmigr_next_groupevt(child);
>> + nextexp = child->next_expiry;
>> + evt = &child->groupevt;
>> + } else {
>> + nextexp = data->nextexp;
>> +
>> + first_childevt = evt = data->evt;
>> +
>> + /*
>> + * Walking the hierarchy is required in any case when a
>> + * remote expiry was done before. This ensures to not lose
>> + * already queued events in non active groups (see section
>> + * "Required event and timerqueue update after a remote
>> + * expiry" in the documentation at the top).
>> + *
>> + * The two call sites which are executed without a remote expiry
>> + * before, are not prevented from propagating changes through
>> + * the hierarchy by the return:
>> + * - When entering this path by tmigr_new_timer(), @evt->ignore
>> + * is never set.
>> + * - tmigr_inactive_up() takes care of the propagation by
>> + * itself and ignores the return value. But an immediate
>> + * return is required because nothing has to be done in this
>> + * level as the event could be ignored.
>> + */
>> + if (evt->ignore && !remote)
>> + return true;
>> +
>> + raw_spin_lock(&group->lock);
>> + }
>> +
>> + if (nextexp == KTIME_MAX) {
>> + evt->ignore = true;
>> +
>> + /*
>> + * When the next child event could be ignored (nextexp is
>> + * KTIME_MAX) and there was no remote timer handling before or
>> + * the group is already active, there is no need to walk the
>> + * hierarchy even if there is a parent group.
>> + *
>> + * The other way round: even if the event could be ignored, but
>> + * if a remote timer handling was executed before and the group
>> + * is not active, walking the hierarchy is required to not miss
>> + * an enqueued timer in the non active group. The enqueued timer
>> + * of the group needs to be propagated to a higher level to
>> + * ensure it is handled.
>> + */
>> + if (!remote || groupstate.active) {
>
> Same here, fetching group->migr_state.active from within the lock simplifies
> the mind mapping.
Sure. Already changed it.
Thanks,
Anna-Maria
Powered by blists - more mailing lists