linux-kernel - Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model


Message-ID: <87sf2gqups.fsf@somnus>
Date: Mon, 29 Jan 2024 11:50:39 +0100
From: Anna-Maria Behnsen <anna-maria@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
 Eric Dumazet <edumazet@...gle.com>, "Rafael J . Wysocki"
 <rafael.j.wysocki@...el.com>, Arjan van de Ven <arjan@...radead.org>,
 "Paul E . McKenney" <paulmck@...nel.org>, Rik van Riel <riel@...riel.com>,
 Steven Rostedt <rostedt@...dmis.org>, Sebastian Siewior
 <bigeasy@...utronix.de>, Giovanni Gherdovich <ggherdovich@...e.cz>, Lukasz
 Luba <lukasz.luba@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
 Srinivas Pandruvada <srinivas.pandruvada@...el.com>, K Prateek Nayak
 <kprateek.nayak@....com>, Boqun Feng <boqun.feng@...il.com>
Subject: Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

Frederic Weisbecker <frederic@...nel.org> writes:

> On Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen wrote:
>> +static bool tmigr_inactive_up(struct tmigr_group *group,
>> +			      struct tmigr_group *child,
>> +			      void *ptr)
>> +{
>> +	union tmigr_state curstate, newstate, childstate;
>> +	struct tmigr_walk *data = ptr;
>> +	bool walk_done;
>> +	u8 childmask;
>> +
>> +	childmask = data->childmask;
>> +	curstate.state = atomic_read(&group->migr_state);
>> +	childstate.state = 0;
>> +
>> +	do {
>
> So I got the confirmation from Boqun (+Cc) and Paul that a failing cmpxchg
> may not order the load of the old value against subsequent loads. And
> that may apply to atomic_try_cmpxchg() as well.
>
> Therefore you not only need to turn group->migr_state read into
> an atomic_read_acquire() but you also need to do this on each iteration
> of this loop. For example you can move the read_acquire right here.

I tried to read up on memory barriers, especially the acquire/release
semantics. So please correct me wherever I'm wrong.

We have to make sure that the child/group state values contain the last
updates and prevent reordering to be able to rely on those values.

So I understand that we need the atomic_read_acquire() here for the
child state, because we change the group state according to it and need
to make sure that it contains its last update. The cmpxchg which
writes the child state is (on success) a full memory barrier. And the
atomic_read_acquire() makes sure all preceding "critical sections"
(which end with the full memory barrier) are visible. Is this right?
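
For illustration only, the acquire/release pairing described above can
be sketched in userspace C11. This is not kernel code: the C11
memory_order_release store and memory_order_acquire load are used as
rough stand-ins for the writer side and for atomic_read_acquire(), and
struct demo_channel, demo_publish() and demo_consume() are hypothetical
names that do not exist in the patch. The kernel memory model differs
from C11 in details, so this only shows the general idea.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in, not one of the tmigr data structures. */
struct demo_channel {
	int payload;		/* plain data written before publishing */
	atomic_int ready;	/* flag that publishes the payload */
};

/* Writer: store the data first, then publish it with a release store. */
static void demo_publish(struct demo_channel *ch, int value)
{
	ch->payload = value;
	atomic_store_explicit(&ch->ready, 1, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above. If it
 * observes ready == 1, the payload written before the release store
 * is guaranteed to be visible as well.
 */
static bool demo_consume(struct demo_channel *ch, int *out)
{
	if (!atomic_load_explicit(&ch->ready, memory_order_acquire))
		return false;

	*out = ch->payload;
	return true;
}
```

In the C11 model nothing in this pattern requires exactly one acquire
per release: several acquire loads can observe the same published
value.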

To make sure the proper states are used, atomic_read_acquire() is then
also required in:
  - tmigr_check_migrator()
  - tmigr_check_migrator_and_lonely()
  - tmigr_check_lonely()
  - tmigr_new_timer_up() (for childstate and groupstate)
  - tmigr_connect_child_parent()
Right?

Regarding the pairing of acquire: What happens when two
atomic_read_acquire() are executed one after the other without pairing
1:1 with a release or stronger memory barrier?

Now I want to understand the case for the group state here and also in
the active_up path. When reading it without acquire, it is possible that
not all changes are visible due to reordering. But then the worst
outcome would be that the cmpxchg fails and the loop has to be done once
more? Is this right?
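
To make the retry part of this question concrete, here is a minimal
userspace C11 sketch of a compare-and-exchange loop that starts from a
stale value. demo_clear_bits() is a hypothetical helper, not a function
from the patch, and C11 atomics stand in for the kernel primitives. The
sketch only shows that a stale expected value costs one extra
iteration. It says nothing about how the load of the old value is
ordered against later loads, which is the point raised in the quoted
review comment.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: clear the bits in mask, starting from a possibly
 * stale guess of the current value. Returns the number of iterations
 * the loop needed.
 */
static int demo_clear_bits(atomic_uint *state, unsigned int guess,
			   unsigned int mask)
{
	unsigned int cur = guess;
	unsigned int next;
	int iterations = 0;

	do {
		next = cur & ~mask;
		iterations++;
		/* On failure, cur is overwritten with the value actually found. */
	} while (!atomic_compare_exchange_strong_explicit(state, &cur, next,
			memory_order_seq_cst, memory_order_relaxed));

	return iterations;
}
```

With the state at 0x7 and a stale guess of 0x3, the first exchange
fails, cur is refreshed to 0x7, and the second iteration succeeds.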

I know that memory barriers are not free and redoing the loop is not
free either. But I don't know which of the two is worse. At least in
the inactive_up() path, we are not in the critical path. In active_up()
it would be good to take the less expensive option.

I want to understand the atomic_try_cmpxchg_acquire() variant: The read
is an acquire, so even if the compare/write fails, the value which is
handed back is the one which was last updated by a successful cmpxchg,
and then we can rely on this value?
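
For comparison only: in userspace C11 the ordering of the failure path
is a separate, explicit argument, so an acquire on failure can be
requested directly. The sketch below uses a hypothetical helper,
demo_try_update(), and C11 atomics, not the kernel API. Whether the
kernel's atomic_try_cmpxchg_acquire() gives any ordering when the
compare fails is a separate matter that this sketch does not settle.
The quoted comment above suggests that a failing cmpxchg may not order
the load of the old value against subsequent loads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: try to replace *expected with desired.
 * Success is ordered as acquire + release, the failure path is an
 * acquire load. On failure *expected is overwritten with the value
 * that was found in *state.
 */
static bool demo_try_update(atomic_uint *state, unsigned int *expected,
			    unsigned int desired)
{
	return atomic_compare_exchange_strong_explicit(state, expected, desired,
			memory_order_acq_rel, memory_order_acquire);
}
```

A caller that passes a wrong expected value gets false back, finds the
current value in *expected, and can retry with it.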

Thanks a lot in advance for the help to understand this topic a little
better!

	Anna-Maria

>
> Thanks.
>
>> +		if (child)
>> +			childstate.state = atomic_read(&child->migr_state);
>> +
>> +		newstate = curstate;
>> +		walk_done = true;
>> +
>> +		/* Reset active bit when the child is no longer active */
>> +		if (!childstate.active)
>> +			newstate.active &= ~childmask;
>> +
>> +		if (newstate.migrator == childmask) {
>> +			/*
>> +			 * Find a new migrator for the group, because the child
>> +			 * group is idle!
>> +			 */
>> +			if (!childstate.active) {
>> +				unsigned long new_migr_bit, active = newstate.active;
>> +
>> +				new_migr_bit = find_first_bit(&active, BIT_CNT);
>> +
>> +				if (new_migr_bit != BIT_CNT) {
>> +					newstate.migrator = BIT(new_migr_bit);
>> +				} else {
>> +					newstate.migrator = TMIGR_NONE;
>> +
>> +					/* Changes need to be propagated */
>> +					walk_done = false;
>> +				}
>> +			}
>> +		}
>> +
>> +		newstate.seq++;
>> +
>> +		WARN_ON_ONCE((newstate.migrator != TMIGR_NONE) && !(newstate.active));
>> +
>> +	} while (!atomic_try_cmpxchg(&group->migr_state, &curstate.state, newstate.state));