Message-ID: <ZnlS1QcFgHvJGm7J@lothringen>
Date: Mon, 24 Jun 2024 13:04:53 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
Borislav Petkov <bp@...en8.de>, Narasimhan V <Narasimhan.V@....com>
Subject: Re: [PATCH 0/3] timer_migration: Fix a possible race and improvements
On Mon, Jun 24, 2024 at 10:58:26AM +0200, Anna-Maria Behnsen wrote:
> Frederic Weisbecker <frederic@...nel.org> writes:
> > 0) Hierarchy has only 8 CPUs (single node for now, with only CPU 0
> > active).
> >
> >
> > [GRP1:0]
> > migrator = TMIGR_NONE
> > active = NONE
> > nextevt = KTIME_MAX
> > \
> > [GRP0:0] [GRP0:1]
> > migrator = 0 migrator = TMIGR_NONE
> > active = 0 active = NONE
> > nextevt = KTIME_MAX nextevt = KTIME_MAX
> > / \ |
> > 0 1 .. 7 8
> > active idle !online
> >
> > 1) CPU 8 is booting and creates a new node and a new top. For now it's
> > only connected to GRP0:1, not yet to GRP0:0. Also CPU 8 hasn't called
> > __tmigr_cpu_activate() on itself yet.
>
> NIT: In step 1) CPU8 is not yet connected to GRP0:1 as the second while
> loop is not yet finished, but nevertheless...
Yes, I had the second loop in mind, but that didn't transcribe well :-)
>
> >
> > [GRP1:0]
> > migrator = TMIGR_NONE
> > active = NONE
> > nextevt = KTIME_MAX
> > / \
> > [GRP0:0] [GRP0:1]
> > migrator = 0 migrator = TMIGR_NONE
> > active = 0 active = NONE
> > nextevt = KTIME_MAX nextevt = KTIME_MAX
> > / \ |
> > 0 1 .. 7 8
> > active idle active
> >
> > 2) CPU 8 connects GRP0:0 to GRP1:0 and observes while in
> > tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it
> > prepares to call tmigr_active_up() on it. It hasn't done it yet.
>
> NIT: CPU8 keeps its state !online until step 5.
Oops, copy-paste hazards.
> Holding the lock will not help as the state is not protected by the
> lock.
No, but any access happening before an UNLOCK is guaranteed to be visible
after the next LOCK. This makes sure that either:

1) If the remote CPU going inactive has passed tmigr_update_events() with
   its unlock of group->lock, then the onlining CPU will see TMIGR_NONE,

or:

2) If the onlining CPU locks before the remote CPU calls tmigr_update_events(),
   then the subsequent LOCK on group->lock in tmigr_update_events() will see
   the new parent link and propagate the inactive state up.
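To make that concrete, here is a minimal userspace sketch of those two
orderings (a pthread analogue with made-up variable names, not the actual
tmigr code; group_lock stands in for group->lock):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static bool child_is_none;	/* stand-in for the TMIGR_NONE state */
static bool parent_linked;	/* stand-in for the new parent link  */

/* Remote CPU going inactive, as in tmigr_update_events() */
static void *remote_inactive(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&group_lock);
	child_is_none = true;
	/*
	 * Whichever CPU unlocks first, the other is guaranteed to see
	 * its update once it takes the lock: either the onlining CPU
	 * sees child_is_none, or we see parent_linked and propagate up.
	 */
	pthread_mutex_unlock(&group_lock);
	return NULL;
}

/* Onlining CPU connecting the child to its new parent */
static void onlining_cpu(void)
{
	pthread_mutex_lock(&group_lock);
	parent_linked = true;
	pthread_mutex_unlock(&group_lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, remote_inactive, NULL);
	onlining_cpu();
	pthread_join(t, NULL);
	return 0;
}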
And yes, there is an optimization case where we don't lock:

	if (evt->ignore && !remote && group->parent)
		return true;

And that would fall through the cracks. So, forget about that lock idea.
> +static void tmigr_setup_active_up(struct tmigr_group *group, struct tmigr_group *child)
> +{
> + union tmigr_state curstate, childstate;
> + bool walk_done;
> +
> + /*
> + * FIXME: Memory barrier is required here as the child state
> + * could have changed in the meantime
> + */
> + curstate.state = atomic_read_acquire(&group->migr_state);
> +
> + for (;;) {
> + childstate.state = atomic_read(&child->migr_state);
> + if (!childstate.active)
> + return;
Ok, there could have been a risk that we miss the remote CPU going active.
But again, thanks to the lock, either we observe the childstate as active
or the remote CPU sees the link and propagates its active state. And the
unlocked optimization in tmigr_update_events() still works, because either
it sees the new parent and proceeds, or it sees an intermediate parent and
the next ones will be locked.
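For the record, the underlying pattern is the classic store-buffering
litmus test. A minimal C11 sketch (illustrative names only, not the kernel
code): with full ordering (seq_cst below; in the tmigr case the unlock/lock
pairing on group->lock), the two loads cannot both miss the other side's
store.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool child_active;	/* remote CPU publishes this   */
static atomic_bool parent_linked;	/* onlining CPU publishes this */

/* Remote CPU going active */
static bool remote_side(void)
{
	atomic_store(&child_active, true);	/* seq_cst store */
	return atomic_load(&parent_linked);	/* saw the link? */
}

/* Onlining CPU linking the child to the new parent */
static bool onlining_side(void)
{
	atomic_store(&parent_linked, true);	/* seq_cst store */
	return atomic_load(&child_active);	/* saw it active? */
}

int main(void)
{
	/* In the real scenario these run concurrently on two CPUs;
	 * called sequentially here just so the sketch compiles. */
	remote_side();
	onlining_side();
	return 0;
}

remote_side() and onlining_side() cannot both return false under seq_cst;
in the tmigr case the equivalent guarantee comes from the unlock/lock
ordering on group->lock rather than from the acquire read alone.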
Phew!
I'll do a deeper review this evening but it _looks_ ok.
Thanks.