Message-ID: <20221118232350.GA2340322@paulmck-ThinkPad-P17-Gen-1>
Date: Fri, 18 Nov 2022 15:23:50 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Sven Schnelle <svens@...ux.ibm.com>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Josh Triplett <josh@...htriplett.org>,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [PATCH 1/2] torture: use for_each_present() loop in
torture_online_all()
On Thu, Nov 17, 2022 at 07:06:37AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 17, 2022 at 07:30:32AM +0100, Sven Schnelle wrote:
> > Hi Paul,
> >
> > "Paul E. McKenney" <paulmck@...nel.org> writes:
> >
> > >> > Yes, rcutorture has lower-level checks for CPUs being hotplugged
> > >> > behind its back. Which might be sufficient. But this patch is in
> > >> > response to something bad happening if the CPU is also not present in
> > >> > the cpu_present_mask. Would that same bad thing happen if rcutorture saw
> > >> > the CPU in cpu_online_mask, but by the time it attempted to CPU-hotplug
> > >> > it, that CPU was gone not just from cpu_online_mask, but also from
> > >> > cpu_present_mask?
> > >> >
> > >> > Or are CPUs never removed from cpu_present_mask?
> > >>
> > >> In the current implementation CPUs can only be added to the
> > >> cpu_present_mask, but never removed. This might change in the future
> > >> when we get support from firmware for that, but the current s390 code
> > >> doesn't do that.
> > >
> > > Very good!
> > >
> > > Then could the patch please check that bits are never removed?
> > > That way the code will complain should firmware support be added.
> > >
> > > Thanx, Paul
> >
> > I'm not sure whether I fully understand that. If the CPU could
> > be removed from the system and the cpu_present_mask, that could
> > happen at any time. So I don't see how we should check for that?
>
> Well, that is my question to you. ;-)
>
> Suppose we have the following sequence of events:
>
> o rcutorture sees that CPU 5 is in cpu_present_mask, but offline.
>
> o rcutorture therefore decides to online CPU 5.
>
> o s390 firmware removes CPU 5, and s390 architecture code then
> clears it from the cpu_present_mask.
>
> o rcutorture proceeds with onlining CPU 5.
>
> Don't we then get the same problem that prompted you to change from
> cpu_possible_mask to cpu_present_mask? If not, why can't the rcutorture
> code continue to use cpu_possible_mask?
>
> If it really is bad to try to online or offline a CPU that is in
> cpu_possible_mask but not in cpu_present_mask, and if CPUs can be removed
> from cpu_present_mask, then we need some way to synchronize the removal
> of CPUs from cpu_present_mask. There are of course a lot of possible
> ways to do that synchronization, for example, protecting cpu_present_mask
> with a mutex or similar.
>
> Alternatively, s390 could restrict things. One way to do that would
> be to turn off rcutorture's use of CPU hotplug when running on s390,
> for example, by using the module parameters provided for that purpose.
> Another way to do that would be to refrain from removing CPUs from
> cpu_present_mask while rcutorture is running.
>
> Are there other approaches?
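As a rough illustration of the mutex option, here is a userspace model (not kernel code) in which the presence check and the onlining happen under one lock, so a CPU cannot drop out of the present mask between rcutorture's check and its action. The names present_mask, online_mask, present_lock, arch_remove_cpu(), and torture_try_online() are hypothetical; the kernel has its own CPU-hotplug locking, which this sketch does not try to reproduce. Older toolchains may need -pthread to link.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for cpu_present_mask and cpu_online_mask. */
static uint64_t present_mask = 0x3f; /* CPUs 0-5 present */
static uint64_t online_mask  = 0x1f; /* CPUs 0-4 online, CPU 5 offline */

/* Hypothetical lock serializing every update to present_mask. */
static pthread_mutex_t present_lock = PTHREAD_MUTEX_INITIALIZER;

/* Model of the firmware/architecture path removing a CPU. */
static void arch_remove_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	pthread_mutex_lock(&present_lock);
	present_mask &= ~(UINT64_C(1) << cpu);
	online_mask &= ~(UINT64_C(1) << cpu); /* keep the model consistent */
	pthread_mutex_unlock(&present_lock);
}

/*
 * Model of rcutorture onlining a CPU: check presence and act while
 * holding the same lock, closing the check-then-act window.
 * Returns false (skips the CPU) if it is no longer present.
 */
static bool torture_try_online(int cpu)
{
	bool ok = false;

	assert(cpu >= 0 && cpu < 64);
	pthread_mutex_lock(&present_lock);
	if (present_mask & (UINT64_C(1) << cpu)) {
		online_mask |= UINT64_C(1) << cpu;
		ok = true;
	}
	pthread_mutex_unlock(&present_lock);
	return ok;
}
```

In the scenario above, arch_remove_cpu(5) followed by torture_try_online(5) simply skips CPU 5 instead of attempting to online a CPU that is no longer present.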

For the near term, why not have rcutorture keep a snapshot of
cpu_present_mask, and splat if a CPU is ever removed from that mask?
That would catch any issues, and defer any synchronization decisions to
a time at which we actually have some chance of knowing what is going on.
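A minimal sketch of the snapshot-and-splat check, modeled in userspace C with a 64-bit word standing in for cpu_present_mask. The names present_snapshot, torture_snapshot_init(), and torture_check_present() are hypothetical, and the fprintf() stands in for a kernel splat; an in-kernel version would presumably use cpumask helpers such as cpumask_andnot() together with WARN_ON_ONCE().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: bit N set means CPU N is present. */
static uint64_t present_snapshot;

/* Record the present mask when the torture test starts. */
static void torture_snapshot_init(uint64_t present_now)
{
	assert(present_now != 0); /* at least one CPU must be present */
	present_snapshot = present_now;
}

/*
 * Compare the current present mask against the snapshot.  Newly added
 * CPUs are allowed and folded into the snapshot; any CPU that has
 * disappeared triggers a "splat" (here, a message) and returns false.
 */
static bool torture_check_present(uint64_t present_now)
{
	uint64_t removed = present_snapshot & ~present_now;

	if (removed) {
		fprintf(stderr, "splat: CPUs removed from present mask: %#" PRIx64 "\n",
			removed);
		return false;
	}
	present_snapshot |= present_now;
	return true;
}
```

Folding newly present CPUs into the snapshot keeps CPU additions, which the current s390 code does perform, from being reported, while any removal still trips the check.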

Thanx, Paul