Message-ID: <CAFBcO+99Ax5MuOtzNx=NrmnUN=+913Sc-DV83ObOi01A=kkN3w@mail.gmail.com>
Date: Tue, 16 Mar 2021 03:15:13 +0000
From: Alexey Klimov <aklimov@...hat.com>
To: Daniel Jordan <daniel.m.jordan@...cle.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, yury.norov@...il.com,
tglx@...utronix.de, Joshua Baker <jobaker@...hat.com>,
audralmitchel@...il.com, arnd@...db.de, gregkh@...uxfoundation.org,
rafael@...nel.org, tj@...nel.org,
Qais Yousef <qais.yousef@....com>, hannes@...xchg.org,
Alexey Klimov <klimov.linux@...il.com>
Subject: Re: [PATCH v2] cpu/hotplug: wait for cpuset_hotplug_work to finish on
cpu onlining
On Fri, Feb 12, 2021 at 7:42 PM Daniel Jordan
<daniel.m.jordan@...cle.com> wrote:
>
> Alexey Klimov <aklimov@...hat.com> writes:
> > int cpu_device_up(struct device *dev)
>
> Yeah, definitely better to do the wait here.
>
> > int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
> > {
> > - int cpu, ret = 0;
> > + struct device *dev;
> > + cpumask_var_t mask;
> > + int cpu, ret;
> > +
> > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
> > + return -ENOMEM;
> >
> > + ret = 0;
> > cpu_maps_update_begin();
> > for_each_online_cpu(cpu) {
> > if (topology_is_primary_thread(cpu))
> > @@ -2099,18 +2098,35 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
> > * called under the sysfs hotplug lock, so it is properly
> > * serialized against the regular offline usage.
> > */
> > - cpuhp_offline_cpu_device(cpu);
> > + dev = get_cpu_device(cpu);
> > + dev->offline = true;
> > +
> > + cpumask_set_cpu(cpu, mask);
> > }
> > if (!ret)
> > cpu_smt_control = ctrlval;
> > cpu_maps_update_done();
> > +
> > + /* Tell user space about the state changes */
> > + for_each_cpu(cpu, mask) {
> > + dev = get_cpu_device(cpu);
> > + kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
> > + }
> > +
> > + free_cpumask_var(mask);
> > return ret;
> > }
>
> Hrm, should the dev manipulation be kept in one place, something like
> this?
The first section of the comment seems problematic to me with regard to such a move:
* As this needs to hold the cpu maps lock it's impossible
* to call device_offline() because that ends up calling
* cpu_down() which takes cpu maps lock. cpu maps lock
* needs to be held as this might race against in kernel
* abusers of the hotplug machinery (thermal management).
The cpu maps lock is released in cpu_maps_update_done(), so with this change
the dev->offline update would move out from under the cpu maps lock. Maybe I
misunderstood the comment and it relates to calling cpu_down_maps_locked()
under the lock to avoid the race?
I failed to find the abusers of the hotplug machinery in drivers/thermal/*
to track down the logic of the potential race, but I might have overlooked something.
Anyway, if we move the update of dev->offline out from under the lock, then it
makes sense to restore cpuhp_{offline,online}_cpu_device() and just use it
there (see the sketch below). I guess I'll update and re-send the patch and see how it goes.
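For reference, the helper I mean is roughly the following (quoting the offline
side from memory of kernel/cpu.c before my patch, so treat it as a sketch
rather than the exact current code):

static void cpuhp_offline_cpu_device(unsigned int cpu)
{
	struct device *dev = get_cpu_device(cpu);

	/* Mark the device offline and tell user space about the state change */
	dev->offline = true;
	kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
}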
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 8817ccdc8e112..aa21219a7b7c4 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2085,11 +2085,20 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
> ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
> if (ret)
> break;
> +
> + cpumask_set_cpu(cpu, mask);
> + }
> + if (!ret)
> + cpu_smt_control = ctrlval;
> + cpu_maps_update_done();
> +
> + /* Tell user space about the state changes */
> + for_each_cpu(cpu, mask) {
> /*
> - * As this needs to hold the cpu maps lock it's impossible
> + * When the cpu maps lock was taken above it was impossible
> * to call device_offline() because that ends up calling
> * cpu_down() which takes cpu maps lock. cpu maps lock
> - * needs to be held as this might race against in kernel
> + * needed to be held as this might race against in kernel
> * abusers of the hotplug machinery (thermal management).
> *
> * So nothing would update device:offline state. That would
Yeah, reading how you re-phrased it, this seems to be about the
cpu_down_maps_locked()/device_offline() locking and the race rather than
about updating stale dev->offline.
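In other words, the offlining itself stays under the cpu maps lock and only
the device state update plus uevent move after cpu_maps_update_done(). An
untested sketch of how I picture cpuhp_smt_disable() ending up, just to
illustrate the ordering (not the final patch):

int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
{
	cpumask_var_t mask;
	int cpu, ret = 0;

	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
		return -ENOMEM;

	cpu_maps_update_begin();
	for_each_online_cpu(cpu) {
		if (topology_is_primary_thread(cpu))
			continue;
		/* Offlining itself still happens under the cpu maps lock. */
		ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
		if (ret)
			break;
		cpumask_set_cpu(cpu, mask);
	}
	if (!ret)
		cpu_smt_control = ctrlval;
	cpu_maps_update_done();

	/* Update device state and tell user space after the lock is dropped. */
	for_each_cpu(cpu, mask)
		cpuhp_offline_cpu_device(cpu);

	free_cpumask_var(mask);
	return ret;
}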
Thank you,
Alexey