Message-ID: <20170510200946.GB5628@roeck-us.net>
Date: Wed, 10 May 2017 13:09:46 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Tommi Rantala <tt.rantala@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Fenghua Yu <fenghua.yu@...el.com>,
Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org,
Sebastian Siewior <bigeasy@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>, x86@...nel.org
Subject: Re: [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
On Wed, May 10, 2017 at 10:16:33PM +0300, Tommi Rantala wrote:
> 2017-05-10 17:30 GMT+03:00 Thomas Gleixner <tglx@...utronix.de>:
> > The recent conversion to the hotplug state machine missed that the original
> > hotplug notifiers did not execute in the frozen state, which is used on
> > suspend and resume.
> >
> > This does not matter on single-socket machines, but on multi-socket systems
> > this breaks when the device for a non-boot socket is removed when the last
> > CPU of that socket is brought offline. The device removal locks up the
> > machine hard w/o any debug output.
> >
> > Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.
> >
> > Thanks to Tommi for providing debug information patiently while I failed to
> > spot the obvious.
> >
> > Fixes: e00ca5df37ad ("hwmon: (coretemp) Convert to hotplug state machine")
> > Reported-by: Tommi Rantala <tt.rantala@...il.com>
> > Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>
> Many thanks, I can confirm that it works well!
>
Ok if I add your Tested-by: ?
Thanks,
Guenter
> -Tommi
>
> > ---
> > drivers/hwmon/coretemp.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > --- a/drivers/hwmon/coretemp.c
> > +++ b/drivers/hwmon/coretemp.c
> > @@ -605,6 +605,13 @@ static int coretemp_cpu_online(unsigned
> >  	struct platform_data *pdata;
> >
> >  	/*
> > +	 * Don't execute this on resume as the offline callback did
> > +	 * not get executed on suspend.
> > +	 */
> > +	if (cpuhp_tasks_frozen)
> > +		return 0;
> > +
> > +	/*
> >  	 * CPUID.06H.EAX[0] indicates whether the CPU has thermal
> >  	 * sensors. We check this bit only, all the early CPUs
> >  	 * without thermal sensors will be filtered out.
> > @@ -654,6 +661,13 @@ static int coretemp_cpu_offline(unsigned
> >  	struct temp_data *tdata;
> >  	int indx, target;
> >
> > +	/*
> > +	 * Don't execute this on suspend as the device remove locks
> > +	 * up the machine.
> > +	 */
> > +	if (cpuhp_tasks_frozen)
> > +		return 0;
> > +
> >  	/* If the physical CPU device does not exist, just return */
> >  	if (!pdev)
> >  		return 0;
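The effect of the paired early returns can be illustrated with a minimal user-space sketch. This is not kernel code. It assumes a simplified two-socket machine in which `tasks_frozen` stands in for the kernel's `cpuhp_tasks_frozen`, and the functions model_cpu_online(), model_cpu_offline(), cpu_up(), cpu_down(), reset_model() and suspend_resume() are hypothetical names chosen for illustration. The per-socket device is reduced to a boolean, and its removal to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the coretemp hotplug callbacks.
 * None of these names are kernel APIs. */
#define NR_SOCKETS      2
#define CPUS_PER_SOCKET 2
#define NR_CPUS         (NR_SOCKETS * CPUS_PER_SOCKET)

static bool tasks_frozen;          /* stand-in for cpuhp_tasks_frozen */
static bool honor_frozen = true;   /* false models the code before the fix */
static bool cpu_is_online[NR_CPUS];
static bool socket_dev[NR_SOCKETS]; /* does the per-socket device exist? */
static int dev_adds, dev_removals;

static int socket_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu / CPUS_PER_SOCKET;
}

static int online_in_socket(int s)
{
	int n = 0;

	for (int i = 0; i < CPUS_PER_SOCKET; i++)
		n += cpu_is_online[s * CPUS_PER_SOCKET + i];
	return n;
}

/* Models the online callback: add the socket device if it is missing. */
static int model_cpu_online(int cpu)
{
	int s = socket_of(cpu);

	if (honor_frozen && tasks_frozen)
		return 0; /* resume: offline was skipped, device still exists */
	if (!socket_dev[s]) {
		socket_dev[s] = true;
		dev_adds++;
	}
	return 0;
}

/* Models the offline callback: remove the device with the socket's last CPU. */
static int model_cpu_offline(int cpu)
{
	int s = socket_of(cpu);

	if (honor_frozen && tasks_frozen)
		return 0; /* suspend: skip the device removal */
	if (socket_dev[s] && online_in_socket(s) == 0) {
		socket_dev[s] = false;
		dev_removals++;
	}
	return 0;
}

/* Simplified hotplug core: update the online mask, then run the callback. */
static void cpu_up(int cpu)
{
	cpu_is_online[cpu] = true;
	model_cpu_online(cpu);
}

static void cpu_down(int cpu)
{
	cpu_is_online[cpu] = false;
	model_cpu_offline(cpu);
}

/* Boot the model: every CPU comes online and each socket gets its device. */
static void reset_model(bool guard)
{
	honor_frozen = guard;
	tasks_frozen = false;
	dev_adds = dev_removals = 0;
	for (int s = 0; s < NR_SOCKETS; s++)
		socket_dev[s] = false;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_up(cpu);
}

/* Suspend/resume: all non-boot CPUs go down and come back while frozen. */
static void suspend_resume(void)
{
	tasks_frozen = true;
	for (int cpu = NR_CPUS - 1; cpu > 0; cpu--)
		cpu_down(cpu);
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		cpu_up(cpu);
	tasks_frozen = false;
}
```

In this model, reset_model(false) followed by suspend_resume() mimics the behavior before the fix. When the last CPU of socket 1 goes down, its device is removed, which is the step that hung the real machine, and the device is then re-added on resume. With reset_model(true), the same cycle leaves both devices untouched. An ordinary cpu_down() of both socket-1 CPUs outside of suspend still removes the device, so the guard only affects frozen transitions.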