[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1479770177.6544.195.camel@intel.com>
Date: Mon, 21 Nov 2016 23:16:19 +0000
From: "Pandruvada, Srinivas" <srinivas.pandruvada@...el.com>
To: "tglx@...utronix.de" <tglx@...utronix.de>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"Zhang, Rui" <rui.zhang@...el.com>,
"rt@...utronix.de" <rt@...utronix.de>,
"edubezval@...il.com" <edubezval@...il.com>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "bp@...en8.de" <bp@...en8.de>
Subject: Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another
hotplug and locking trainwreck
On Mon, 2016-11-21 at 22:34 +0100, Thomas Gleixner wrote:
> On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:
[...]
> Stupid me. I tested putting a socket offline, which works, but did
> not
> check what happens on module removal. Delta fix below. That needs to
> be
> folded into the series as the wreckage already happens before the
> last
> patch.
Your change below fixes the crash issue. Now I tested a case where the
last cpu offlined from a package, it removed thermal zone and added
zone back once any cpu from the package onlined. So this is working.
I want to try to run some workload on those cpu to bump up the
temperature and check interrupts. I am hitting some issue unrelated to
this change may be. I onlined three cpus from the package 1.
[189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
[189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
[189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21
But I can't schedule anything on those CPUs. For example now can't run
turbostat, it complains
"
turbostat: re-initialized with num_cpus 19
Could not migrate to CPU 8
"
Same with
#taskset 0x100 stress -c 1
taskset: failed to set pid 0's affinity: Invalid argument
I am on the latest linux-pm/linux-next tree on this server. I will
switch to latest main line and try.
Thanks,
Srinivas
8<--------------------
--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@ struct pkg_device {
u32 msr_pkg_therm_high;
struct delayed_work work;
struct thermal_zone_device *tzone;
+ struct cpumask cpumask;
};
static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@ static int pkg_temp_thermal_device_add(u
rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev-
>msr_pkg_therm_low,
pkgdev->msr_pkg_therm_high);
+ cpumask_set_cpu(cpu, &pkgdev->cpumask);
spin_lock_irq(&pkg_temp_lock);
packages[pkgid] = pkgdev;
spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
static int pkg_thermal_cpu_offline(unsigned int cpu)
{
- int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
bool lastcpu, was_target;
+ int target;
if (!pkgdev)
return 0;
+ target = cpumask_any_but(&pkgdev->cpumask, cpu);
+ cpumask_clear_cpu(cpu, &pkgdev->cpumask);
lastcpu = target >= nr_cpu_ids;
/*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
return -ENODEV;
/* If the package exists, nothing to do */
- if (pkgdev)
+ if (pkgdev) {
+ cpumask_set_cpu(cpu, &pkgdev->cpumask);
return 0;
+ }
return pkg_temp_thermal_device_add(cpu);
}
Powered by blists - more mailing lists