[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <585bb5d3ee5bea063795682108576c3464ba72b6.camel@linux.intel.com>
Date: Mon, 30 Nov 2020 10:13:08 -0800
From: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
To: Kai-Heng Feng <kai.heng.feng@...onical.com>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>,
Zhang Rui <rui.zhang@...el.com>, amitk@...nel.org,
"open list:THERMAL" <linux-pm@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] thermal: core: Add indication for userspace usage
On Tue, 2020-12-01 at 02:04 +0800, Kai-Heng Feng wrote:
> > On Dec 1, 2020, at 00:19, Srinivas Pandruvada <
> > srinivas.pandruvada@...ux.intel.com> wrote:
> >
> > On Mon, 2020-11-30 at 16:23 +0800, Kai-Heng Feng wrote:
> > > > On Nov 30, 2020, at 15:57, Daniel Lezcano <
> > > > daniel.lezcano@...aro.org> wrote:
> > > >
> > > >
> > > > [Added Srinivas]
> > > >
> > > > On 28/11/2020 18:54, Kai-Heng Feng wrote:
> > > > > We are seeing thermal shutdown on Intel based mobile
> > > > > workstations, the
> > > > > shutdown happens during the first trip handle in
> > > > > thermal_zone_device_register():
> > > > > kernel: thermal thermal_zone15: critical temperature reached
> > > > > (101
> > > > > C), shutting down
> > > > >
> > > > > However, we shouldn't do a thermal shutdown here, since
> > > > > 1) We may want to use a dedicated daemon, Intel's thermald in
> > > > > this case,
> > > > > to handle thermal shutdown.
> > > > >
> > > > > 2) For ACPI based system, _CRT doesn't mean shutdown unless
> > > > > it's
> > > > > inside
> > > > > ThermalZone. ACPI Spec, 11.4.4 _CRT (Critical Temperature):
> > > > > "... If this object it present under a device, the device’s
> > > > > driver
> > > > > evaluates this object to determine the device’s critical
> > > > > cooling
> > > > > temperature trip point. This value may then be used by the
> > > > > device’s
> > > > > driver to program an internal device temperature sensor trip
> > > > > point."
> > > > >
> > > > > So a "critical trip" here merely means we should take a more
> > > > > aggressive
> > > > > cooling method.
> > > >
> > > > Well, actually it is stated before:
> > > >
> > > > "This object, when defined under a thermal zone, returns the
> > > > critical
> > > > temperature at which OSPM must shutdown the system".
> > >
> > > This means specifically for the ACPI ThermalZone in AML, e.g.:
> > >
> > > ThermalZone (TZ0) {
> > > ....
> > > Method(_CRT) { ... }
> > > } // end of TZ0
> > >
> > > However the device is not under any ACPI ThermalZone.
> > >
> > > > That is what does the thermal subsystem, no ?
> > > >
> > > > > So add an indication to let thermal core know it should leave
> > > > > thermal
> > > > > device to userspace to handle.
> > > >
> > > > You may want to check the 'HOT' trip point and then use the
> > > > notification
> > > > mechanism to get notified in userspace and take action from
> > > > there
> > > > (eg.
> > > > offline some CPUs).
> > >
> > > For this particular issue we are facing, the thermal shutdown
> > > happens
> > > in thermal_zone_device_register() and userspace isn't up yet.
> >
> > What about creating an new callback
> >
> > enum thermal_trip_status {
> > THERMAL_TRIP_DISABLED = 0,
> > THERMAL_TRIP_ENABLED,
> > };
> >
> > int get_trip_status(struct thermal_zone_device *, int trip, enum
> > thermal_trip_status *state);
> >
> > Then in
> > static void handle_thermal_trip(struct thermal_zone_device *tz, int
> > trip)
> > {
> >
> > /* before tz->ops->get_trip_temp(tz, trip, &trip_temp); */
> > if (tz->ops->get_trip_status) {
> > enum thermal_trip_status *status;
> >
> > if (!tz->ops->get_trip_status(tz, trip, &status)) {
> > if (status == THERMAL_TRIP_DISABLED)
> > return;
> > }
> > }
> > ...
> > ...
> >
> > }
> >
> >
> > This callback will help the cases:
> > - Allows drivers to selectively disable certain trips during init
> > state
> > or system resume where there can be spikes or always. int340x
> > drivers
> > can disable always.
>
> This sounds really great. This is indeed can happen on system resume,
> before userspace process thaw.
>
> > - Still give options for drivers to handle critical trip even if
> > they
> > are bound to user space governors. User space process may be dead,
> > so
> > still allow kernel to process graceful shutdown
>
> To make the scenario happen, do we need a new sysfs to let usespace
> enable it with THERMAL_TRIP_ENABLED?
This should be drivers call not user space.
Thanks,
Srinivas
>
> Kai-Heng
>
> > Thanks,
> > Srinivas
> >
> > > Kai-Heng
> > >
> > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@...onical.com>
> > > > > ---
> > > > > drivers/thermal/thermal_core.c | 3 +++
> > > > > include/linux/thermal.h | 2 ++
> > > > > 2 files changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/drivers/thermal/thermal_core.c
> > > > > b/drivers/thermal/thermal_core.c
> > > > > index c6d74bc1c90b..6561e3767529 100644
> > > > > --- a/drivers/thermal/thermal_core.c
> > > > > +++ b/drivers/thermal/thermal_core.c
> > > > > @@ -1477,6 +1477,9 @@ thermal_zone_device_register(const char
> > > > > *type, int trips, int mask,
> > > > > goto unregister;
> > > > > }
> > > > >
> > > > > + if (tz->tzp && tz->tzp->userspace)
> > > > > + thermal_zone_device_disable(tz);
> > > > > +
> > > > > mutex_lock(&thermal_list_lock);
> > > > > list_add_tail(&tz->node, &thermal_tz_list);
> > > > > mutex_unlock(&thermal_list_lock);
> > > > > diff --git a/include/linux/thermal.h
> > > > > b/include/linux/thermal.h
> > > > > index d07ea27e72a9..e8e8fac78fc8 100644
> > > > > --- a/include/linux/thermal.h
> > > > > +++ b/include/linux/thermal.h
> > > > > @@ -247,6 +247,8 @@ struct thermal_zone_params {
> > > > > */
> > > > > bool no_hwmon;
> > > > >
> > > > > + bool userspace;
> > > > > +
> > > > > int num_tbps; /* Number of tbp entries */
> > > > > struct thermal_bind_params *tbp;
> > > > >
> > > > >
> > > >
> > > > --
> > > > <http://www.linaro.org/> Linaro.org │ Open source software for
> > > > ARM
> > > > SoCs
> > > >
> > > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook
> > > > |
> > > > <http://twitter.com/#!/linaroorg> Twitter |
> > > > <http://www.linaro.org/linaro-blog/> Blog
Powered by blists - more mailing lists