Message-ID: <cb5ab68e-5034-d937-e28e-e838e50172a8@amd.com>
Date: Mon, 31 Oct 2022 20:58:27 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Sven van Ashbrook <svenva@...omium.org>
Cc: Rajneesh Bhardwaj <irenic.rajneesh@...il.com>,
Hans de Goede <hdegoede@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
"S-k, Shyam-sundar" <Shyam-sundar.S-k@....com>,
"rrangel@...omium.org" <rrangel@...omium.org>,
"platform-driver-x86@...r.kernel.org"
<platform-driver-x86@...r.kernel.org>,
Rajneesh Bhardwaj <rajneesh.bhardwaj@...el.com>,
Rafael J Wysocki <rjw@...ysocki.net>,
Rajat Jain <rajatja@...gle.com>,
David E Box <david.e.box@...el.com>,
Mark Gross <markgross@...nel.org>
Subject: Re: [PATCH v1] platform/x86: intel_pmc_core: promote S0ix failure
warn() to WARN()
On 10/31/22 20:38, Sven van Ashbrook wrote:
> On Mon, Oct 31, 2022 at 3:39 PM Limonciello, Mario
> <Mario.Limonciello@....com> wrote:
>>
>> Just thinking about it a little bit more, it could be a lot nicer to have something like:
>>
>> /sys/power/suspend_stats/last_hw_deepest_state
>
> While I agree that reporting through a framework is generally better
> than getting infrastructure to grep for specific strings, I believe
> that a simple sysfs file is probably too simplistic.
>
> 1. We need more sophisticated reporting than just last_hw_deepest_state:
>
> - sometimes the system enters the deep state we want, yet after a
> while moves back up and gets "stuck" in an intermediate state (below
> S0). Or, the system enters the deep state we want, but moves back to
> S0 after a time for no apparent reason. These platform-dependent
> failures are not easily described in a generic framework.
My thinking was that if last_hw_deepest_state reports the amount of time
spent in the deepest state, you can catch this case by comparing that value
against the total duration of the suspend.
If the ratio falls below some percentage threshold for suspends that are at
least some minimum length, you can flag the failure (a rough sketch follows
below).
That also lets you tune your framework to find the right values for those
thresholds without needing to change the kernel.
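Something like this, purely as an illustration: the sysfs file is the one
proposed above and doesn't exist yet, and the units and both thresholds are
made-up assumptions the reporting framework would tune:

#!/usr/bin/env python3
# Illustration only: the sysfs file below is the one proposed in this
# thread and does not exist yet; its units and the two thresholds are
# assumptions that the reporting framework would tune.

MIN_SUSPEND_SECS = 60      # only judge suspends at least this long
MIN_DEEPEST_PCT = 90.0     # expect >= 90% of the suspend in the deepest state

def check_last_suspend(suspend_duration_secs):
    if suspend_duration_secs < MIN_SUSPEND_SECS:
        return True  # too short to judge, don't flag

    with open("/sys/power/suspend_stats/last_hw_deepest_state") as f:
        deepest_secs = float(f.read().strip())

    pct = 100.0 * deepest_secs / suspend_duration_secs
    if pct < MIN_DEEPEST_PCT:
        # Below threshold: flag it and let the framework collect dmesg.
        print("S0ix failure: only %.1f%% of a %.0fs suspend spent in the "
              "deepest state" % (pct, suspend_duration_secs))
        return False
    return True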
>
> - ChromeOS in particular has multiple independent S0ix / S3 / s2idle
> failure report sources. We have the kernel warning above; also our
> Embedded Controller monitors suspend failure cases which the simple
> kernel warning cannot catch; those are reported through a separate WARN_ONCE().
>
> 2. A simple sysfs file will need to be polled by the infrastructure
> after every suspend; it would be preferable to have some signal or
> callback which the infrastructure could register itself with.
The interface to trigger a suspend is to write a value into
/sys/power/state. That write returns an error code, but it only tells you
whether the suspend transition itself succeeded, not whether the hardware
reached the deepest state.
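For context, what userspace has today is roughly the following (a minimal
sketch, assuming s2idle via "freeze"; the write blocks until resume):

#!/usr/bin/env python3
# Minimal sketch of the existing interface: the write to /sys/power/state
# only reports whether the suspend/resume cycle completed, nothing about
# how deep the hardware actually went.

def suspend_to_idle():
    try:
        with open("/sys/power/state", "w") as f:
            f.write("freeze")   # s2idle; this blocks until resume
    except OSError as err:
        print("suspend failed: %s" % err)
        return False
    # Getting here only means the cycle completed, not that the platform
    # reached its deepest hardware sleep state.
    return True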
So what would an ideal interface look like to you, one that signals that
the last "successful" suspend didn't reach the deepest state?
>
> The generic infrastructure to support this sounds like quite a bit of
> work, and for what gain, compared to simply matching a log string and
> sending the whole dmesg if there's a match?
I would like to think it's cheaper to read the sysfs file, do a local
comparison of the HW deepest-state time against the suspend time, and then
only send the dmesg up for further analysis when that check fails.
>
> Is the light worth the candle?
I've written up an RFC with my ideas for this and sent it out, at least as
a starting point.