lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20240130235157.636ee1183e073cfba8d7d0b7@kernel.org>
Date: Tue, 30 Jan 2024 23:51:57 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Brian Norris <briannorris@...omium.org>, Pavel Machek <pavel@....cz>,
 Len Brown <len.brown@...el.com>, Randy Dunlap <rdunlap@...radead.org>,
 suleiman@...gle.com, linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH v7] PM: sleep: Expose last succeeded resumed timestamp
 in sysfs

Hi Rafael,

Thanks for the comment!

On Thu, 25 Jan 2024 21:19:07 +0100
"Rafael J. Wysocki" <rafael@...nel.org> wrote:

> On Thu, Jan 25, 2024 at 1:43 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
> >
> > On Mon, 22 Jan 2024 18:08:22 -0800
> > Brian Norris <briannorris@...omium.org> wrote:
> >
> > > On Fri, Jan 19, 2024 at 1:08 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > > On Wed, Jan 17, 2024 at 1:07 AM Masami Hiramatsu <mhiramat@...nel.org> wrote:
> > > > >
> > > > > Gently ping,
> > > > >
> > > > > I would like to know this is enough or I should add more info/update.
> > > >
> > > > I still am not sure what this is going to be useful for.
> > > >
> > > > Do you have a specific example?
> > >
> > > Since there seems to be some communication gap here, I'll give it a try.
> > >
> > > First, I'll paste the key phrase of its use case from the cover letter:
> > >
> > >   "we would like to know how long the resume processes are taken in kernel
> > >   and in user-space"
> > >
> > > This is a "system measurement" question, for use in tests (e.g., in a
> > > test lab for CI or for pre-release testing, where we suspend
> > > Chromebooks, wake them back up, and measure how long the wakeup took)
> > > or for user-reported metrics (e.g., similar statistics from real
> > > users' systems, if they've agreed to automatically report usage
> > > statistics, back to Google). We'd like to know how long it takes for a
> > > system to wake up, so we can detect when there are problems that lead
> > > to a slow system-resume experience. The user experience includes both
> > > time spent in the kernel and time spent after user space has thawed
> > > (and is spending time in potentially complex power and display manager
> > > stacks) before a Chromebook's display lights back up.
> >
> > Thanks Brian for explaining, this is correctly explained how we are
> > using this for measuring resume process duration.
> >
> > > If I understand the whole of Masami's work correctly, I believe we're
> > > taking "timestamps parsed out of dmesg" (or potentially out of ftrace,
> > > trace events, etc.) to measure the kernel side, plus "timestamp
> > > provided here in CLOCK_MONOTONIC" and "timestamp determined in our
> > > power/display managers" to measure user space.
> >
> > Yes, I decided to decouple the kernel and user space because the clock
> > subsystem is adjusted when resuming. So for the kernel, we will use
> > local clock (which is not exposed to user space), and use CLOCK_MONOTONIC
> > for the user space.
> 
> The problem with this split is that you cannot know how much time
> elapses between the "successful kernel resume time" and the time when
> user space gets to resume.

Hm, let me check why.

> 
> As of this patch, the kernel timestamp is taken when the kernel is
> about to thaw user space and some user space tasks may start running
> right away.

Yes. But note that this just indicates the time when the "kernel"
done the resuming process.

> 
> Some other tasks, however, will wait for what happens next in the
> kernel (because it is not done with resuming yet) and some of them
> will wait until explicitly asked to resume by the resume process IIUC.

Yeah, those will be just restarted and may wait for a signal to e.g.
re-draw, re-load etc. I think those are a part of user-space resuming.

> 
> Your results depend on which tasks participate in the "user
> experience", so to speak.  If they are the tasks that wait to be
> kicked by the resume process, the kernel timestamp taken as per the

"kicked by the resume process" means thawing the tasks??

> above is useless for them, because there is quite some stuff that
> happens in the kernel before they will get kicked.

So your point might be, even after the timestamp, some kernel resume
process will affect? Since the system(kernel+user) resuming state is
continuously changing in parallel, it is hard to split the kernel
and user resume process?

> 
> Moreover, some tasks will wait for certain device drivers to get ready
> after the rest of the system resumes and that may still take some more
> time after the kernel has returned to the process driving the system
> suspend-resume.

Oh, I thought driver resume will be done in the kernel before thawing.
Of course some wifi/bt etc. will need to be re-connect again. And
I don't care at this moment.

> 
> I'm not sure if there is a single point which can be used as a "user
> space resume start" time for every task, which is why I'm not
> convinced about this patch.

Ah, OK. Yeah, that is a good point. I also think that is not useful
for every cases, but we need some reference timestamp when we start
resuming. And I think this is a better point to do. Of course if we
observe some delay, we still need more comprehensive tracing not only
application log, but this is the first step and I think the step we
can not avoid (because we don't expose the local clock to user space)

> 
> BTW, there is a utility called sleepgraph that measures the kernel
> part of the system suspend-resume.  It does its best to measure it
> very precisely and uses different techniques for that.  Also, it is
> included in the kernel source tree.  Can you please have a look at it
> and see how much there is in common between it and your tools?  Maybe
> there are some interfaces that can be used in common, or maybe it
> could benefit from some interfaces that you are planning to add.

Thanks for the info!  Yeah we also have some tracing tools (perfetto,
trace-cmd etc.) for analysis. Note that this timestamp will kick those
analysis tools if we find any delay based on the timestamp. I mean this
will be used for "monitoring" the trend but not for precise "analysis".

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ