Message-ID: <aXjPiZCHZ77R4awi@localhost.localdomain>
Date: Tue, 27 Jan 2026 15:45:29 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Heiko Carstens <hca@...ux.ibm.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
"Christophe Leroy (CS GROUP)" <chleroy@...nel.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Ben Segall <bsegall@...gle.com>, Boqun Feng <boqun.feng@...il.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>, Jan Kiszka <jan.kiszka@...mens.com>,
Joel Fernandes <joelagnelf@...dia.com>,
Juri Lelli <juri.lelli@...hat.com>,
Kieran Bingham <kbingham@...nel.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Mel Gorman <mgorman@...e.de>, Michael Ellerman <mpe@...erman.id.au>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Nicholas Piggin <npiggin@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Sven Schnelle <svens@...ux.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Uladzislau Rezki <urezki@...il.com>,
Valentin Schneider <vschneid@...hat.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Xin Zhao <jackzxcui1989@....com>, linux-pm@...r.kernel.org,
linux-s390@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH 05/15] s390/time: Prepare to stop elapsing in
dynticks-idle
On Thu, Jan 22, 2026 at 03:40:45PM +0100, Heiko Carstens wrote:
> On Wed, Jan 21, 2026 at 07:04:35PM +0100, Frederic Weisbecker wrote:
> > BTW here is a question for you: does the timer (as in get_cpu_timer()) still
> > decrement while in idle? I would assume not, given how lc->system_timer
> > is updated in account_idle_time_irq().
>
> It is not decremented while in idle (or when the hypervisor schedules
> the virtual cpu away). We use the fact that the cpu timer is not
> decremented when the virtual cpu is not running vs the real
> time-of-day clock to calculate steal time.

Ok, good then!

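
To spell out the principle as I understand it, here is a minimal sketch. It is not the code in arch/s390: the struct and function names are hypothetical, and both clocks are assumed to be expressed in the same unit. The wall clock always advances while the CPU timer only counts down when the virtual CPU is really running, so the gap between the two deltas is the time the hypervisor took away. Note that idle time would look like steal in this naive model, so idle periods have to be accounted separately.

```c
#include <stdint.h>

/* Hypothetical snapshot of both clocks, taken at the same instant. */
struct clock_sample {
	uint64_t tod;		/* time-of-day clock: always advances */
	uint64_t cpu_timer;	/* counts down only while the vCPU runs */
};

/*
 * Steal time between two snapshots: wall-clock time that elapsed minus
 * the time the virtual CPU was actually running.
 */
static uint64_t steal_delta(struct clock_sample prev, struct clock_sample now)
{
	uint64_t wall = now.tod - prev.tod;
	uint64_t ran = prev.cpu_timer - now.cpu_timer;	/* timer decrements */

	/* Clamp at zero in case the two samples are slightly skewed. */
	return wall > ran ? wall - ran : 0;
}
```

For example, if 100 units of wall time elapsed while the CPU timer only went down by 60, the remaining 40 units were stolen.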
>
> > And another question in this same function is this :
> >
> > lc->steal_timer += idle->clock_idle_enter - lc->last_update_clock;
> >
> > clock_idle_enter is updated right before halting the CPU. But when was
> > last_update_clock updated last? Could be either task switch to idle, or
> > a previous idle tick interrupt or a previous idle IRQ entry. In any case
> > I'm not sure the difference is meaningful as steal time.
> >
> > I must be missing something.
>
> "It has been like that forever" :) However I do agree that this doesn't seem
> to make any sense. At least with the current implementation I cannot see how
> that makes sense, since the difference of two time stamps, neither of which
> includes any steal time, is added.
>
> Maybe it was broken by one of the many changes over the years, or it was always
> wrong, or I am missing something too.
>
> Will investigate and address it if required. Thank you for bringing this up!

Ok, I take some relief from the fact that I'm not the only one who finds it
unclear :-)

>
> > > Not sure what to do with this. I thought about removing those sysfs files
> > > already in the past, since they are of very limited use; and most likely
> > > nothing in user space would miss them.
> >
> > Perhaps but this file is a good comparison point against /proc/stat because
> > s390 vtime is much closer to measuring the actual CPU halted time than what
> > the generic nohz accounting does (which includes more idle code execution).
>
> Yes, while comparing those files I also see an unexpected difference of
> several seconds after two days of uptime; that is before your changes.
>
> In theory the sum of idle and iowait in /proc/stat should be the same as the
> per-cpu idle_time_us sysfs file. But there is a difference, which shouldn't be
> there as far as I can tell. Yet another thing to look into.

Yes, and that's expected both before and after my changes.

* /proc/stat is the time spent between tick_nohz_idle_enter() and
  tick_nohz_idle_exit() (to simplify, because there are some pauses during
  idle IRQs).

* The s390 idle sysfs file depicts more closely the time spent while the
  CPU is really idle (and not executing idle code).

The semantics differ, and this is why you observe different results. I guess
/proc/stat has the higher values (with idle + iowait), and that is expected.

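
As a rough illustration of the two semantics, here is a sketch with made-up timestamps. The struct and helper names are hypothetical and are not taken from the kernel, and the pauses during idle IRQs are ignored:

```c
#include <stdint.h>

/* Hypothetical timestamps (in microseconds) for a single idle period. */
struct idle_period {
	uint64_t nohz_enter;	/* tick_nohz_idle_enter() is called */
	uint64_t halt_begin;	/* the CPU actually halts */
	uint64_t halt_end;	/* the CPU wakes up again */
	uint64_t nohz_exit;	/* tick_nohz_idle_exit() is called */
};

/* What the generic nohz accounting sees: includes idle code execution. */
static uint64_t nohz_idle_us(const struct idle_period *p)
{
	return p->nohz_exit - p->nohz_enter;
}

/* What the s390 idle file is closer to: only the halted time. */
static uint64_t halted_idle_us(const struct idle_period *p)
{
	return p->halt_end - p->halt_begin;
}
```

Since the halt window sits inside the nohz window, the first value can never be smaller than the second for a given period, and the small per-period differences add up over days of uptime.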
Thanks.
--
Frederic Weisbecker
SUSE Labs