Message-ID: <CAEXW_YRAXUmc0zMTo8OowVk2ZNuYfBsvZDXMkVH35aWkpEUo2A@mail.gmail.com>
Date: Wed, 19 Oct 2022 09:51:24 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Qais Yousef <qyousef@...alina.io>,
"Connor O'Brien" <connoro@...gle.com>,
linux-kernel@...r.kernel.org, kernel-team@...roid.com,
John Stultz <jstultz@...gle.com>,
Qais Yousef <qais.yousef@....com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [RFC PATCH 00/11] Reviving the Proxy Execution Series
On Wed, Oct 19, 2022 at 9:41 AM Juri Lelli <juri.lelli@...hat.com> wrote:
>
> On 19/10/22 08:23, Joel Fernandes wrote:
> >
> >
> > > On Oct 19, 2022, at 7:43 AM, Qais Yousef <qyousef@...alina.io> wrote:
> > >
> > > On 10/17/22 02:23, Joel Fernandes wrote:
> > >
> > >> I ran a test to check CFS time sharing. The accounting on top is confusing,
> > >> but ftrace confirms the proxying happening.
> > >>
> > >> Task A - pid 122
> > >> Task B - pid 123
> > >> Task C - pid 121
> > >> Task D - pid 124
> > >>
> > >> Here D and B just spin all the time. C is lock owner (in-kernel mutex) and
> > >> spins all the time, while A blocks on the same in-kernel mutex and remains
> > >> blocked.
> > >>
> > >> Then I did "top -H" while the test was running, which gives the output below.
> > >> The first column is PID, and the fourth-last column is CPU percentage.
> > >>
> > >> Without PE:
> > >> 121 root 20 0 99496 4 0 R 33.6 0.0 0:02.76 t (task C)
> > >> 123 root 20 0 99496 4 0 R 33.2 0.0 0:02.75 t (task B)
> > >> 124 root 20 0 99496 4 0 R 33.2 0.0 0:02.75 t (task D)
> > >>
> > >> With PE:
> > >> PID
> > >> 122 root 20 0 99496 4 0 D 25.3 0.0 0:22.21 t (task A)
> > >> 121 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task C)
> > >> 123 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task B)
> > >> 124 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task D)
> > >>
> > >> With PE, I was expecting 2 threads with 25% and 1 thread with 50%. Instead I
> > >> get 4 threads with 25% in the top. Ftrace confirms that the D-state task is
> > >> in fact not running and proxying to the owner task so everything seems
> > >> working correctly, but the accounting seems confusing, as in, it is confusing
> > >> to see the D-state task taking 25% CPU when it is obviously "sleeping".
> > >>
> > >> Yeah, yeah, I know the D-state task (A) is proxying for C (while being in
> > >> the uninterruptible sleep state), so maybe it is OK then, but I did want to
> > >> bring this up :-)
> > >
> > > I seem to remember Valentin raised a similar issue about how the userspace
> > > view can get confusing/misleading:
> > >
> > > https://www.youtube.com/watch?v=UQNOT20aCEg&t=3h21m41s
> >
> > Thanks for the pointer! Glad to see the consensus was that this is not
> > acceptable.
> >
> > I think we ought to write a patch to fix the accounting, for this
> > series. I propose adding 2 new entries to /proc/pid/stat, which I think
> > Juri was also sort of alluding to:
> >
> > 1. Donated time.
> > 2. Proxied time.
>
> Sounds like a useful addition, at least from a debugging point of view.
>
> > User space can then add or subtract this, to calculate things
> > correctly. Or just display them in new columns. I think it will also
> > actually show how much the proxying is happening for a use case.
>
> Guess we'll however need to be backward compatible with old userspace?
> Probably reporting the owner as running while proxied (as in the
> comparison case vs. rtmutexes Valentin showed).
Hi Juri,
Yes, I was thinking of leaving the old metrics alone and just providing
the new ones as additional fields in /proc/pid/stat. Then the tools
can adjust as needed with the new information. From the kernel's PoV we
provide the maximum information.
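As a rough illustration of the "add or subtract" idea, here is a minimal
userspace sketch. It assumes the two proposed values are available per
task; the names donated_pct and proxied_pct, and expressing them as
percentages, are hypothetical, since no such fields exist in
/proc/pid/stat today.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    pid: int
    cpu_pct: float      # CPU share as reported today (e.g. by top -H)
    donated_pct: float  # hypothetical: share charged to this task while it
                        # was blocked and lending its context to a lock owner
    proxied_pct: float  # hypothetical: share this task actually ran on a
                        # blocked donor's scheduling context


def adjusted_cpu(sample: TaskSample) -> float:
    """CPU share the task really executed: remove what it donated,
    add what it ran on behalf of blocked donors."""
    return sample.cpu_pct - sample.donated_pct + sample.proxied_pct


# Round numbers loosely based on the "With PE" output quoted above.
blocked_waiter = TaskSample(pid=122, cpu_pct=25.0, donated_pct=25.0, proxied_pct=0.0)
lock_owner = TaskSample(pid=121, cpu_pct=25.0, donated_pct=0.0, proxied_pct=25.0)

for task in (blocked_waiter, lock_owner):
    print(task.pid, adjusted_cpu(task))
```

With these inputs the blocked waiter drops to 0% and the lock owner rises
to 50%, which matches the "2 threads with 25% and 1 thread with 50%"
picture expected earlier in the thread, while the unadjusted cpu_pct
column stays as it is for old tools.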
Thanks,
- Joel