Date: Thu, 20 Oct 2022 04:51:17 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: Juri Lelli <juri.lelli@...hat.com>, Connor O'Brien <connoro@...gle.com>,
    linux-kernel@...r.kernel.org, kernel-team@...roid.com,
    John Stultz <jstultz@...gle.com>, Qais Yousef <qais.yousef@....com>,
    Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
    Vincent Guittot <vincent.guittot@...aro.org>,
    Dietmar Eggemann <dietmar.eggemann@....com>,
    Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
    Mel Gorman <mgorman@...e.de>,
    Daniel Bristot de Oliveira <bristot@...hat.com>,
    Valentin Schneider <vschneid@...hat.com>, Will Deacon <will@...nel.org>,
    Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
    "Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [RFC PATCH 00/11] Reviving the Proxy Execution Series

> On Oct 19, 2022, at 3:30 PM, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 10/19/22 15:41, Juri Lelli wrote:
>>> On 19/10/22 08:23, Joel Fernandes wrote:
>>>
>>>
>>>> On Oct 19, 2022, at 7:43 AM, Qais Yousef <qyousef@...alina.io> wrote:
>>>>
>>>> On 10/17/22 02:23, Joel Fernandes wrote:
>>>>
>>>>> I ran a test to check CFS time sharing. The accounting on top is confusing,
>>>>> but ftrace confirms the proxying happening.
>>>>>
>>>>> Task A - pid 122
>>>>> Task B - pid 123
>>>>> Task C - pid 121
>>>>> Task D - pid 124
>>>>>
>>>>> Here D and B just spin all the time. C is lock owner (in-kernel mutex) and
>>>>> spins all the time, while A blocks on the same in-kernel mutex and remains
>>>>> blocked.
>>>>>
>>>>> Then I did "top -H" while the test was running which gives below output.
>>>>> The first column is PID, and the third-last column is CPU percentage.
>>>>>
>>>>> Without PE:
>>>>> 121 root 20 0 99496 4 0 R 33.6 0.0 0:02.76 t (task C)
>>>>> 123 root 20 0 99496 4 0 R 33.2 0.0 0:02.75 t (task B)
>>>>> 124 root 20 0 99496 4 0 R 33.2 0.0 0:02.75 t (task D)
>>>>>
>>>>> With PE:
>>>>> PID
>>>>> 122 root 20 0 99496 4 0 D 25.3 0.0 0:22.21 t (task A)
>>>>> 121 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task C)
>>>>> 123 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task B)
>>>>> 124 root 20 0 99496 4 0 R 25.0 0.0 0:22.20 t (task D)
>>>>>
>>>>> With PE, I was expecting 2 threads with 25% and 1 thread with 50%. Instead I
>>>>> get 4 threads with 25% in the top. Ftrace confirms that the D-state task is
>>>>> in fact not running and proxying to the owner task so everything seems
>>>>> working correctly, but the accounting seems confusing, as in, it is confusing
>>>>> to see the D-state task taking 25% CPU when it is obviously "sleeping".
>>>>>
>>>>> Yeah, yeah, I know D is proxying for C (while being in the uninterruptible
>>>>> sleep state), so maybe it is OK then, but I did want to bring this up :-)
>>>>
>>>> I seem to remember Valentin raised a similar issue about how the userspace
>>>> view can get confusing/misleading:
>>>>
>>>> https://www.youtube.com/watch?v=UQNOT20aCEg&t=3h21m41s
>>>
>>> Thanks for the pointer! Glad to see the consensus was that this is not
>>> acceptable.
>>>
>>> I think we ought to write a patch to fix the accounting, for this
>>> series. I propose adding 2 new entries to proc/pid/stat which I think
>>> Juri was also sort of alluding to:
>>>
>>> 1. Donated time.
>>> 2. Proxied time.
>>
>> Sounds like a useful addition, at least from a debugging point of view.
>
> They look like a useful addition to me too. Thanks.
>>> User space can then add or subtract this, to calculate things
>>> correctly. Or just display them in new columns. I think it will also
>>> actually show how much the proxying is happening for a use case.
>>
>> Guess we'll however need to be backward compatible with old userspace?
>> Probably reporting the owner as running while proxied (as in the
>> comparison case vs. rtmutexes Valentin showed).
>>
>
> Or invent a new task_state? Doesn't have to be a real one, just report a new
> letter for tasks in PE state. We could use 'r' to indicate running BUT..

This is a good idea, especially for tracing. I still feel the time taken in
the state is also important to add so that top displays percentage properly.

Best,

-J

> Cheers
>
> --
> Qais Yousef
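
For illustration only, here is a minimal userspace sketch of the "add or subtract" arithmetic discussed in the thread, applied to the "With PE" numbers from the top output. The `donated_pct` and `proxied_pct` fields are hypothetical: no such entries exist in proc/pid/stat today, and the names are made up here. The sketch also assumes that all of the time accounted to the D-state task (pid 122) was donated to the lock owner (pid 121), which matches what ftrace showed in the test.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """CPU share of one task over a sampling window, in percent."""
    pid: int
    state: str                 # state letter as shown by top, e.g. "R" or "D"
    cpu_pct: float             # what top reports today
    donated_pct: float = 0.0   # HYPOTHETICAL: share donated while blocked
    proxied_pct: float = 0.0   # HYPOTHETICAL: share run on a donor's behalf


def effective_cpu(sample: TaskSample) -> float:
    """Share the task actually executed: reported - donated + proxied."""
    return sample.cpu_pct - sample.donated_pct + sample.proxied_pct


# "With PE" figures from the thread. The donated/proxied split is an
# assumption: task A (122) never ran, so its whole 25.3% goes to task C (121).
WITH_PE = [
    TaskSample(122, "D", 25.3, donated_pct=25.3),   # task A, blocked on mutex
    TaskSample(121, "R", 25.0, proxied_pct=25.3),   # task C, lock owner
    TaskSample(123, "R", 25.0),                     # task B
    TaskSample(124, "R", 25.0),                     # task D
]

effective = {s.pid: effective_cpu(s) for s in WITH_PE}

for s in WITH_PE:
    print(f"{s.pid} {s.state} reported={s.cpu_pct:5.1f} "
          f"effective={effective[s.pid]:5.1f}")
```

Under those assumptions the adjusted view is roughly the one that was expected in the test: two threads near 25%, one near 50%, and the blocked task near 0%. The total is unchanged, since donated time is only moved from the donor to the owner.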