Date:   Tue, 31 Jul 2018 11:32:38 +0800
From:   Wanpeng Li <kernellwp@...il.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>, juri.lelli@...hat.com,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        valentin.schneider@....com,
        Patrick Bellasi <patrick.bellasi@....com>,
        joel@...lfernandes.org, Daniel Lezcano <daniel.lezcano@...aro.org>,
        quentin.perret@....com, Luca Abeni <luca.abeni@...tannapisa.it>,
        claudio@...dence.eu.com, Ingo Molnar <mingo@...hat.com>,
        kvm <kvm@...r.kernel.org>
Subject: Re: [PATCH 06/11] sched/irq: add irq utilization tracking

On Tue, 31 Jul 2018 at 00:43, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> Hi Wanpeng,
>
> On Thu, 26 Jul 2018 at 05:09, Wanpeng Li <kernellwp@...il.com> wrote:
> >
> > Hi Vincent,
> > On Fri, 29 Jun 2018 at 03:07, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > interrupt and steal time are the only remaining activities tracked by
> > > rt_avg. Like for sched classes, we can use PELT to track their average
> > > utilization of the CPU. But unlike sched class, we don't track when
> > > entering/leaving interrupt; Instead, we take into account the time spent
> > > under interrupt context when we update rqs' clock (rq_clock_task).
> > > This also means that we have to decay the normal context time and account
> > > for interrupt time during the update.
> > >
> > > > It's also important to note that because
> > >   rq_clock == rq_clock_task + interrupt time
> > > and rq_clock_task is used by a sched class to compute its utilization, the
> > > util_avg of a sched class only reflects the utilization of the time spent
> > > in normal context and not of the whole time of the CPU. The utilization of
> > > > interrupt gives a more accurate level of utilization of the CPU.
> > > The CPU utilization is :
> > >   avg_irq + (1 - avg_irq / max capacity) * /Sum avg_rq
> > >
> > > > Most of the time, avg_irq is small and negligible, so the use of the
> > > > approximation CPU utilization = /Sum avg_rq was enough.
> > >
> > > Cc: Ingo Molnar <mingo@...hat.com>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > > ---
> > >  kernel/sched/core.c  |  4 +++-
> > >  kernel/sched/fair.c  | 13 ++++++++++---
> > >  kernel/sched/pelt.c  | 40 ++++++++++++++++++++++++++++++++++++++++
> > >  kernel/sched/pelt.h  | 16 ++++++++++++++++
> > >  kernel/sched/sched.h |  3 +++
> > >  5 files changed, 72 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 78d8fac..e5263a4 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -18,6 +18,8 @@
> > >  #include "../workqueue_internal.h"
> > >  #include "../smpboot.h"
> > >
> > > +#include "pelt.h"
> > > +
> > >  #define CREATE_TRACE_POINTS
> > >  #include <trace/events/sched.h>
> > >
> > > @@ -186,7 +188,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> > >
> > >  #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> > >         if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> > > -               sched_rt_avg_update(rq, irq_delta + steal);
> > > +               update_irq_load_avg(rq, irq_delta + steal);
> >
> > I think we should not add steal time into irq load tracking. Steal
> > time is always 0 on a native kernel, so it doesn't matter there, but
> > what will happen when a guest disables IRQ_TIME_ACCOUNTING and enables
> > PARAVIRT_TIME_ACCOUNTING? Steal time is not the real irq util_avg. In
> > addition, we haven't exposed power management for performance, which
> > means that e.g. the schedutil governor cannot cooperate with the
> > passive mode intel_pstate driver to tune the OPP. Decaying the old
> > steal time avg and adding the new one just wastes CPU cycles.
>
> In fact, I have kept the same behavior as with rt_avg, which was
> already adding steal time when computing scale_rt_capacity, which is
> used to reflect the remaining capacity for FAIR tasks and is used in
> load balance. I'm not sure that it's worth using different variables
> for irq and steal.
> That being said, I see a possible optimization in schedutil when
> PARAVIRT_TIME_ACCOUNTING is enabled and IRQ_TIME_ACCOUNTING is disabled.
> With this kind of config, scale_irq_capacity can be a nop for
> schedutil but still scale the utilization for scale_rt_capacity.

Yeah, this is what I had in mind before; you can make a patch for that. :)

Regards,
Wanpeng Li
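
For readers following the thread, the CPU utilization formula quoted from the commit message, avg_irq + (1 - avg_irq / max capacity) * /Sum avg_rq, can be sketched in user-space C as below. This is only an illustration and not the code from the patch: the function name cpu_util() and the constant MAX_CAPACITY are hypothetical, and the value 1024 is assumed to match the kernel's usual capacity scale. The fraction is rewritten in integer form so that no floating point is needed.

```c
/*
 * Illustrative sketch only; cpu_util() and MAX_CAPACITY are hypothetical
 * names, not taken from the patch. 1024 is assumed as the capacity scale.
 */
#define MAX_CAPACITY 1024UL

/*
 * avg_irq: average utilization of interrupt (and steal) time
 * sum_rq:  sum of the per-class utilization averages, which are tracked
 *          against rq_clock_task and so exclude interrupt time
 *
 * util = avg_irq + (1 - avg_irq / max) * sum_rq
 *      = avg_irq + sum_rq * (max - avg_irq) / max
 */
static unsigned long cpu_util(unsigned long avg_irq, unsigned long sum_rq)
{
	unsigned long util;

	/* Interrupt time alone already saturates the CPU. */
	if (avg_irq >= MAX_CAPACITY)
		return MAX_CAPACITY;

	/* Scale the task utilization by the share of time left after irq. */
	util = avg_irq + sum_rq * (MAX_CAPACITY - avg_irq) / MAX_CAPACITY;

	/* Clamp to the maximum capacity. */
	return util > MAX_CAPACITY ? MAX_CAPACITY : util;
}
```

With avg_irq equal to 0 the result reduces to the sum of the class utilizations, which is the approximation the commit message says was good enough while interrupt time stayed negligible.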
