lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 29 Nov 2010 22:22:13 +0800
From:	Yong Zhang <yong.zhang0@...il.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, venki@...gle.com, mingo@...e.hu,
	linux-tip-commits@...r.kernel.org
Subject: Re: [tip:sched/core] sched: Do not account irq time to current task

On Mon, Nov 29, 2010 at 12:59:50PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 16:45 +0800, Yong Zhang wrote:
> > On Tue, Oct 19, 2010 at 3:26 AM, tip-bot for Venkatesh Pallipadi
> > <venki@...gle.com> wrote:
> > > Commit-ID:  305e6835e05513406fa12820e40e4a8ecb63743c
> > > Gitweb:     http://git.kernel.org/tip/305e6835e05513406fa12820e40e4a8ecb63743c
> > > Author:     Venkatesh Pallipadi <venki@...gle.com>
> > > AuthorDate: Mon, 4 Oct 2010 17:03:21 -0700
> > > Committer:  Ingo Molnar <mingo@...e.hu>
> > > CommitDate: Mon, 18 Oct 2010 20:52:26 +0200
> > >
> > > sched: Do not account irq time to current task
> > >
> > > Scheduler accounts both softirq and interrupt processing times to the
> > > currently running task. This means, if the interrupt processing was
> > > for some other task in the system, then the current task ends up being
> > > penalized as it gets shorter runtime than otherwise.
> > >
> > > Change sched task accounting to acoount only actual task time from
> > > currently running task. Now update_curr(), modifies the delta_exec to
> > > depend on rq->clock_task.
> > >
> > > Note that this change only handles CONFIG_IRQ_TIME_ACCOUNTING case. We can
> > > extend this to CONFIG_VIRT_CPU_ACCOUNTING with minimal effort. But, thats
> > > for later.
> > >
> > > This change will impact scheduling behavior in interrupt heavy conditions.
> > >
> > > Tested on a 4-way system with eth0 handled by CPU 2 and a network heavy
> > > task (nc) running on CPU 3 (and no RSS/RFS). With that I have CPU 2
> > > spending 75%+ of its time in irq processing. CPU 3 spending around 35%
> > > time running nc task.
> > >
> > > Now, if I run another CPU intensive task on CPU 2, without this change
> > > /proc/<pid>/schedstat shows 100% of time accounted to this task. With this
> > > change, it rightly shows less than 25% accounted to this task as remaining
> > > time is actually spent on irq processing.
> > >
> > > Signed-off-by: Venkatesh Pallipadi <venki@...gle.com>
> > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > > LKML-Reference: <1286237003-12406-7-git-send-email-venki@...gle.com>
> > > Signed-off-by: Ingo Molnar <mingo@...e.hu>
> > > ---
> > >  kernel/sched.c      |   43 ++++++++++++++++++++++++++++++++++++++++---
> > >  kernel/sched_fair.c |    6 +++---
> > >  kernel/sched_rt.c   |    8 ++++----
> > >  3 files changed, 47 insertions(+), 10 deletions(-)
> > >
> > 
> > [snip]
> > 
> > > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > > index ab77aa0..bea7d79 100644
> > > --- a/kernel/sched_rt.c
> > > +++ b/kernel/sched_rt.c
> > > @@ -609,7 +609,7 @@ static void update_curr_rt(struct rq *rq)
> > >        if (!task_has_rt_policy(curr))
> > >                return;
> > >
> > > -       delta_exec = rq->clock - curr->se.exec_start;
> > > +       delta_exec = rq->clock_task - curr->se.exec_start;
> > >        if (unlikely((s64)delta_exec < 0))
> > >                delta_exec = 0;
> > >
> > > @@ -618,7 +618,7 @@ static void update_curr_rt(struct rq *rq)
> > >        curr->se.sum_exec_runtime += delta_exec;
> > >        account_group_exec_runtime(curr, delta_exec);
> > >
> > > -       curr->se.exec_start = rq->clock;
> > > +       curr->se.exec_start = rq->clock_task;
> > >        cpuacct_charge(curr, delta_exec);
> > >
> > >        sched_rt_avg_update(rq, delta_exec);
> > 
> > Seems the above changes to update_curr_rt() have some false positive
> > to rt_bandwidth control.
> > For example:
> >   rt_period=1000000;
> >   rt_runtime=950000;
> >   then if in that period the irq_time is no zero(such as 50000), according to
> >   the throttle mechanism, rt is not throttled. In the end we left no
> > time to others.
> >   It seems that this break the semantic of throttle.
> > 
> > Maybe we can revert the change to update_curr_rt()?
> 
> No, that's totally correct.
> 
> Its the correct and desired behaviour, IRQ time is not time spend
> running the RT tasks, hence they should not be accounted for it.

Right.

> 
> If you still want to throttle RT tasks simply ensure their bandwidth
> constraint is lower than the available time.

But the available time is harder to calculated than before.
IRQ is random, so as to the irq_time.
But the unthrottle(do_sched_rt_period_timer()) runs in fixed
period which is based on hard clock.

Is that what we want?

Thanks,
Yong
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ