[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170411113645.j5gvvad3vczvyvlo@hirez.programming.kicks-ass.net>
Date: Tue, 11 Apr 2017 13:36:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Wanpeng Li <kernellwp@...il.com>
Cc: Frederic Weisbecker <fweisbec@...il.com>,
Mike Galbraith <efault@....de>, Rik van Riel <riel@...hat.com>,
Luiz Capitulino <lcapitulino@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [BUG nohz]: wrong user and system time accounting
On Tue, Apr 11, 2017 at 07:03:17PM +0800, Wanpeng Li wrote:
> 2017-03-30 21:38 GMT+08:00 Frederic Weisbecker <fweisbec@...il.com>:
> > On Thu, Mar 30, 2017 at 02:47:11PM +0800, Wanpeng Li wrote:
>
> [...]
>
> >
> >>
> >> -------------------------------------->8-----------------------------------------------------
> >>
> >> use nanosecond granularity to check deltas but only perform an actual
> >> cputime update when that delta >= TICK_NSEC.
> >>
> >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> >> index f3778e2b..f1ee393 100644
> >> --- a/kernel/sched/cputime.c
> >> +++ b/kernel/sched/cputime.c
> >> @@ -676,18 +676,21 @@ void thread_group_cputime_adjusted(struct
> >> task_struct *p, u64 *ut, u64 *st)
> >> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> >> static u64 vtime_delta(struct task_struct *tsk)
> >> {
> >> - unsigned long now = READ_ONCE(jiffies);
> >> + u64 now = local_clock();
> >
> > I fear we need a global clock, because the reader (task_cputime()) needs
> > to compute the delta and therefore use the same clock from any CPU.
> >
> > Or we can use the local_clock() but the reader must access the same.
> >
> > So there would be vtime_delta_writer() which uses local_clock and stores
> > the current CPU to tsk->vtime_cpu (under the vtime_seqcount). And then
> > vtime_delta_reader() which calls sched_clock_cpu(tsk->vtime_cpu) which
> > is protected by vtime_seqcount as well.
> >
> > Although those sched_clock_cpu() things seem to only matter when the
> > sched_clock() is unstable. And that stability is a condition for nohz_full
> > to work anyway. So probably sched_clock() alone would be enough.
>
> I observed ~60% user time and ~40% sys time when replace local_clock()
> above by sched_clock()(two cpu hogs on the cpu in nohz_full mode). In
> addition, Luiz's testcast ./acct-bug 1 995 will show 100% idle time.
> If keep local_clock() in vtime_delta(), cpu hogs testcase will
> success. However, Luiz's testcase still show 100% idle time.
Assuming a stable TSC, there should be no difference between
local_clock() and sched_clock().
Powered by blists - more mailing lists