linux-kernel - Re: [BUG nohz]: wrong user and system time accounting

Message-ID: <1490636129.8850.76.camel@redhat.com>
Date:   Mon, 27 Mar 2017 13:35:29 -0400
From:   Rik van Riel <riel@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>,
        Luiz Capitulino <lcapitulino@...hat.com>
Cc:     Frederic Weisbecker <fweisbec@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-rt-users@...r.kernel.org
Subject: Re: [BUG nohz]: wrong user and system time accounting

On Mon, 2017-03-27 at 09:56 +0800, Wanpeng Li wrote:
> 
> After bisecting, the first bad commit is ff9a9b4c4334 ("sched,
> time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity"). The
> bug can be reproduced readily if CONFIG_CONTEXT_TRACKING_FORCE is
> enabled.

At the time, we thought it was an "occasionally bad" / "unlucky"
kind of bug, not the systemic issue that your observations seem
to suggest.

> Consider the CPU responsible for global timekeeping. As the trace
> posted above shows, vtime_account_user() is called before
> tick_sched_timer(), which updates jiffies. So jiffies is stale in
> vtime_account_user() and the time spent in userspace is skipped.
> vtime_user_enter() is then called after the jiffies update, so both
> the userspace time and the kernel time are accumulated as sys time.
> If the housekeeping CPU is idle under CONFIG_NO_HZ_FULL, everything
> is fine. However, if you stress the housekeeping CPU, top shows 100%
> sys time on both the housekeeping CPU and the other CPUs that are in
> full_nohz mode and have at least two tasks running. I think this is
> because the stress delays timer interrupt handling to some degree,
> so jiffies is not updated in time before the other CPUs read it in
> vtime_account_user().
> 
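
The ordering described above can be illustrated with a toy model. This
is only a sketch, not the kernel code: the class JiffyVtime, the
function run_ticks() and its jiffies_updated_before_entry flag are
hypothetical names, and the model assumes a task that spends each whole
tick in userspace and enters the kernel only for the timer interrupt.

```python
class JiffyVtime:
    """Toy jiffy-granularity accounting (hypothetical, not kernel code)."""

    def __init__(self):
        self.snapshot = 0  # jiffies value seen at the last accounting point
        self.utime = 0     # ticks charged to user time
        self.stime = 0     # ticks charged to system time

    def _delta(self, jiffies):
        # Elapsed ticks since the last snapshot, then advance the snapshot.
        delta = jiffies - self.snapshot
        self.snapshot = jiffies
        return delta

    def account_user(self, jiffies):
        # Kernel entry: time since the snapshot is treated as user time.
        self.utime += self._delta(jiffies)

    def account_system(self, jiffies):
        # Return to userspace: time since the snapshot is treated as
        # system time.
        self.stime += self._delta(jiffies)


def run_ticks(n_ticks, jiffies_updated_before_entry):
    jiffies = 0
    acct = JiffyVtime()
    for _ in range(n_ticks):
        if jiffies_updated_before_entry:
            jiffies += 1                 # jiffies already advanced
            acct.account_user(jiffies)   # sees the elapsed tick
        else:
            acct.account_user(jiffies)   # stale jiffies: delta is 0
            jiffies += 1                 # tick handler updates jiffies
        acct.account_system(jiffies)     # back to userspace
    return acct.utime, acct.stime


print(run_ticks(100, True))   # (100, 0): all ticks charged to user
print(run_ticks(100, False))  # (0, 100): all ticks charged to sys
```

In the stale case every tick of userspace time lands in stime, which
matches the 100% sys time symptom reported here.
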
> I think we can keep jiffies-based sampling for syscall/exception
> context tracking and use local_clock() in vtime_delta() again for
> irqs, which avoids the influence of stale jiffies. I can make a
> patch if the idea is acceptable, unless there is a better proposal. :)
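
For comparison, a toy model of clock-based deltas, which do not depend
on when jiffies is updated. Again this is only a sketch and not the
proposed patch: ClockVtime and run_ticks_clock() are hypothetical
names, the nanosecond counter stands in for local_clock(), and the
950us/50us split per tick is an assumed workload.

```python
class ClockVtime:
    """Toy clock-based accounting (hypothetical, not kernel code)."""

    def __init__(self):
        self.snapshot_ns = 0  # clock value at the last accounting point
        self.utime_ns = 0
        self.stime_ns = 0

    def _delta(self, now_ns):
        # Elapsed nanoseconds since the last snapshot.
        delta = now_ns - self.snapshot_ns
        self.snapshot_ns = now_ns
        return delta

    def account_user(self, now_ns):
        # Kernel entry: elapsed time was spent in userspace.
        self.utime_ns += self._delta(now_ns)

    def account_system(self, now_ns):
        # Return to userspace: elapsed time was spent in the kernel.
        self.stime_ns += self._delta(now_ns)


def run_ticks_clock(n_ticks, user_ns=950_000, irq_ns=50_000):
    now_ns = 0  # stand-in for a per-CPU monotonic clock
    acct = ClockVtime()
    for _ in range(n_ticks):
        now_ns += user_ns            # task runs in userspace
        acct.account_user(now_ns)    # irq entry
        now_ns += irq_ns             # interrupt handling
        acct.account_system(now_ns)  # irq exit, back to userspace
    return acct.utime_ns, acct.stime_ns


print(run_ticks_clock(100))  # (95000000, 5000000)
```

Because each delta is read from the clock at the accounting point
itself, a late jiffies update cannot move user time into sys time in
this model.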

Making that patch seems worthwhile, but I would like to
know the root cause of the issue being observed.

Is the problem due to the nohz_full CPU receiving an
interrupt at the same time the timer interrupt fires on
the housekeeping CPU?

Is it due to a nohz_full CPU updating jiffies all by
itself from irq context?  In that case, would it be
better to always have the housekeeping CPU do that?

What exactly is going on here?