Message-ID: <1490636129.8850.76.camel@redhat.com>
Date:   Mon, 27 Mar 2017 13:35:29 -0400
From:   Rik van Riel <riel@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>,
        Luiz Capitulino <lcapitulino@...hat.com>
Cc:     Frederic Weisbecker <fweisbec@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-rt-users@...r.kernel.org
Subject: Re: [BUG nohz]: wrong user and system time accounting

On Mon, 2017-03-27 at 09:56 +0800, Wanpeng Li wrote:
> 
> Actually after I bisect, the first bad commit is ff9a9b4c4334
> ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity").
> The bug can be reproduced readily if CONFIG_CONTEXT_TRACKING_FORCE is
> true

At the time, we thought it was an "occasionally bad" / "unlucky"
kind of bug, not the systemic issue that your observations seem
to suggest.

> Let's consider the cpu which has responsibility for the global
> timekeeping, as the tracing posted above, the vtime_account_user() is
> called before tick_sched_timer() which will update jiffies, so jiffies
> is stale in vtime_account_user() and the run time in userspace is
> skipped, the vtime_user_enter() is called after jiffies update, so
> both the time in userspace and in kernel are accumulated to sys time.
> If the housekeeping cpu is idle when CONFIG_NO_HZ_FULL, everything is
> fine. However, if you give stress to the housekeeping cpu, top will
> show 100% sys-time of both the housekeeping cpu and the other cpus who
> have at least two tasks running on and in full_nohz mode. I think it
> is because the stress delays the timer interrupt handling in some
> degree, then the jiffies is not updated timely before other cpus
> access it in vtime_account_user().
> 
> I think we can keep syscalls/exceptions context tracking still in
> jiffies based sampling and utilize local_clock() in vtime_delta()
> again for irqs which avoids jiffies stale influence. I can make a
> patch if the idea is acceptable or there is any better proposal. :)
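
For reference, the ordering described in the quoted text can be
reproduced with a toy model. This is only a sketch, not kernel code:
the 1 ms tick, the 10 us interrupt cost, and the simulate() helper are
assumptions made for illustration. It charges time at kernel entry
(in the role of vtime_account_user()) before the jiffies update, and
at return to userspace (in the role of vtime_user_enter()) after it,
then compares a jiffies-based delta with a fine-grained local clock.

```python
TICK_NS = 1_000_000    # assumed tick length (HZ=1000)
IRQ_COST_NS = 10_000   # assumed time spent in the timer interrupt


def simulate(ticks, use_local_clock):
    """Return (user_ns, sys_ns) charged to a task that runs in userspace
    and is interrupted once per tick by the timer interrupt."""
    jiffies = 0
    now_ns = 0

    def clock():
        # Fine-grained per-CPU time, or the coarse jiffies counter.
        return now_ns if use_local_clock else jiffies * TICK_NS

    start = clock()
    user_ns = 0
    sys_ns = 0
    for _ in range(ticks):
        # The task runs in userspace until the timer interrupt fires.
        now_ns += TICK_NS - IRQ_COST_NS

        # Kernel entry: charge the elapsed time to user. With jiffies
        # the counter has not been updated yet, so the delta is zero.
        t = clock()
        user_ns += t - start
        start = t

        # The tick handler updates jiffies only after that accounting.
        jiffies += 1
        now_ns += IRQ_COST_NS

        # Return to userspace: charge the elapsed time to system. With
        # jiffies this is now the whole tick.
        t = clock()
        sys_ns += t - start
        start = t
    return user_ns, sys_ns


if __name__ == "__main__":
    print("jiffies     (user, sys):", simulate(100, use_local_clock=False))
    print("local clock (user, sys):", simulate(100, use_local_clock=True))
```

In this model the jiffies-based variant charges every tick to system
time, while the local-clock variant charges only the interrupt cost
to system time. It does not model the mixed scheme proposed above
(jiffies for syscalls/exceptions, local_clock() for irqs).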

Making that patch seems worthwhile, but I would first like to
know the root cause of the issue that is being observed.

Is the problem due to the nohz_full CPU receiving an
interrupt at the same time the timer interrupt fires on
the housekeeping CPU?

Is it due to a nohz_full CPU updating jiffies all by
itself from irq context?  In that case, would it be
better to always have the housekeeping CPU do that?

What exactly is going on here?
