lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 23 Aug 2016 16:33:30 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Rik van Riel <riel@...hat.com>
Cc:     Minchan Kim <minchan@...nel.org>,
        Sonny Rao <sonnyrao@...omium.org>, Jann Horn <jann@...jh.net>,
        Robert Foss <robert.foss@...labora.com>, corbet@....net,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Konstantin Khlebnikov <koct9i@...il.com>,
        Hugh Dickins <hughd@...gle.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        John Stultz <john.stultz@...aro.org>,
        ross.zwisler@...ux.intel.com, jmarchan@...hat.com,
        Johannes Weiner <hannes@...xchg.org>,
        Kees Cook <keescook@...omium.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Cyrill Gorcunov <gorcunov@...nvz.org>,
        Robin Humble <plaguedbypenguins@...il.com>,
        David Rientjes <rientjes@...gle.com>,
        eric.engestrom@...tec.com, Janis Danisevskis <jdanis@...gle.com>,
        calvinowens@...com, Alexey Dobriyan <adobriyan@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        ldufour@...ux.vnet.ibm.com, linux-doc@...r.kernel.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Ben Zhang <benzh@...omium.org>,
        Bryan Freed <bfreed@...omium.org>,
        Filipe Brandenburger <filbranden@...omium.org>,
        Mateusz Guzik <mguzik@...hat.com>
Subject: utime accounting regression since 4.6 (was: Re: [PACTH v2 0/3]
 Implement /proc/<pid>/totmaps)

On Tue 23-08-16 10:26:03, Michal Hocko wrote:
> On Mon 22-08-16 19:47:09, Michal Hocko wrote:
> > On Mon 22-08-16 19:29:36, Michal Hocko wrote:
> > > On Mon 22-08-16 18:45:54, Michal Hocko wrote:
> > > [...]
> > > > I have no idea why those numbers are so different on my laptop
> > > > yet. It surely looks suspicious. I will try to debug this further
> > > > tomorrow.
> > > 
> > > Hmm, so I've tried to use my version of awk on other machine and vice
> > > versa and it didn't make any difference. So this is independent on the
> > > awk version it seems. So I've tried to strace /usr/bin/time and
> > > wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, {ru_utime={0, 0}, ru_stime={0, 688438}, ...}) = 9128
> > > 
> > > so the kernel indeed reports 0 user time for some reason. Note I
> > > was testing with 4.7 and right now with 4.8.0-rc3 kernel (no local
> > > modifications). The other machine which reports non-0 utime is 3.12
> > > SLES kernel. Maybe I am hitting some accounting bug. At first I was
> > > suspecting CONFIG_NO_HZ_FULL because that is the main difference between
> > > my and the other machine but then I've noticed that the tests I was
> > > doing in kvm have this disabled too.. so it must be something else.
> > 
> > 4.5 reports non-0 while 4.6 zero utime. NO_HZ configuration is the same
> > in both kernels.
> 
> and one more thing. It is not like utime accounting would be completely
> broken and always report 0. Other commands report non-0 values even on
> 4.6 kernels. I will try to bisect this down later today.

OK, so it seems I found it. I was quite lucky because account_user_time
is not all that popular function and there were basically no changes
besides Riks ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN
to jiffy granularity") and that seems to cause the regression. Reverting
the commit on top of the current mmotm seems to fix the issue for me.

And just to give Rik more context. While debugging overhead of the
/proc/<pid>/smaps I am getting a misleading output from /usr/bin/time -v
(source for ./max_mmap is [1])

root@...t1:~# uname -r
4.5.0-rc6-bisect1-00025-gff9a9b4c4334
root@...t1:~# ./max_map 
pid:2990 maps:65515
root@...t1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss}' /proc/2990/smaps
rss:263368 pss:262203
        Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss} /proc/2990/smaps"
        User time (seconds): 0.00
        System time (seconds): 0.45
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.46
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1796
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 83
        Voluntary context switches: 6
        Involuntary context switches: 6
        Swaps: 0
        File system inputs: 248
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

See the User time being 0 (as you can see above in the quoted text it
is not a rounding error in userspace or something similar because wait4
really returns 0). Now with the revert
root@...t1:~# uname -r
4.5.0-rc6-revert-00026-g7fc86f968bf5
root@...t1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss}' /proc/3015/smaps
rss:263316 pss:262199
        Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf "rss:%d pss:%d\n", rss, pss} /proc/3015/smaps"
        User time (seconds): 0.18
        System time (seconds): 0.29
        Percent of CPU this job got: 97%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.50
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1760
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 79
        Voluntary context switches: 5
        Involuntary context switches: 7
        Swaps: 0
        File system inputs: 248
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

So it looks like the whole user time is accounted as the system time.
My config is attached and yes I do have CONFIG_VIRT_CPU_ACCOUNTING_GEN
enabled. Could you have a look please?

[1] http://lkml.kernel.org/r/20160817082200.GA10547@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs

Download attachment ".config.gz" of type "application/gzip" (23016 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ