Date:	Thu, 18 Aug 2016 09:44:33 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Sonny Rao <sonnyrao@...omium.org>
Cc:	Jann Horn <jann@...jh.net>,
	Robert Foss <robert.foss@...labora.com>, corbet@....net,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Hugh Dickins <hughd@...gle.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Minchan Kim <minchan@...nel.org>,
	John Stultz <john.stultz@...aro.org>,
	ross.zwisler@...ux.intel.com, jmarchan@...hat.com,
	Johannes Weiner <hannes@...xchg.org>,
	Kees Cook <keescook@...omium.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Robin Humble <plaguedbypenguins@...il.com>,
	David Rientjes <rientjes@...gle.com>,
	eric.engestrom@...tec.com, Janis Danisevskis <jdanis@...gle.com>,
	calvinowens@...com, Alexey Dobriyan <adobriyan@...il.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	ldufour@...ux.vnet.ibm.com, linux-doc@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ben Zhang <benzh@...omium.org>,
	Bryan Freed <bfreed@...omium.org>,
	Filipe Brandenburger <filbranden@...omium.org>,
	Mateusz Guzik <mguzik@...hat.com>
Subject: Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps

On Wed 17-08-16 11:57:56, Sonny Rao wrote:
> On Wed, Aug 17, 2016 at 6:03 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > On Wed 17-08-16 11:31:25, Jann Horn wrote:
[...]
> >> That's at least 30.43% + 9.12% + 7.66% = 47.21% of the task's kernel
> >> time spent on evaluating format strings. The new interface
> >> wouldn't have to spend that much time on format strings because there
> >> isn't so much text to format.
> >
> > well, this is true of course but I would much rather try to reduce the
> > overhead of smaps file than add a new file. The following should help
> > already. I've measured ~7% systime cut down. I guess there is still some
> > room for improvements but I have to say I'm far from being convinced about
> > a new proc file just because we suck at dumping information to the
> > userspace.
> > If this was something like /proc/<pid>/stat which is
> > essentially read all the time then it would be a different question but
> > is the rss, pss going to be read all that often? If yes, why?
> 
> If the question is why do we need to read RSS, PSS, Private_*, Swap
> and the other fields so often?
> 
> I have two use cases so far involving monitoring per-process memory
> usage, and we usually need to read stats for about 25 processes.
> 
> Here's a timing example on a fairly recent ARM system, a 4-core RK3288
> running at 1.8 GHz:
> 
> localhost ~ # time cat /proc/25946/smaps > /dev/null
> 
> real    0m0.036s
> user    0m0.020s
> sys     0m0.020s
> 
> localhost ~ # time cat /proc/25946/totmaps > /dev/null
> 
> real    0m0.027s
> user    0m0.010s
> sys     0m0.010s
> localhost ~ #
> 
> I'll ignore the user time for now, and we see about 20 ms of system
> time with smaps and 10 ms with totmaps. With 20 similar processes it
> would be 400 milliseconds of CPU time for the kernel to get this
> information from smaps vs 200 milliseconds with totmaps.  Even totmaps
> is still pretty slow, but much better than smaps.
> 
> Use cases:
> 1) Basic task monitoring -- like "top" that shows memory consumption
> including PSS, Private, Swap
>     1 second update means about 40% of one CPU is spent in the kernel
> gathering the data with smaps

I would argue that even 20% is way too much for such monitoring. What
is the value of doing it so often that 20 vs 40ms really matters?
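
As a rough check of the percentages discussed here, the arithmetic from the
quoted timings can be written out. This is only a sketch: the function name
is made up for illustration, and the inputs are the figures quoted above
(20 ms vs 10 ms of system time per read, 20 processes, 1 second interval).

```python
def kernel_cpu_fraction(sys_ms_per_read, n_processes, interval_ms):
    """Fraction of one CPU spent in the kernel per polling interval."""
    return (sys_ms_per_read * n_processes) / interval_ms


# Figures taken from the quoted measurements; nothing here is measured anew.
smaps = kernel_cpu_fraction(20, 20, 1000)
totmaps = kernel_cpu_fraction(10, 20, 1000)
print(f"smaps: {smaps:.0%}, totmaps: {totmaps:.0%}")
```

Stretching the polling interval reduces both fractions by the same factor,
which is why the update frequency matters as much as the per-read cost.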

> 2) User space OOM handling -- we'd rather do a more graceful shutdown
> than let the kernel's OOM killer activate and need to gather this
> information and we'd like to be able to get this information to make
> the decision much faster than 400ms

Global OOM handling in userspace is really dubious if you ask me. I
understand you want something better than SIGKILL and in fact this is
already possible with the memory cgroup controller (btw. memcg will give
you cheap access to rss and the amount of shared and swapped out memory
as well). Anyway, if you are getting close to OOM your system will most
probably be really busy, and chances are that reading your new file
will also take much more time. I am also not quite sure how pss is
useful for OOM decisions.
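
To illustrate the memcg point, here is a minimal sketch of reading the
per-group counters from userspace. It assumes the cgroup v1 memory
controller, whose memory.stat file is a list of "name value" lines. The
sample text, its values, and the helper name are made up for illustration;
on a real system the text would come from the group's memory.stat file,
whose location depends on where the controller is mounted.

```python
def parse_memory_stat(text):
    """Parse 'name value' lines (memory.stat style) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


# Hypothetical sample; values are in bytes and invented for this example.
SAMPLE = """cache 1048576
rss 4194304
mapped_file 524288
swap 0
"""

stats = parse_memory_stat(SAMPLE)
print(stats["rss"], stats["mapped_file"], stats["swap"])
```

This is one small file read per group rather than one walk over every
mapping of every process.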

Don't get me wrong, /proc/<pid>/totmaps might be suitable for your
specific use case but so far I haven't heard any sound argument for it to
be generally usable. It is true that smaps is unnecessarily costly but
at least I can see some room for improvement. A simple patch I've
posted cut the formatting overhead by 7%. Maybe we can do more.
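
For context, the aggregation a monitoring tool has to do over smaps today
can be sketched in userspace as below. This is an illustration only: the
sample text is invented, the helper name is hypothetical, and it assumes
the usual "Field:   N kB" layout of smaps entries. It is not the proposed
totmaps implementation.

```python
import re

# Matches per-mapping counter lines such as "Pss:    100 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps(text, fields=("Rss", "Pss", "Private_Clean", "Private_Dirty", "Swap")):
    """Sum the selected smaps fields (in kB) over all mappings."""
    totals = dict.fromkeys(fields, 0)
    for line in text.splitlines():
        m = FIELD.match(line)
        if m and m.group(1) in totals:
            totals[m.group(1)] += int(m.group(2))
    return totals


# Invented two-mapping sample in smaps layout.
SAMPLE = """00400000-0048a000 r-xp 00000000 fd:03 960637 /bin/bash
Size:                552 kB
Rss:                 460 kB
Pss:                 100 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Swap:                  0 kB
0068a000-0068b000 rw-p 0008a000 fd:03 960637 /bin/bash
Size:                  4 kB
Rss:                   4 kB
Pss:                   4 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Swap:                  0 kB
"""

totals = sum_smaps(SAMPLE)
print(totals)
```

The kernel still formats every line of every mapping before userspace
throws most of them away, which is the formatting cost discussed above.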
-- 
Michal Hocko
SUSE Labs
