linux-kernel - Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090428092454.GB21085@elte.hu>
Date:	Tue, 28 Apr 2009 11:24:54 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andi Kleen <andi@...stfloor.org>,
	Matt Mackall <mpm@...enic.com>,
	Alexey Dobriyan <adobriyan@...il.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags


* Wu Fengguang <fengguang.wu@...el.com> wrote:

> On Tue, Apr 28, 2009 at 08:55:07AM +0200, Ingo Molnar wrote:
> > 
> > * Wu Fengguang <fengguang.wu@...el.com> wrote:
> > 
> > > Export 9 page flags in /proc/kpageflags, and 8 more for kernel developers.
> > > 
> > > 1) for kernel hackers (on CONFIG_DEBUG_KERNEL)
> > >    - all available page flags are exported, and
> > >    - exported as is
> > > 2) for admins and end users
> > >    - only the more `well known' flags are exported:
> > > 	11. KPF_MMAP		(pseudo flag) memory mapped page
> > > 	12. KPF_ANON		(pseudo flag) memory mapped page (anonymous)
> > > 	13. KPF_SWAPCACHE	page is in swap cache
> > > 	14. KPF_SWAPBACKED	page is swap/RAM backed
> > > 	15. KPF_COMPOUND_HEAD	(*)
> > > 	16. KPF_COMPOUND_TAIL	(*)
> > > 	17. KPF_UNEVICTABLE	page is in the unevictable LRU list
> > > 	18. KPF_HWPOISON	hardware detected corruption
> > > 	19. KPF_NOPAGE		(pseudo flag) no page frame at the address
> > > 
> > > 	(*) For compound pages, exporting _both_ head/tail info enables
> > > 	    users to tell where a compound page starts/ends, and its order.
> > > 
> > >    - limit flags to their typical usage scenario, as indicated by KOSAKI:
> > > 	- LRU pages: only export relevant flags
> > > 		- PG_lru
> > > 		- PG_unevictable
> > > 		- PG_active
> > > 		- PG_referenced
> > > 		- page_mapped()
> > > 		- PageAnon()
> > > 		- PG_swapcache
> > > 		- PG_swapbacked
> > > 		- PG_reclaim
> > > 	- no-IO pages: mask out irrelevant flags
> > > 		- PG_dirty
> > > 		- PG_uptodate
> > > 		- PG_writeback
> > > 	- SLAB pages: mask out overloaded flags:
> > > 		- PG_error
> > > 		- PG_active
> > > 		- PG_private
> > > 	- PG_reclaim: mask out the overloaded PG_readahead
> > > 	- compound flags: only export huge/gigantic pages
> > > 
> > > Here are the admin/linus views of all page flags on a newly booted nfs-root system:
> > > 
> > > # ./page-types # for admin
> > >          flags  page-count       MB  symbolic-flags                     long-symbolic-flags
> > > 0x000000000000      491174     1918  ____________________________                
> > > 0x000000000020           1        0  _____l______________________       lru      
> > > 0x000000000028        2543        9  ___U_l______________________       uptodate,lru
> > > 0x00000000002c        5288       20  __RU_l______________________       referenced,uptodate,lru
> > > 0x000000004060           1        0  _____lA_______b_____________       lru,active,swapbacked
> > 
> > I think i have to NAK this kind of ad-hoc instrumentation of kernel 
> > internals and statistics until we clear up why such instrumentation 
> > measures are being accepted into the MM while other, more dynamic 
> > and more flexible MM instrumentation are being resisted by Andrew.
> 
> An unexpected NAK - to throw away an orange because we are to have an apple? ;-)
> 
> Anyway here are the missing rationals.
> 
> 1) FAST
> 
> It takes merely 0.2s to scan 4GB pages:
> 
>         ./page-types  0.02s user 0.20s system 99% cpu 0.216 total
> 
> 2) SIMPLE
> 
> /proc/kpageflags will be a *long standing* hack we have to live 
> with - it was originally introduced by Matt to do shared memory 
> accounting and a facility to analyze applications' memory 
> consumptions, with the hope it will also help kernel developers 
> someday.
> 
> So why not extend and embrace it, in a straightforward way?
> 
> 3) USE CASES
> 
> I have/will take advantage of the above page-types command in a number ways:
> - to help track down memory leak (the recent trace/ring_buffer.c case)
> - to estimate the system wide readahead miss ratio
> - Andi want to examine the major page types in different workloads
>   (for the hwpoison work)
> - Me too, for fun of learning: read/write/lock/whatever a lot of pages
>   and examine their flags, to get an idea of some random kernel behaviors.
>   (the dynamic tracing tools can be more helpful, as a different view)
> 
> 4) COMPLEMENTARITY
> 
> In some cases the dynamic tracing tool is not enough (or too complex)
> to rebuild the current status view.
> 
> I myself have a dynamic readahead tracing tool(very useful!). At 
> the same time I also use readahead accounting numbers, and the 
> /proc/filecache tool(frequently!), and the above page-types tool. 
> I simply need them all - they are handy for different cases.

Well, the main counter argument here is that statistics is _derived_ 
from events. In their simplest form the 'counts' are the integral of 
events over time.

So if we capture all interesting events, and do that with low 
overhead (and in fact can even collect and integrate them in-kernel, 
today), we _dont have_ to maintain various overlapping counters all 
around the kernel. This is really a general instrumentation design 
observation.

Every time we add yet another /proc hack we splinter Linux 
instrumentation, in a hard to reverse way.

So your single-purpose /proc hack could be made multi-purpose and 
could help a much broader range of people, with just a little bit of 
effort i believe. Pekka already wrote the page tracking patch for 
example, that would be a good starting point.

Does it mean more work to do? You bet ;-)

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/