Message-ID: <20090509140512.GA22000@elte.hu>
Date:	Sat, 9 May 2009 16:05:12 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Frédéric Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Li Zefan <lizf@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andi Kleen <andi@...stfloor.org>,
	Matt Mackall <mpm@...enic.com>,
	Alexey Dobriyan <adobriyan@...il.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [patch] tracing/mm: add page frame snapshot trace

* Wu Fengguang <fengguang.wu@...el.com> wrote:

> > ( And even for tasks, which are perhaps the hardest to iterate, we
> >   can still do the /proc method of iterating up to the offset by 
> >   counting. It wastes some time for each separate thread as it has 
> >   to count up to its offset, but it still allows the dumping itself
> >   to be parallelised. Or we could dump blocks of the PID hash array. 
> >   That distributes tasks well, and can be iterated very easily with 
> >   low/zero contention. The result will come out unordered in any 
> >   case. )
> 
> For task/file based page walking, the best parallelism unit can be 
> the task/file, instead of page segments inside them.
> 
> And there is the sparse file problem. There will be large holes in 
> the address spaces of files and processes (and even physical memory!).

If we want to iterate in the file offset space then we should use 
the find_get_pages() trick: use the page radix tree and do gang 
lookups in ascending order. Holes will be skipped over in a natural 
way in the tree.
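
A rough user-space sketch of that gang-lookup walk, just to show why 
holes cost nothing. This is not kernel code: SparsePageIndex, 
gang_lookup() and walk_pages() are made-up names, and a sorted list 
stands in for the radix tree. The only thing borrowed from 
find_get_pages() is the calling pattern of "start index plus maximum 
batch size".

```python
from bisect import bisect_left


class SparsePageIndex:
    """Stand-in for a per-file page tree: only populated offsets are stored."""

    def __init__(self, pages):
        self._pages = dict(pages)            # offset -> page payload
        self._offsets = sorted(self._pages)  # populated offsets, ascending

    def gang_lookup(self, start, max_items):
        """Return up to max_items (offset, page) pairs with offset >= start."""
        first = bisect_left(self._offsets, start)
        batch = self._offsets[first:first + max_items]
        return [(off, self._pages[off]) for off in batch]


def walk_pages(index, start=0, batch_size=16):
    """Yield every populated (offset, page) in ascending order, batch by batch."""
    while True:
        batch = index.gang_lookup(start, batch_size)
        if not batch:
            return
        yield from batch
        start = batch[-1][0] + 1  # resume just past the last page found


# A file with two cached pages at the front and two a million pages later.
cache = SparsePageIndex({0: "p0", 1: "p1", 1_000_000: "p2", 1_000_001: "p3"})
found = list(walk_pages(cache, batch_size=2))
```

The walk does one lookup per batch of populated pages and never 
visits the empty offsets in between.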

Regarding iterators, I think the best way would be to expose a 
number of 'natural iterators' in the object collection directory. 
The current dump_range could be renamed to "pfn_index" (it's really 
a 'physical page number' index and iterator), and we could introduce 
a couple of other indices as well:

    /debug/tracing/objects/mm/pages/pfn_index
    /debug/tracing/objects/mm/pages/filename_index
    /debug/tracing/objects/mm/pages/sb_index
    /debug/tracing/objects/mm/pages/task_index

"filename_index" would take a file name (a string), and would dump 
all pages of that inode - perhaps with an additional index/range 
parameter as well. For example:

    echo "/home/foo/bar.txt 0 1000" > filename_index

This would look up that file and dump any of its pages that are in 
the page cache, within the 0..1000 page offset range.

( We could support the 'batching' of such requests too, so 
  multi-line strings can be used to request multiple files, via a 
  single system call.

  We could perhaps even support directories and do 
  directory-and-all-child-dentries/inodes recursive lookups. )
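
To make the request format concrete, here is a small user-space 
sketch of how such strings could be parsed, including the batched 
multi-line case. Everything in it is an assumption for illustration: 
FileRequest, parse_request() and parse_batch() are hypothetical names, 
and the rule "the last two fields are the range if both are numbers" 
is just one possible convention.

```python
from typing import NamedTuple, Optional


class FileRequest(NamedTuple):
    path: str
    start: Optional[int]  # first page offset, or None for "whole file"
    end: Optional[int]    # last page offset, or None for "whole file"


def parse_request(line):
    """Parse '<path> [<start> <end>]' into a FileRequest."""
    text = line.strip()
    # Split from the right so that a path containing spaces stays intact.
    parts = text.rsplit(None, 2)
    if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
        start, end = int(parts[1]), int(parts[2])
        if start > end:
            raise ValueError(f"empty range in request: {text!r}")
        return FileRequest(parts[0], start, end)
    return FileRequest(text, None, None)


def parse_batch(payload):
    """Parse a multi-line write into one request per non-blank line."""
    return [parse_request(line) for line in payload.splitlines() if line.strip()]


single = parse_request("/home/foo/bar.txt 0 1000")
batch = parse_batch("/home/foo/bar.txt 0 1000\n\n/var/log/messages\n")
```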

Other indices/iterators would work like this:

    echo "/var" > sb_index

This would try to find the superblock associated with /var, and 
output all pages that relate to that superblock. (It would iterate 
over all inodes, look each one up in the pagecache and dump any 
matches.)
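
As a toy model of that walk (user-space only, with invented data): a 
superblock is represented as a mapping from inode number to that 
inode's cached pages, and dump_superblock() is a hypothetical helper 
that emits one record per cached page. Inodes with nothing in the 
pagecache simply produce no output.

```python
def dump_superblock(inodes):
    """inodes: {inode number: {page offset: pfn}}. Yield (ino, offset, pfn)."""
    for ino in sorted(inodes):
        cached = inodes[ino]
        for offset in sorted(cached):
            yield ino, offset, cached[offset]


# Made-up contents: inode 40 has no pages in the cache at all.
var_sb = {12: {0: 0x1A2B, 7: 0x1A2C}, 40: {}, 57: {3: 0x2000}}
records = list(dump_superblock(var_sb))
```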

Alternatively, we could do a reverse lookup of the inode from the 
pfn, and output that name. That would bloat the records a bit, and 
would be more costly as well.

The 'task_index' would produce output based on a PID: it would find 
the mm of that task and dump all pages associated with that mm. 
Offset/range info would be virtual address page index based.
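
For clarity on what "virtual address page index" means here: it is 
the virtual address shifted right by the page shift. A small sketch, 
assuming 4 KiB pages (PAGE_SHIFT = 12, which varies by architecture 
and configuration); page_index_range() is a hypothetical helper name.

```python
PAGE_SHIFT = 12              # assumption: 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def page_index_range(vaddr_start, length):
    """Return (first, last) page indices covering [vaddr_start, vaddr_start + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = vaddr_start >> PAGE_SHIFT
    last = (vaddr_start + length - 1) >> PAGE_SHIFT
    return first, last


# Two full pages starting at 0x400000 map to page indices 0x400..0x401.
aligned = page_index_range(0x400000, 2 * PAGE_SIZE)
# A 2-byte range straddling a page boundary still touches two pages.
straddle = page_index_range(0x400FFF, 2)
```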

Are these things close to what you had in mind?

	Ingo