linux-kernel - Re: [RFC PATCH -tip 0/5] perf tools: pagecache monitoring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 29 Jul 2011 10:14:13 +0100
From:	Mel Gorman <mgorman@...e.de>
To:	Keiichi KII <k-keiichi@...jp.nec.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Tom Zanussi <tzanussi@...il.com>,
	"riel@...hat.com" <riel@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Fr??d??ric Weisbecker <fweisbec@...il.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	"BA, Moussa" <Moussa.BA@...onyx.com>
Subject: Re: [RFC PATCH -tip 0/5] perf tools: pagecache monitoring

On Mon, Jul 18, 2011 at 05:31:09PM -0400, Keiichi KII wrote:
> Hello,
> 
> I would propose pagecache monitoring tools using perf tools.
> The purpose of the tools is to clarify pagecache behavior in a system.
> 
> We can now know system-wide pagecache usage by "/proc/meminfo".
> But we don't have any way to get higher resolution information like
> per file or per process usage than system-wide one.

It's convulated and slow but it can be determined what processes are
mapping pages from a file by scanning /proc/PID/maps to identify what
files are mapped and then using /proc/PID/pageflags to see what pages
are mapped on a per-process basis.

> If pagecache miss hit ratio rises due to unnecessary adding/removing
> pagecaches, maybe it leads to extra I/O and affects system performance.
> But it's difficult to find out what is going on in the system.
> 

What is "unnecessary adding/removing pagecaches"? If it's added, it
means either someone has faulted the page or it is due to readahead.
It's necessary to something.

> So, the tools I propose provide 2 functions:
> 
> 1. pagecache snapshooting(perf script pagecache-snapshoot)
> 

snapshot, not snapshoot :)

> This function clarifies pagecache usage per each file in the system.
> This function is based mainly on "pagecache object collections" that is
> developed by Wu Fengguang (http://lkml.org/lkml/2010/2/9/156).
> The following is sample output of this function.
> 
> pagecache snapshooting (time: 14131, path: /home)
>                              file name cache(B)  file(B)  ratio  +/-(B)    age
> -------------------------------------- -------- -------- ------ ------- ------
> /home/foo/git/linux-2.6-tip/.git/objec    71.0M   436.6M    16%       0   9012
> /home/foo/git/linux-2.6-tip/.git/objec    49.6M    57.7M    86%       0   9012
> /home/foo/.thunderbird/xso5zn7g.defaul    19.8M    19.8M   100%       0   7223
> /home/foo/.thunderbird/xso5zn7g.defaul     5.7M     5.7M   100%       0   6621
> /home/foo/git/linux-2.6-tip/.git/index     3.5M     3.5M   100%       0   4306
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7524
> /home/foo/.thunderbird/xso5zn7g.defaul     2.2M     2.2M   100%       0   7526
> /home/foo/.thunderbird/xso5zn7g.defaul     1.7M     1.7M   100%       0   6921
> ...
> 
> 2. continuous pagecache monitoring(perf script pagecachetop)
> 
> This function clarifies pagecache behavior like pagecache hit ratio and
> added/removed pagecache amount on the basis of file/process.
> This functions is based on pagecache tracepoints I propose.
> While the pagecache snapshooting can take a pagecache snapshoot at a point,
> the continuous pagecache monitoring can measure dynamic change between
> 2 snapshoots.
> The following is sample output of this function.
> 
> pagecache behavior per file (time:15826, interval:10)
> 
>                          find        hit    cache      add   remove  proc
>                 file    count      ratio pages(B) pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -------- -----
>         libc-2.13.so      620    100.00%     1.2M        0        0     7
>                 bash      283    100.00%   888.0K        0        0     6
>           ld-2.13.so      136    100.00%   148.0K        0        0     6
>                 gawk      130    100.00%   376.0K        0        0     2
>          ld.so.cache       60    100.00%   116.0K        0        0     4
> ...
> 
> pagecache behavior per process (time:16294, interval:10)
> 
>                          find        hit      add   remove  file
>              process    count      ratio pages(B) pages(B) count
> -------------------- -------- ---------- -------- -------- -----
>             zsh-7761     2968     99.93%     4.0K        0   246
>            perf-7758      369    100.00%        0        0    17
>            xmms-7634       52    100.00%        0        0     1
>            perf-7759       11    100.00%        0        0     2
>             zsh-2815        6     83.33%     4.0K     4.0K     2
>        gconfd-2-4849        3      0.00%    12.0K    12.0K     4
>        rsyslogd-7194        1    100.00%        0        0     1
> 

Ok, I can see how this could identify hot files in the system and
processes that are currently active but I'm not seeing what problem
could be solved with this information.

For example, a process that is suffering a poor hit ratio can usually
be identified with a combination of vmstat showing a large amount
of activity with page in/out or swap in/out and a profile showing
a process that is spending a high percentage of its time servicing
faults. If a profile is unsuitable, picking active processes from
top and then monitoring /proc/PID/stat and the fault counters can
identify which process in trouble.

Can you give an example of a problem that using this tool would
identify and help resolve?

> By these 2 functions, we can find out whether pagecaches are used
> efficiently or not.
> And also these tools would help us tune some applications like database.

If the database is using direct IO, will it even show up in this tool?

> It will also help us tune the kernel parameters like "vm.dirty_*".
> 

Does it though? A dirty_ratio that is too low can cause stalls as
processes get throttled cleaning pages but the pages are still present
in the page cache and the hit ratio would not be affected.

I'm not saying this is a bad idea but the changelog needs to be improved
to explain why this information is helpful and why the existing
mechanisms are either unsuitable or just too awkward to use.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/