Message-ID: <20100221022831.GB6448@localhost>
Date:	Sun, 21 Feb 2010 10:28:31 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Balbir Singh <balbir@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Chris Frost <frost@...ucla.edu>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Keiichi KII <k-keiichi@...jp.nec.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jason Baron <jbaron@...hat.com>,
	Hitoshi Mitake <mitake@....info.waseda.ac.jp>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"lwoodman@...hat.com" <lwoodman@...hat.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Tom Zanussi <tzanussi@...il.com>,
	"riel@...hat.com" <riel@...hat.com>,
	Munehiro Ikeda <m-ikeda@...jp.nec.com>,
	Atsushi Tsuji <a-tsuji@...jp.nec.com>
Subject: Re: [RFC PATCH -tip 0/2 v3] pagecache tracepoints proposal

Hi Balbir,

> > tracing: pagecache object collections
> >
> > This dumps
> > - all cached files of a mounted fs  (the inode-cache)
> > - all cached pages of a cached file (the page-cache)
> >
> > Usage and Sample output:
> >
> > # echo /dev > /debug/tracing/objects/mm/pages/walk-fs
> > # tail /debug/tracing/trace
> >              zsh-2528  [000] 10429.172470: dump_inode: ino=889 size=0 cached=0 age=442 dirty=0 dev=0:18 file=/dev/console
> >              zsh-2528  [000] 10429.172472: dump_inode: ino=888 size=0 cached=0 age=442 dirty=7 dev=0:18 file=/dev/null
> >              zsh-2528  [000] 10429.172474: dump_inode: ino=887 size=40 cached=0 age=442 dirty=0 dev=0:18 file=/dev/shm
> >              zsh-2528  [000] 10429.172477: dump_inode: ino=886 size=40 cached=0 age=442 dirty=0 dev=0:18 file=/dev/pts
> >              zsh-2528  [000] 10429.172479: dump_inode: ino=885 size=11 cached=0 age=442 dirty=0 dev=0:18 file=/dev/core
> >              zsh-2528  [000] 10429.172481: dump_inode: ino=884 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stderr
> >              zsh-2528  [000] 10429.172483: dump_inode: ino=883 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stdout
> >              zsh-2528  [000] 10429.172486: dump_inode: ino=882 size=15 cached=0 age=442 dirty=0 dev=0:18 file=/dev/stdin
> >              zsh-2528  [000] 10429.172488: dump_inode: ino=881 size=13 cached=0 age=442 dirty=0 dev=0:18 file=/dev/fd
> >              zsh-2528  [000] 10429.172491: dump_inode: ino=872 size=13360 cached=0 age=442 dirty=0 dev=0:18 file=/dev
> >
> > Here "age" is either age from inode create time, or from last dirty time.
> >
> 
> It would be nice to see mapped/unmapped information as well.

As you noticed, we have mapcount for individual pages :)

> > +static int pages_similiar(struct page* page0, struct page* page)
> > +{
> > +     if (page_count(page0) != page_count(page))
> > +             return 0;
> > +
> > +     if (page_mapcount(page0) != page_mapcount(page))
> > +             return 0;
> > +
> > +     if (page_flags(page0) != page_flags(page))
> > +             return 0;
> > +
> > +     return 1;
> > +}
> > +
> 
> OK, so pages_similar() is used to identify a range of pages in the
> cache?

Right. Many files are accessed sequentially or clustered, so
pages_similar() can save lots of output lines :)
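
To illustrate the saving, here is a minimal userspace sketch of the run-coalescing idea. It is not the patch's code: struct page_state, states_similar() and dump_runs() are hypothetical names, and the array is assumed to hold pages at consecutive file offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-page state the patch compares. */
struct page_state {
	int count;            /* reference count, cf. page_count() */
	int mapcount;         /* number of mappings, cf. page_mapcount() */
	unsigned long flags;  /* page flags */
};

/* Two pages are "similar" when all three fields match. */
static int states_similar(const struct page_state *a,
			  const struct page_state *b)
{
	return a->count == b->count &&
	       a->mapcount == b->mapcount &&
	       a->flags == b->flags;
}

/*
 * Print one line per run of consecutive similar pages and return the
 * number of lines printed. Each run is compared against its first page.
 */
static size_t dump_runs(const struct page_state *pages, size_t n)
{
	size_t lines = 0;
	size_t start = 0;

	assert(pages != NULL || n == 0);

	for (size_t i = 1; i <= n; i++) {
		/* Extend the current run while the next page still matches. */
		if (i < n && states_similar(&pages[start], &pages[i]))
			continue;
		printf("index=%zu len=%zu count=%d mapcount=%d flags=%#lx\n",
		       start, i - start, pages[start].count,
		       pages[start].mapcount, pages[start].flags);
		lines++;
		start = i;
	}
	return lines;
}
```

With five pages where the first three share one state and the last two share another, dump_runs() emits two lines instead of five.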

> > +#define BATCH_LINES  100
> > +static void dump_pagecache(struct address_space *mapping)
> > +{
> > +     int i;
> > +     int lines = 0;
> > +     pgoff_t len = 0;
> > +     struct pagevec pvec;
> > +     struct page *page;
> > +     struct page *page0 = NULL;
> > +     unsigned long start = 0;
> > +
> > +     for (;;) {
> > +             pagevec_init(&pvec, 0);
> > +             pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
> > +                             (void **)pvec.pages, start + len, PAGEVEC_SIZE);
> 
> Is radix_tree_gang_lookup synchronized somewhere? Don't we need to
> call it under RCU or a lock (mapping) ?

No. This function is inherently non-atomic, and it seems that most in-kernel
users do not bother to take rcu_read_lock(). So let's leave it as is?

> > +static ssize_t
> > +trace_pagecache_write(struct file *filp, const char __user *ubuf, size_t count,
> > +                   loff_t *ppos)
> > +{
> > +     struct file *file = NULL;
> > +     char *name;
> > +     int err = 0;
> > +
> 
> Can't we use the trace_parser here?

It doesn't seem necessary: it's merely one file name, which could contain spaces.

> > +     if (count <= 1)
> > +             return -EINVAL;
> > +     if (count > PATH_MAX + 1)
> > +             return -ENAMETOOLONG;
> > +
> > +     name = kmalloc(count+1, GFP_KERNEL);
> > +     if (!name)
> > +             return -ENOMEM;
> > +
> > +     if (copy_from_user(name, ubuf, count)) {
> > +             err = -EFAULT;
> > +             goto out;
> > +     }
> > +
> > +     /* strip the newline added by `echo` */
> > +     if (name[count-1] != '\n')
> > +             return -EINVAL;
> 
> Doesn't sound correct, what happens if we use echo -n?

It's a bit unfortunate. If we accept both "echo" and "echo -n" by
stripping a trailing '\n' only when one is present, then it will go
wrong for a file name that itself ends in '\n' and is written with
"echo -n": the last character of the name would be stripped.

Or shall we support only "echo -n"?  I am fine with either one.
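
For reference, the "plain echo only" rule can be sketched in userspace as below. This is an illustration under assumptions, not the patch itself: parse_name() and SKETCH_PATH_MAX are hypothetical names, and malloc()/memcpy() stand in for kmalloc()/copy_from_user(). The sketch frees the buffer on the error path, whereas the quoted hunk returns -EINVAL directly after the allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_MAX 4096  /* hypothetical stand-in for PATH_MAX */

/*
 * Copy a written buffer into a NUL-terminated name, requiring and
 * stripping exactly one trailing newline (the plain `echo` convention).
 * Returns 0 and sets *out on success, or a negative errno value.
 */
static int parse_name(const char *buf, size_t count, char **out)
{
	char *name;

	assert(buf != NULL && out != NULL);

	if (count <= 1)
		return -EINVAL;
	if (count > SKETCH_PATH_MAX + 1)
		return -ENAMETOOLONG;

	name = malloc(count + 1);
	if (!name)
		return -ENOMEM;
	memcpy(name, buf, count);

	if (name[count - 1] != '\n') {
		free(name);   /* release the buffer on the error path */
		return -EINVAL;
	}
	name[count - 1] = '\0';  /* strip only the final newline */

	*out = name;
	return 0;
}
```

Because exactly one newline is removed, a name that really ends in '\n' still round-trips when written with plain echo, while input from "echo -n" is rejected rather than guessed at.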

> > --- linux-mm.orig/fs/inode.c  2010-02-08 23:19:12.000000000 +0800
> > +++ linux-mm/fs/inode.c       2010-02-08 23:19:22.000000000 +0800
> > @@ -149,7 +149,7 @@ struct inode *inode_init_always(struct s
> >       inode->i_bdev = NULL;
> >       inode->i_cdev = NULL;
> >       inode->i_rdev = 0;
> > -     inode->dirtied_when = 0;
> > +     inode->dirtied_when = jiffies;
> >
> 
> Hmmm... Is the inode really dirtied when initialized? I know the
> change is for tracing, but the code when read is confusing.

Huh. Not really dirtied (for that you need to check I_DIRTY), but
dirtied_when is only used in writeback code when I_DIRTY is set.

So I overload dirtied_when in the clean case to indicate the inode
load time. This is a useful trick for fastboot to collect cache
footprint shortly after boot, when most inodes are clean.

It does ask for a comment:

        /*
         * This records the inode load time. It will be invalidated once the
         * inode is dirtied, or once jiffies wraps around. Despite the pitfalls
         * it still provides useful information for some use cases like fastboot.
         */
        inode->dirtied_when = jiffies;
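
On the wraparound caveat, a small userspace sketch of how an age could be derived from such a timestamp. The helper names are hypothetical and the tick rate is passed in as an assumption; tick_after() mirrors the idea of the kernel's time_after() macro and relies on the usual two's-complement conversion to long.

```c
#include <assert.h>
#include <limits.h>

/*
 * Age in ticks between a recorded timestamp and the current counter.
 * Unsigned subtraction is modulo 2^N, so the result stays correct across
 * a single wraparound, but not once a full counter period has elapsed.
 */
static unsigned long ticks_since(unsigned long now, unsigned long then)
{
	return now - then;
}

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Convert ticks to whole seconds for a given tick rate (an assumption). */
static unsigned long ticks_to_seconds(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks / hz;
}
```

A timestamp taken shortly before the counter wraps still yields a small, correct age shortly afterwards. A timestamp older than one full counter period cannot be told apart from a recent one, which is the pitfall the comment warns about.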


Thanks,
Fengguang