Message-Id: <20070728203900.fb75c307.akpm@linux-foundation.org>
Date:	Sat, 28 Jul 2007 20:39:00 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Rik van Riel <riel@...hat.com>
Cc:	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	Frank Kingswood <frank@...gswood-consulting.co.uk>,
	Andi Kleen <andi@...stfloor.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Ray Lee <ray-lk@...rabbit.org>,
	Jesper Juhl <jesper.juhl@...il.com>,
	ck list <ck@....kolivas.org>, Paul Jackson <pj@....com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans
 for 2.6.23]

On Sat, 28 Jul 2007 21:33:59 -0400 Rik van Riel <riel@...hat.com> wrote:

> Andrew Morton wrote:
> 
> > What I think is killing us here is the blockdev pagecache: the pagecache
> > which backs those directory entries and inodes.  These pages get read
> > multiple times because they hold multiple directory entries and multiple
> > inodes.  These multiple touches will put those pages onto the active list
> > so they stick around for a long time and everything else gets evicted.
> > 
> > I've never been very sure about this policy for the metadata pagecache.  We
> > read the filesystem objects into the dcache and icache and then we won't
> > read from that page again for a long time (I expect).  But the page will
> > still hang around for a long time.
> > 
> > It could be that we should leave those pages inactive.
> 
> Good idea for updatedb.
> 
> However, it may be a bad idea for files that are often
> written to.  Turning an inode write into a read plus a
> write does not sound like such a hot idea, we really
> want to keep those in the cache.

Remember that this problem applies to both inode blocks and to directory
blocks.  Yes, it might be useful to hold onto an inode block for a future
write (atime, mtime, usually), but not a directory block.

> I think what you need is to ignore multiple references
> to the same page when they all happen in one time
> interval, counting them only if they happen in multiple
> time intervals.

Yes, the sudden burst of accesses for adjacent inode/dirents will be a
common pattern, and it'd make heaps of sense to treat that as a single
touch.  It'd have to be done in the fs I guess, and it might be a bit hard
to do.  And it turns out that embedding the touch_buffer() all the way down
in __find_get_block() was convenient, but it's going to be tricky to
change.
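
To make the "single touch per burst" idea concrete, here is a rough userspace
sketch, not kernel code.  Everything in it is an assumption for illustration:
struct meta_page, touch_coalesced(), the two-touch threshold and the
caller-supplied clock are all made up, and none of it reflects how
touch_buffer() or __find_get_block() actually work.  Touches that land in the
same time interval are counted once; only touches in distinct intervals move
the page towards activation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page bookkeeping; this is NOT the kernel's struct page. */
struct meta_page {
	uint64_t last_interval;	/* interval in which the last counted touch fell */
	unsigned int touches;	/* counted touches, saturating at the threshold */
	bool ever_touched;	/* false until the first touch is recorded */
};

/* Assumed policy: two touches in distinct intervals make the page "active". */
#define ACTIVATE_THRESHOLD 2u

/*
 * Record a touch at time 'now' (arbitrary units), using intervals of
 * 'interval_len' units.  Returns true once the page qualifies for the
 * active list under the assumed policy.
 */
static bool touch_coalesced(struct meta_page *p, uint64_t now,
			    uint64_t interval_len)
{
	assert(interval_len > 0);

	uint64_t interval = now / interval_len;

	/* A burst inside one interval counts as the touch already recorded. */
	if (p->ever_touched && p->last_interval == interval)
		return p->touches >= ACTIVATE_THRESHOLD;

	p->ever_touched = true;
	p->last_interval = interval;
	if (p->touches < ACTIVATE_THRESHOLD)
		p->touches++;

	return p->touches >= ACTIVATE_THRESHOLD;
}
```

With a zero-initialised struct meta_page, any number of touches within one
interval leaves the page inactive; the first touch in a later interval is what
promotes it.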

For now I'm fairly inclined to just nuke the touch_buffer() on the read side
and maybe add one on the modification codepaths and see what happens.
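
As a toy model of that experiment (again a userspace sketch under stated
assumptions, not a patch): toy_buffer, toy_touch() and toy_access() are
hypothetical names, and the "first touch marks referenced, second touch
activates" rule is only loosely modelled on two-touch promotion.  Read
lookups leave the buffer alone; only modifications touch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access kinds for the toy model. */
enum access_kind { ACCESS_READ, ACCESS_MODIFY };

/* Hypothetical stand-in for a metadata buffer's LRU state. */
struct toy_buffer {
	bool referenced;	/* touched once since it was last inactive */
	bool active;		/* promoted to the (toy) active list */
};

/* Assumed two-touch rule: first touch marks, second touch promotes. */
static void toy_touch(struct toy_buffer *b)
{
	assert(b != NULL);

	if (b->referenced)
		b->active = true;
	else
		b->referenced = true;
}

/* Policy under test: reads do not touch, modifications do. */
static void toy_access(struct toy_buffer *b, enum access_kind kind)
{
	if (kind == ACCESS_MODIFY)
		toy_touch(b);
}
```

Under this policy an updatedb-style scan, which only reads, can hit the same
buffer any number of times without activating it, while a buffer that is
modified twice still ends up active.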

As always, testing is the problem.

> The use-once cleanup (which takes a page flag for PG_new,
> I know...) would solve that problem.
> 
> However, it would introduce the problem of having to scan
> all the pages on the list before a page becomes freeable.
> We would have to add some background scanning (or a separate
> list for PG_new pages) to make the initial pageout run use
> an acceptable amount of CPU time.
> 
> Not sure that complexity will be worth it...
> 

I suspect that the situation we have now is so bad that pretty much
anything we do will be an improvement.  I've always wondered "ytf is there
so much blockdev pagecache?"

This machine I'm typing at:

MemTotal:      3975080 kB
MemFree:        750400 kB
Buffers:        547736 kB
Cached:        1299532 kB
SwapCached:      12772 kB
Active:        1789864 kB
Inactive:       861420 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3975080 kB
LowFree:        750400 kB
SwapTotal:     4875716 kB
SwapFree:      4715660 kB
Dirty:              76 kB
Writeback:           0 kB
Mapped:         638036 kB
Slab:           522724 kB
CommitLimit:   6863256 kB
Committed_AS:  1115632 kB
PageTables:      14452 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     36432 kB
VmallocChunk: 34359696379 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

More than a quarter of my RAM in fs metadata!  Most of it I'll bet is on the
active list.  And the fs on which I do most of the work is mounted
noatime..
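
For anyone checking the arithmetic: the sketch below assumes "fs metadata"
means Buffers plus Slab from the listing above (that reading is an assumption;
Slab also holds things other than dentries and inodes).  share_permille() is a
made-up helper, not an existing interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of 'total_kb' taken by 'part_kb', in parts per thousand,
 * rounded down.  Hypothetical helper for illustration only.
 */
static unsigned int share_permille(uint64_t part_kb, uint64_t total_kb)
{
	assert(total_kb > 0);

	return (unsigned int)((part_kb * 1000u) / total_kb);
}
```

Calling share_permille(547736 + 522724, 3975080) with the Buffers, Slab and
MemTotal values above gives 269, i.e. about 26.9% of RAM, of which Buffers
alone is roughly 13.7%.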

