Date:	Wed, 25 Jul 2007 15:19:59 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Eric St-Laurent <ericstl34@...patico.ca>
CC:	Rusty Russell <rusty@...tcorp.com.au>,
	Fengguang Wu <fengguang.wu@...il.com>,
	Dave Jones <davej@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tim Pepper <lnxninja@...ibm.com>,
	Chris Snook <csnook@...hat.com>
Subject: Re: [PATCH 0/3] readahead drop behind and size adjustment

Eric St-Laurent wrote:
> On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:
> 
> 
>>I don't like this kind of conditional information going from something
>>like readahead into page reclaim. Unless it is for readahead _specific_
>>data such as "I got these all wrong, so you can reclaim them" (which
>>this isn't).
>>
>>But I don't like it as a use-once thing. The VM should be able to get
>>that right.
>>
> 
> Question: How does the use-once code work in the current kernel? Is
> there any? It doesn't quite work for me...

What *I* think is supposed to happen is that newly read in pages get
put on the inactive list, and unless they get accessed again before
being reclaimed, they are allowed to fall off the end of the list
without disturbing active data too much.
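
From memory, the core of that is mark_page_accessed() -- roughly this
(a sketch from memory, not a verbatim copy of mm/swap.c, so details
may be off):

	void mark_page_accessed(struct page *page)
	{
		if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
			/* second touch while still on the inactive list: promote */
			activate_page(page);
			ClearPageReferenced(page);
		} else if (!PageReferenced(page)) {
			/* first touch: just remember the page was referenced */
			SetPageReferenced(page);
		}
	}

So a page needs to be touched again after the initial read-in to make
it onto the active list; pages that are only ever used once should just
age off the tail of the inactive list.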

I think there is a missing piece here: we used to ease the reclaim
pressure off the active list when the inactive list grew much larger
than it (which could indicate a lot of use-once pages in the system).
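
Something like the following, purely as a sketch (the helper name
inactive_is_huge() and the 3:1 ratio are made up for illustration; the
old vmscan.c code was structured differently):

	/*
	 * Don't scan the active list while the inactive list is much
	 * larger than it -- a big inactive list usually means lots of
	 * use-once pages that can be reclaimed without touching the
	 * working set.
	 */
	static int inactive_is_huge(unsigned long nr_active,
				    unsigned long nr_inactive)
	{
		return nr_inactive > 3 * nr_active;
	}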

Andrew got rid of that logic for some reason which I don't know, but I
can't see that use-once would be terribly effective today (so your
results don't surprise me too much).

I think I've been banned from touching vmscan.c, but if you're keen to
try a patch, I might be convinced to come out of retirement :)


> See my previous email today, I've done a small test case to demonstrate 
> the problem and the effectiveness of Peter's patch.  The only piece
> missing is the copy case (read once + write once).
> 
> Regardless of how it's implemented, I think a similar mechanism must be
> added. This is a long standing issue.
> 
> In the end, I think it's a pagecache resource allocation problem. The
> VM lacks fair-share limits between processes. The kernel doesn't have
> enough information to make the right decisions.
> 
> You can refine or use more advanced page reclaim, but some fair-share
> splitting (like the CPU scheduler) between the processes must be
> present.  Of course some processes should have large or unlimited VM
> limits, like databases.
> 
> Maybe the "containers" patchset and memory controller can help.  With
> some specific configuration and/or a userspace daemon to adjust the
> limits on the fly.
> 
> Independently, the basic large-file streaming read-once (or copy-once)
> cases should not trash the pagecache. Can we agree on that?

One man's trash is another's treasure: some people will want the
files to remain in cache because they'll use them again (copy them
somewhere else, or start editing them after the copy, or whatever).

But yeah, we can probably do better at the sequential read/write
case.
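
(For what it's worth, an application that knows it is streaming can
already approximate drop-behind from userspace with posix_fadvise().
Untested sketch of a read loop that drops the pages it has finished
with:

	#define _POSIX_C_SOURCE 200112L
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		static char buf[1 << 20];
		off_t done = 0;
		ssize_t n;
		int fd;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}
		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			perror("open");
			return 1;
		}
		posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

		while ((n = read(fd, buf, sizeof(buf))) > 0) {
			/* ... consume buf ... */

			/* drop the chunk we just finished from the pagecache */
			posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
			done += n;
		}
		close(fd);
		return 0;
	}

Of course that only helps the apps that bother to do it, which is
exactly why doing drop-behind in the kernel is interesting.)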


> I say, let's add some code to fix the problem.  If we hear about any
> regression in some workloads, we can add a tunable to limit or disable
> its effects, _if_ a better compromise solution cannot be found.

Sure, but let's figure out the workloads and look at all the
alternatives first.

-- 
SUSE Labs, Novell Inc.
