linux-kernel - Re: [PATCH 0/3] readahead drop behind and size adjustment

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1185338106.7105.44.camel@perkele>
Date:	Wed, 25 Jul 2007 00:35:06 -0400
From:	Eric St-Laurent <ericstl34@...patico.ca>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Rusty Russell <rusty@...tcorp.com.au>,
	Fengguang Wu <fengguang.wu@...il.com>,
	Dave Jones <davej@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tim Pepper <lnxninja@...ibm.com>,
	Chris Snook <csnook@...hat.com>
Subject: Re: [PATCH 0/3] readahead drop behind and size adjustment

On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:

> I don't like this kind of conditional information going from something
> like readahead into page reclaim. Unless it is for readahead _specific_
> data such as "I got these all wrong, so you can reclaim them" (which
> this isn't).
> 
> But I don't like it as a use-once thing. The VM should be able to get
> that right.
> 

Question: How work the use-once code in the current kernel? Is there
any? I doesn't quite work for me...

See my previous email today, I've done a small test case to demonstrate 
the problem and the effectiveness of Peter's patch.  The only piece
missing is the copy case (read once + write once).

Regardless of how it's implemented, I think a similar mechanism must be
added. This is a long standing issue.

In the end, I think it's a pagecache resources allocation problem. the
VM lacks fair-share limits between processes. The kernel doesn't have
enough information to make the right decisions.

You can refine or use more advanced page reclaim, but some fair-share
splitting (like the CPU scheduler) between the processes must be
present.  Of course some process should have large or unlimited VM
limits, like databases.

Maybe the "containers" patchset and memory controller can help.  With
some specific configuration and/or a userspace daemon to adjust the
limits on the fly.

Independently, the basic large file streaming read (or copy) once cases
should not trash the pagecache. Can we agree on that?

I say, let's add some code to fix the problem.  If we hear about any
regression in some workloads, we can add a tunable to limit or disable
its effects, _if_ a better compromised solution cannot be found.

Surely it's possible to have a acceptable solution.

Best regards,

- Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/