Date:	Thu, 17 Jul 2008 16:14:29 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Eric Rannaud <eric.rannaud@...il.com>
Cc:	Chris Snook <csnook@...hat.com>, Rik van Riel <riel@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, linux-mm <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: madvise(2) MADV_SEQUENTIAL behavior

On Thursday 17 July 2008 10:01, Eric Rannaud wrote:
> On Wed, 2008-07-16 at 17:05 -0400, Chris Snook wrote:
> > Rik van Riel wrote:
> > > I believe that for mmap MADV_SEQUENTIAL, we will have to do
> > > an unmap-behind from the fault path.  Not every time, but
> > > maybe once per megabyte, unmapping the megabyte behind us.
> >
> > Wouldn't it just be easier to not move pages to the active list when
> > they're referenced via an MADV_SEQUENTIAL mapping?  If we keep them on
> > the inactive list, they'll be candidates for reclaiming, but they'll
> > still be in pagecache when another task scans through, as long as we're
> > not under memory pressure.
>
> This approach, instead of invalidating the pages right away, would
> provide a middle ground: a way to tell the kernel "these pages are not
> too important".
>
> Whereas if MADV_SEQUENTIAL just invalidates the pages once per megabyte
> (say), then it's only doing what is already possible using MADV_DONTNEED
> ("drop these pages now"). It would automate the process, but it would not
> provide a more subtle hint, which could be quite useful.
>
> As I see it, there are two basic concepts here:
> - no_reuse (like FADV_NOREUSE)
> - more_ra (more readahead)
> (DONTNEED being another different concept)
>
> Then:
> MADV_SEQUENTIAL = more_ra | no_reuse
> FADV_SEQUENTIAL = more_ra | no_reuse
> FADV_NOREUSE = no_reuse
>
> Right now, only the 'more_ra' part is implemented. 'no_reuse' could be
> implemented as Chris suggests.
>
> It looks like the disagreement a year ago around Peter's approach was
> mostly around the question of whether using read ahead as a heuristic
> for "drop behind" was safe for all workloads.
>
> Would it be less controversial to remove the heuristic (ra->size ==
> ra->ra_pages), and to do something only if the user asked for
> _SEQUENTIAL or _NOREUSE?

It's far, far easier to tell the kernel "I am no longer using these
pages" than to say "I will not use these pages sometime in the future
after I have used them". The former can be done synchronously, and far
more efficiently than scanning through LRU lists to figure this out.

We should be using SEQUENTIAL to open up readahead windows, and asking
userspace applications to use DONTNEED to drop pages if it is important.
IMO.
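
As a rough sketch of what that could look like from userspace for an
mmap'ed scan: hint MADV_SEQUENTIAL once over the whole mapping, then
call MADV_DONTNEED on each chunk as soon as it has been consumed. The
function name scan_file_drop_behind, the DROP_CHUNK constant and the
byte-summing "work" are all made up for illustration; the 1 MiB chunk
size is only borrowed from the "once per megabyte" figure quoted above.
How much the kernel actually reclaims after MADV_DONTNEED on a shared
file mapping is up to the kernel, so treat this as an assumption-laden
example rather than a recommended implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for madvise() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative drop-behind granularity: 1 MiB, a multiple of the page
 * size on common configurations, so base + off stays page-aligned. */
#define DROP_CHUNK ((size_t)1 << 20)

/* Sum every byte of a file through a read-only shared mapping.
 * Returns 0 and stores the sum in *sum on success, -1 on error. */
int scan_file_drop_behind(const char *path, uint64_t *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        *sum = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping keeps the file alive */
    if (base == MAP_FAILED)
        return -1;

    /* Advisory only: ask for a sequential access pattern (readahead). */
    (void)madvise(base, len, MADV_SEQUENTIAL);

    uint64_t total = 0;
    for (size_t off = 0; off < len; off += DROP_CHUNK) {
        size_t n = (len - off < DROP_CHUNK) ? len - off : DROP_CHUNK;

        for (size_t i = 0; i < n; i++)
            total += base[off + i];

        /* "I am no longer using these pages": drop the chunk just read.
         * Failure is ignored because this is only a hint. */
        (void)madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, len);
    *sum = total;
    return 0;
}
```

A caller would do something like scan_file_drop_behind("/path/to/file",
&sum) and check for a 0 return. The point of the explicit drop is that
the application knows exactly when it is done with a range, so nothing
has to be inferred later from the LRU lists.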


> It might encourage user space applications to start using
> FADV_SEQUENTIAL or FADV_NOREUSE more often (as it would become
> worthwhile to do so), and if they do (especially cron jobs), the problem
> of the slow desktop in the morning would progressively solve itself.

The slow desktop in the morning should not happen even without such a
call, because the kernel should not throw out frequently used data (even
if it is not quite so recent) in favour of streaming data.

OK, I figure it doesn't do such a good job now, which is sad, but making
all apps micromanage the pagecache to get reasonable performance on a
2GB+ desktop system is even more sad ;)