Message-ID: <20070917093031.GB25706@skynet.ie>
Date: Mon, 17 Sep 2007 10:30:31 +0100
From: mel@...net.ie (Mel Gorman)
To: Goswin von Brederlow <brederlo@...ormatik.uni-tuebingen.de>
Cc: Andrea Arcangeli <andrea@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Joern Engel <joern@...fs.org>,
Nick Piggin <nickpiggin@...oo.com.au>,
Christoph Lameter <clameter@....com>,
torvalds@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>,
William Lee Irwin III <wli@...omorphy.com>,
David Chinner <dgc@....com>,
Jens Axboe <jens.axboe@...cle.com>,
Badari Pulavarty <pbadari@...il.com>,
Maxim Levitsky <maximlevitsky@...il.com>,
Fengguang Wu <fengguang.wu@...il.com>,
swin wang <wangswin@...il.com>, totty.lu@...il.com,
hugh@...itas.com
Subject: Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On (17/09/07 00:48), Goswin von Brederlow didst pronounce:
> mel@...net.ie (Mel Gorman) writes:
>
> > On (16/09/07 17:08), Andrea Arcangeli didst pronounce:
> >> zooming in I see red pixels all over the squares mixed with green
> >> pixels in the same square. This is exactly what happens with the
> >> variable order page cache and that's why it provides zero guarantees
> >> in terms of how much ram is really "free" (free as in "available").
> >>
> >
> > This picture was not generated with grouping pages by mobility running,
> > so that is hardly a surprise; this is what the normal kernel looks
> > like. Look at the videos in
> > http://www.skynet.ie/~mel/anti-frag/2007-02-28 and see how list-based
> > compares to vanilla. These are from February when there was less control
> > over mixing blocks than there is today.
> >
> > In the current version, mixing occurs in the lower blocks as much as possible,
> > not the upper ones. So there are a number of mixed blocks, but the number is
> > kept to a minimum.
> >
> > The number of mixed blocks could have been enforced as 0, but I felt it was
> > better in the general case to fragment rather than regress performance.
> > That may be different for large blocks where you will want to take the
> > enforcement steps.
>
> I agree that 0 is a bad value. But so is infinity. There should be
> some mixing but not a lot. You say "kept to a minimum". Is that
> actively done or does it already happen by itself? Hopefully the latter,
> which would be just splendid.
>
It happens by itself because mixing is biased towards blocks at lower PFNs.
The exact number of mixed blocks is unknown; we used to track it a long time
ago but no longer do.
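
To give a rough feel for the idea (a toy sketch, not the real
mm/page_alloc.c code; the block count and names like
claim_lowest_free_block() are made up for illustration), biasing
fallbacks towards low PFNs looks something like this:

#include <stdio.h>
#include <stdbool.h>

#define NR_BLOCKS 16			/* pretend zone of 16 pageblocks */

enum migratetype { MOVABLE, UNMOVABLE };

struct pageblock {
	enum migratetype type;
	bool free;			/* whole block currently unused? */
};

static struct pageblock blocks[NR_BLOCKS];

/* Fallback: claim the lowest-PFN free block of the other type. */
static int claim_lowest_free_block(enum migratetype wanted)
{
	for (int i = 0; i < NR_BLOCKS; i++) {		/* low PFN first */
		if (blocks[i].free && blocks[i].type != wanted) {
			blocks[i].type = wanted;	/* block is now mixed-use */
			blocks[i].free = false;
			return i;
		}
	}
	return -1;
}

int main(void)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		blocks[i].type = MOVABLE;
		blocks[i].free = true;
	}

	/* Three unmovable fallbacks all land in the lowest blocks. */
	for (int n = 0; n < 3; n++)
		printf("unmovable fallback claimed block %d\n",
		       claim_lowest_free_block(UNMOVABLE));
	return 0;
}

The point is only that the search order keeps mixed blocks clustered at
the bottom of memory rather than spread throughout the zone.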
> >> With config-page-shift mmap works on 4k chunks but it's always backed
> >> by 64k or any other large size that you chose at compile time. And if
>
> But would mapping a random 4K page out of a file then consume 64k?
> That sounds like an awful lot of internal fragmentation. I hope the
> unaligned bits and pieces get put into a slab or something as you
> suggested previously.
>
This is a possibility but Andrea seems confident he can handle it.
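
Just to put a rough number on the internal fragmentation being worried
about here (worst case only, assuming every random 4k touch lands in a
different 64k chunk and each chunk is backed by a full 64k page):

#include <stdio.h>

int main(void)
{
	const long touches   = 10000;		/* random 4k reads, each in a different 64k chunk */
	const long base_page = 4L  << 10;	/* 4 KiB  */
	const long big_page  = 64L << 10;	/* 64 KiB */

	long cache_4k  = touches * base_page;
	long cache_64k = touches * big_page;

	printf("4k  backing: %ld KiB of page cache\n", cache_4k >> 10);
	printf("64k backing: %ld KiB of page cache (%ldx)\n",
	       cache_64k >> 10, cache_64k / cache_4k);
	return 0;
}

i.e. a 16x blow-up in the worst case for that access pattern.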
> >> the virtual alignment of mmap matches the physical alignment of the
> >> physical largepage and is >= PAGE_SIZE (software PAGE_SIZE I mean) we
> >> could use the 62nd bit of the pte to use a 64k tlb (if future cpus
> >> will allow that). Nick also suggested to still set all ptes equal to
> >> make life easier for the tlb miss microcode.
>
> It is too bad that existing amd64 CPUs only allow such large physical
> pages. But it kind of makes sense to cut away a full level of page
> tables for the next bigger size each.
>
Yep on both counts.
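
For what it's worth, the pte idea reads to me roughly as below (a sketch
only: bit 62 is the hypothetical "use a 64k TLB entry" bit from Andrea's
mail, no current CPU defines it, and the constants and helper name are
invented; following Nick's suggestion, all sixteen ptes are set equal):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define LARGE_SHIFT	16				/* 64k */
#define PTES_PER_LARGE	(1 << (LARGE_SHIFT - PAGE_SHIFT))
#define PTE_PRESENT	(1ULL << 0)
#define PTE_LARGE_TLB	(1ULL << 62)			/* hypothetical bit */

/* Fill the 16 ptes covering one virtually/physically 64k-aligned range. */
static void map_large_page(uint64_t *ptes, uint64_t phys_64k_aligned)
{
	for (int i = 0; i < PTES_PER_LARGE; i++)
		ptes[i] = phys_64k_aligned | PTE_PRESENT | PTE_LARGE_TLB;
}

int main(void)
{
	uint64_t ptes[PTES_PER_LARGE];

	map_large_page(ptes, 0x40000000ULL);		/* 64k-aligned physical base */
	for (int i = 0; i < PTES_PER_LARGE; i++)
		printf("pte[%2d] = %#llx\n", i, (unsigned long long)ptes[i]);
	return 0;
}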
> >> > big you can make it. I don't think my system with 1GB ram would work
> >> > so well with 2MB order 0 pages. But I wasn't referring to that but to
> >> > the picture.
> >>
> >> Sure! 2M is surely way excessive for a 1G system, 64k most certainly
> >> too, of course unless you're running a db or a multimedia streaming
> >> service, in which case it should be ideal.
>
> rtorrent, Xemacs/gnus, bash, xterm, zsh, make, gcc, galeon and the
> occasional mplayer.
>
> I would mostly be concerned about how rtorrent's totally random access of
> mmapped files negatively impacts such a 64k page system.
>
For what it's worth, the last allocation failure that occurred with
grouping pages by mobility was an order-1 atomic failure for a wireless
network card when bittorrent was running. You're likely right in that
torrents will be an interesting workload in terms of fragmentation.
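
A crude model of that access pattern (the numbers are invented and no
kernel behaviour is modelled beyond "one 64k chunk pinned per touch",
so treat it as the shape of the problem rather than a measurement):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const long file_size = 700L << 20;	/* ~700 MiB torrent */
	const long chunk     = 64L << 10;	/* 64 KiB backing unit */
	const long nchunks   = file_size / chunk;
	const long touches   = 20000;		/* random 4 KiB accesses */
	char *seen = calloc(nchunks, 1);
	long distinct = 0;

	srand(1);
	for (long i = 0; i < touches; i++) {
		long c = rand() % nchunks;
		if (!seen[c]) {
			seen[c] = 1;
			distinct++;
		}
	}

	printf("%ld random 4k touches hit %ld distinct 64k chunks\n",
	       touches, distinct);
	printf("page cache pinned: %ld KiB (at most %ld KiB with 4k backing)\n",
	       distinct * (chunk >> 10), touches * 4L);
	free(seen);
	return 0;
}

With 64k backing the cache footprint heads towards one large page per
touch until the chunks start repeating, which is the sort of pressure
that would make reclaim and fragmentation interesting to watch.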
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab