linux-kernel - Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20080811080449.GB17452@csn.ul.ie>
Date:	Mon, 11 Aug 2008 09:04:49 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Dave Hansen <dave@...ux.vnet.ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, ebmunson@...ibm.com,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	linuxppc-dev@...abs.org, libhugetlbfs-devel@...ts.sourceforge.net,
	abh@...y.com
Subject: Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

On (07/08/08 10:29), Dave Hansen didst pronounce:
> On Thu, 2008-08-07 at 17:06 +0100, Mel Gorman wrote:
> > On (06/08/08 12:50), Dave Hansen didst pronounce:
> > > The main thing this set of patches does that I care about is take an
> > > anonymous VMA and replace it with a hugetlb VMA.  It does this on a
> > > special cue, but does it nonetheless.
> > 
> > This is not actually a new thing. For a long time now, it has been possible to
> > back  malloc() with hugepages at a userspace level using the morecore glibc
> > hook. That is replacing anonymous memory with a file-backed VMA. It happens
> > in a different place but it's just as deliberate as backing stack and the
> > end result is very similar. As the file is ram-based, it doesn't have the
> > same types of consequences like dirty page syncing that you'd ordinarily
> > watch for when moving from anonymous to file-backed memory.
> 
> Yes, it has already been done in userspace.  That's fine.  It isn't
> adding any complexity to the kernel.  This is adding behavior that the
> kernel has to support as well as complexity.
> 

The complexity is minimal and the progression logical.
hugetlb_file_setup() is the API shmem was using to create a file on an
internal mount suitable for MAP_SHARED. This patchset adds support for
MAP_PRIVATE and the additional complexity is a lot less than supporting
direct pagetable inserts.

> > > This patch has crossed a line in that it is really the first
> > > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
> > > of the VMAs at the user's request. 
> > 
> > We crossed that line with morecore, it's back there somewhere. We're just
> > doing in kernel this time because backing stacks with hugepages in userspace
> > turned out to be a hairy endevour.
> > 
> > Properly supporting anonymous hugepages would either require larger
> > changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
> > Neither is a particularly appealing approach, nor is it likely to be a
> > very popular one.
> 
> I agree.  It is always much harder to write code that can work
> generically??? (and get it accepted) than just write the smallest possible
> hack and stick it in fs/exec.c.
> 
> Could this patch at least get fixed up to look like it could be used
> more generically?  Some code to look up and replace anonymous VMAs with
> hugetlb-backed ones???

Ok, this latter point can be looked into at least although the
underlying principal may still be using hugetlb_file_setup() rather than
direct pagetable insertions.

> > > Because of the limitations like its inability to grow the VMA, I can't
> > > imagine that this would be a generic mechanism that we can use
> > > elsewhere.
> > 
> > What other than a stack even cares about VM_GROWSDOWN working? Besides,
> > VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
> > the file and moving the offset backwards (yeah ok, it ain't the prettiest
> > but it's less churn than introducing significantly different codepaths). It's
> > just not something that needs to be supported at first cut.
> > 
> > brk() if you wanted to back hugepages with it conceivably needs a resizing
> > VMA but in that case it's growing up in which case just extend the end of
> > the VMA and increase the size of the file.
> 
> I'm more worried about a small huge page size (say 64k) and not being
> able to merge the VMAs.  I guess it could start in the *middle* of a
> file and map both directions.
> 
> I guess you could always just have a single (very sparse) hugetlb file
> per mm to do all of this 'anonymous' hugetlb memory memory stuff, and
> just map its offsets 1:1 on to the process's virtual address space.
> That would make sure you could always merge VMAs, no matter how they
> grew together.
> 

That's an interesting idea. It isn't as straight-forward as it sounds
due to reservation tracking but at the face of it, I can't see why it
couldn't be made work.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/