Date:	Thu, 31 Jul 2008 16:26:15 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Mel Gorman <mel@....ul.ie>, Eric Munson <ebmunson@...ibm.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	linuxppc-dev@...abs.org, libhugetlbfs-devel@...ts.sourceforge.net
Subject: Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks

On Thursday 31 July 2008 16:14, Andrew Morton wrote:
> On Thu, 31 Jul 2008 16:04:14 +1000 Nick Piggin <nickpiggin@...oo.com.au> wrote:
> > > Do we expect that this change will be replicated in other
> > > memory-intensive apps?  (I do).
> >
> > Such as what? It would be nice to see some numbers with some HPC or java
> > or DBMS workload using this. Not that I dispute it will help some cases,
> > but 10% (or 20% for ppc) I guess is getting toward the best case, short
> > of a specifically written TLB thrasher.
>
> I didn't realise that STREAM was using vast amounts of automatic memory.
> I'd assumed that it was using sane amounts of stack, but the stack TLB
> slots were getting zapped by all the heap-memory activity.  Oh well.

An easy mistake to make, because that's probably how STREAM would normally
work. I think what Mel had done is to modify the STREAM kernel so that it
operates on arrays of stack memory.
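
To make the setup concrete, a modified triad kernel along those lines might
look like the sketch below. This is just my own illustration, not Mel's
actual code; the array size is a made-up placeholder. The point is that the
arrays are automatic, so every load and store goes through the stack's
mappings, and you need a large RLIMIT_STACK (ulimit -s unlimited) to run it:

#include <stddef.h>
#include <stdio.h>

#define N (2 * 1024 * 1024)     /* placeholder: 3 x 16 MB of automatic data */

static double triad_on_stack(void)
{
        double a[N], b[N], c[N];        /* stack-backed, not malloc'ed */
        const double scalar = 3.0;
        size_t i;

        for (i = 0; i < N; i++) {       /* touch the pages first */
                b[i] = 1.0;
                c[i] = 2.0;
        }
        for (i = 0; i < N; i++)         /* STREAM triad: a = b + s*c */
                a[i] = b[i] + scalar * c[i];

        return a[N / 2];                /* defeat dead-code elimination */
}

int main(void)
{
        printf("%f\n", triad_on_stack());
        return 0;
}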


> I guess that effect is still there, but smaller.

I imagine it should be, unless you're using a CPU with separate TLBs for
small and huge pages and your large data set is mapped with huge pages, in
which case you might now introduce *new* TLB contention between the stack
and the dataset :)

Also, interestingly, I have actually seen some CPUs whose memory operations
get significantly slower when operating on large pages than on small ones
(in the case where there is full TLB coverage for both sizes). This would
make sense if the CPU only implements a fast L1 TLB for small pages.
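
For what it's worth, a userspace probe along these lines can expose that
effect. This is a rough sketch under some assumptions of mine: a 2 MB huge
page size, a hugetlbfs mount at /mnt/huge, and a reserved pool in
/proc/sys/vm/nr_hugepages (on most CPUs you'll see no difference, or the
huge page winning). The idea is to keep the working set small enough to be
fully TLB-covered with either page size and time a strided walk over it
under both mappings; if only base-page translations are served from a fast
L1 TLB, the huge-page walk comes out slower:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define MAPLEN  (2UL << 20)     /* one 2 MB huge page (x86 size assumed) */
#define WSET    (32UL * 1024)   /* small enough to be fully TLB-covered
                                   with either page size */
#define STRIDE  4096            /* one touch per base page */
#define PASSES  (4 << 20)

static double walk(volatile char *p)
{
        struct timespec t0, t1;
        unsigned long sum = 0, off;
        int pass;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (pass = 0; pass < PASSES; pass++)
                for (off = 0; off < WSET; off += STRIDE)
                        sum += p[off];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sum;
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
        /* Base-page buffer: ordinary anonymous memory. */
        char *small = malloc(MAPLEN);

        /* Huge-page buffer via a hugetlbfs file; the /mnt/huge mount and
         * the reserved pool are assumed to be set up already. */
        int fd = open("/mnt/huge/tlbprobe", O_CREAT | O_RDWR, 0600);
        char *huge = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);

        if (!small || fd < 0 || huge == MAP_FAILED) {
                perror("setup");
                return 1;
        }
        memset(small, 1, MAPLEN);       /* fault everything in up front */
        memset(huge, 1, MAPLEN);

        printf("base pages: %f s\n", walk(small));
        printf("huge page:  %f s\n", walk(huge));
        return 0;
}

Compile with something like gcc -O2 tlbprobe.c -lrt (older glibc wants
-lrt for clock_gettime) and compare the two timings.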

So for the vast majority of workloads, where stacks are relatively small
(or slowly changing) and relatively hot, I suspect this could easily show
no benefit at best and a slowdown at worst.

But I'm not saying that as a reason not to merge it -- this is no
different from any other hugepage allocations, and as usual they have to
be used selectively where they help... I just wonder exactly where huge
stacks will help.


> I agree that few real-world apps are likely to see gains of this
> order.  More benchmarks, please :)

Would be nice, if only out of morbid curiosity :)
