linux-kernel - Re: Ubuntu 32-bit, 32-bit PAE, 64-bit Kernel Benchmarks

Date:	Thu, 31 Dec 2009 10:39:41 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Yuhong Bao <yuhongbao_386@...mail.com>
cc:	mingo@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: Ubuntu 32-bit, 32-bit PAE, 64-bit Kernel Benchmarks

On Wed, 30 Dec 2009, Yuhong Bao wrote:
> 
> Given that Linus was once talking about the performance penalties of PAE 
> and HIGHMEM64G, perhaps you'd find these benchmarks done by Phoronix of 
> interest:
>   http://www.phoronix.com/scan.php?page=article&item=ubuntu_32_pae

PAE has no negative impact on user-land loads (aside from a potentially 
really _tiny_ effect from just bigger page tables), and obviously means 
that you actually have more RAM available, so it can be a big win.

The "25% cost" is purely kernel-side work when the kernel needs to 
kmap/kunmap - which it only needs to do when it touches highmem pages 
itself directly. Which is pretty rare - but when it happens a lot, it's 
extremely expensive.
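
To make the "rare, but expensive when it happens a lot" point concrete, here is a toy cost model in C. It is only a sketch and not kernel code: the function name and all three parameters are hypothetical and are not measurements from this thread. It just shows that the share of total time spent on mapping grows with the number of kmap/kunmap pairs.

```c
/*
 * Toy model, NOT kernel code. Every parameter is a hypothetical input:
 *   base_ns     - time the workload takes with no highmem mappings at all
 *   n_kmaps     - number of kmap/kunmap pairs the kernel performs
 *   ns_per_kmap - assumed cost of one pair, including TLB flush/miss effects
 *
 * Returns the fraction of the total run time spent in kmap/kunmap.
 */
static double kmap_overhead_fraction(double base_ns, double n_kmaps,
                                     double ns_per_kmap)
{
    double extra_ns = n_kmaps * ns_per_kmap;

    return extra_ns / (base_ns + extra_ns);
}
```

With a made-up base of one second, a load that maps a thousand highmem pages stays far below one percent, while a metadata-heavy load that maps hundreds of thousands of pages ends up with a double-digit share. A slower per-pair cost (as one might assume for an in-order CPU) raises the share for the same number of mappings.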

The worst load I've ever seen (which was the 25%+ case) needed btrfs 
and heavy meta-data workloads (ie things like file creates/deletes, or 
uncached lookups), because btrfs puts all its radix trees in highmem pages 
and thus needs to kmap/kunmap them all. So that's one way to see heavy 
kmap/kunmap loads.

(In the meantime, I complained to the btrfs people about the CPU hogging 
behavior, and afaik btrfs has improved since I did my kernel profiles of 
the benchmarks, but I haven't re-done them)

There's a potential secondary issue: my test-bed for that btrfs setup was 
a netbook using Intel Atom. The performance profile of an Atom chip is 
pretty different from any of the better out-of-order CPUs.

Extra instructions cost a lot more. For example, out-of-order is 
particularly good at handling "nonsense" instructions that aren't on a 
critical path and aren't important for actual semantics - things like the 
stack frame modifications etc are often almost "free" on out-of-order 
CPUs because they only tend to have trivial dependencies that can be 
worked around with things like the "stack engine" etc. So I seem to 
remember that the "omit stack frame" option was a much bigger deal on Atom 
than on a Core 2 Duo CPU, for example.

So it's entirely possible that the TLB flushing (and eventual misses, of 
course) involved with kmap()/kunmap() is much more expensive on Atom than 
it is on a Core2 system. So it's possible that my 25% cost thing was for 
pretty much a pessimal situation, due to a combination of heavy kernel 
loads (I used "git status" as one of the btrfs/atom benchmarks - pretty 
much _all_ it does is pathname lookups and readdir) with btrfs and atom.

		Linus