Message-ID: <20090206005022.GA6803@elte.hu>
Date:	Fri, 6 Feb 2009 01:50:22 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
Cc:	Hugh Dickins <hugh@...itas.com>,
	William Lee Irwin III <wli@...ementarian.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: pud_bad vs pud_bad

* Jeremy Fitzhardinge <jeremy@...p.org> wrote:

> Ingo Molnar wrote:
>> just the act of using PAE was measured to cause multi-percent slowdown 
>> in fork() and exec() latencies, etc. The pagetables are twice as large 
>> so is that really surprising?
>>   
>
> Is there a similar slowdown running the CPU in 32 vs 64 bit mode?  Or does 
> having more/wider registers mitigate it?

Yes, of course there's a slowdown on 64-bit kernels in fork() performance, 
mainly related to pte size.

Here are some numbers measured with perfstat. The "fork" binary forks 256 times, 
waits for the child tasks and then exits. It is a 32-bit binary, statically 
linked - i.e. very similar layout and function on both 32-bit and 64-bit 
kernels.

The results (tabulated a bit, average result of 20 runs):

 $ perfstat -e -3,-4,-5 ./fork

  Performance counter stats for './fork':

        32-bit  32-bit-PAE     64-bit
     ---------  ----------  ---------
     27.367537   30.660090  31.542003  task clock ticks     (msecs)

          5785        5810       5751  pagefaults           (events)
           389         388        388  context switches     (events)
             4           4          4  CPU migrations       (events)
     ---------  ----------  ---------
                    +12.0%     +15.2%  task clock overhead vs. 32-bit

So PAE is 12.0% slower (the overhead of double the pte size and three page 
table levels), and 64-bit is 15.2% slower (the extra overhead of having four 
page table levels added to the overhead of double the pte size).

Larger ptes do not come for free, and 64-bit instructions do not mitigate 
the cache-miss overhead and memory bandwidth cost.

	Ingo