lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5072F497.2000703@intel.com>
Date:	Mon, 08 Oct 2012 08:43:19 -0700
From:	Alexander Duyck <alexander.h.duyck@...el.com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	konrad.wilk@...cle.com, tglx@...utronix.de, mingo@...hat.com,
	hpa@...or.com, rob@...dley.net, akpm@...ux-foundation.org,
	joerg.roedel@....com, bhelgaas@...gle.com, shuahkhan@...il.com,
	linux-kernel@...r.kernel.org, devel@...uxdriverproject.org,
	x86@...nel.org, torvalds@...ux-foundation.org
Subject: Re: [RFC PATCH 0/7] Improve swiotlb performance by using physical
 addresses

On 10/06/2012 10:57 AM, Andi Kleen wrote:
>> Inlining everything did speed things up a bit, but I still didn't reach
>> the same speed I achieved using the patch set.  However I did notice the
>> resulting swiotlb code was considerably larger.
> Thanks. So your patch makes sense, but imho should pursue the inlining
> in parallel for other call sites.

I'll try to take a look at getting that done this morning.

>> assembly, is replaced with 8 lines of assembly and becomes inline.  In
>> addition we drop the number of calls to __phys_addr from 9 to 2 by
>> dropping them all from swiotlb.  By my math I am probably saving about
>> 120 instructions per packet.  I suspect all of that would probably be
>> cutting the number of instructions per packet enough to probably account
>> for a 5% difference when you consider I am running at about 1.5Mpps per
>> core on a 2.7Ghz processor.
> Maybe it's just me, but that's somehow sad for one if() and a subtraction

Well there is also all of the setup of the call on the function stack. 
By my count just the portion that is used in the standard case is about
9 lines of assembly.  By inlining it and dropping the if case we can
probably drop it to 1.

> BTW __pa used to be a simple subtraction, the if () was just added to
> handle the few call sites for x86-64 that do __pa(&text_symbol).
> Maybe we should just go back to the old __pa_symbol() for those cases,
> then __pa could be the simple subtraction it used to was again
> and it could be inlined and everyone would be happy.
>
> -Andi

What I am probably looking at doing is splitting the function in two as
you suggest where we have a separate function for the text symbol case. 
I will probably also take the 32 bit approach and add a debug version
that is still a separate function for uses such as determining if we
have any callers who should be using __pa_symbol instead of __pa.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ