Date:	Sat, 29 Aug 2015 18:15:15 +0300
From:	Yury <yury.norov@...il.com>
To:	Alexey Klimov <klimov.linux@...il.com>,
	Cassidy Burden <cburden@...eaurora.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-arm-msm@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-arm-kernel@...ts.infradead.org,
	"David S. Miller" <davem@...emloft.net>,
	Daniel Borkmann <dborkman@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Mark Salter <msalter@...hat.com>,
	AKASHI Takahiro <takahiro.akashi@...aro.org>,
	Thomas Graf <tgraf@...g.ch>,
	Valentin Rothberg <valentinrothberg@...il.com>,
	Chris Wilson <chris@...is-wilson.co.uk>,
	Rasmus Villemoes <linux@...musvillemoes.dk>, linux@...izon.com
Subject: Re: [PATCH] lib: Make _find_next_bit helper function inline


On 24.08.2015 01:53, Alexey Klimov wrote:
> Hi Cassidy,
>
>
> On Wed, Jul 29, 2015 at 11:40 PM, Cassidy Burden <cburden@...eaurora.org> wrote:
>> I changed the test module to now set the entire array to all 0/1s and
>> only flip a few bits. There appears to be a performance benefit, but
>> it's only 2-3% better (if that). If the main benefit of the original
>> patch was to save space then inlining definitely doesn't seem worth the
>> small gains in real use cases.
>>
>> find_next_zero_bit (us)
>> old      new     inline
>> 14440    17080   17086
>> 4779     5181    5069
>> 10844    12720   12746
>> 9642     11312   11253
>> 3858     3818    3668
>> 10540    12349   12307
>> 12470    14716   14697
>> 5403     6002    5942
>> 2282     1820    1418
>> 13632    16056   15998
>> 11048    13019   13030
>> 6025     6790    6706
>> 13255    15586   15605
>> 3038     2744    2539
>> 10353    12219   12239
>> 10498    12251   12322
>> 14767    17452   17454
>> 12785    15048   15052
>> 1655     1034    691
>> 9924     11611   11558
>>
>> find_next_bit (us)
>> old      new     inline
>> 8535     9936    9667
>> 14666    17372   16880
>> 2315     1799    1355
>> 6578     9092    8806
>> 6548     7558    7274
>> 9448     11213   10821
>> 3467     3497    3449
>> 2719     3079    2911
>> 6115     7989    7796
>> 13582    16113   15643
>> 4643     4946    4766
>> 3406     3728    3536
>> 7118     9045    8805
>> 3174     3011    2701
>> 13300    16780   16252
>> 14285    16848   16330
>> 11583    13669   13207
>> 13063    15455   14989
>> 12661    14955   14500
>> 12068    14166   13790
>>
>> On 7/29/2015 6:30 AM, Alexey Klimov wrote:
>>>
>>> I will re-check on another machine. It's really interesting that
>>> __always_inline makes things better for aarch64 and worse for x86_64. It
>>> would be nice if someone could check it on x86_64 too.
>>
>>
>> Very odd; this may be related to the other compiler optimizations Yuri
>> mentioned?
>
> It's better to ask Yury; I hope he can answer some day.
>
> Do you need to re-check this (with more iterations or on other machines)?
>

Hi Alexey, Cassidy,

(restoring Rasmus, George)

I found no difference between the original and inline versions on x86_64
(Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz):

find_next_bit		find_next_zero_bit
old	new	inline	old	new	inline
24	27	28	22	28	28
24	27	28	23	27	28
24	27	28	23	27	28
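For anyone who wants to poke at this outside the kernel, below is a minimal
userspace sketch of a micro-benchmark in the spirit of Cassidy's test module:
zero the bitmap, flip a few scattered bits, and time repeated find_next_bit()
walks. The array size and iteration count are illustrative only, and
find_next_bit() is the userspace sketch given at the end of this mail, not
the kernel symbol:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))
#define NBITS		(128 * 1024)
#define NWORDS		(NBITS / BITS_PER_LONG)
#define ITERS		1000

/* Provided by the find_next_bit() sketch at the end of this mail. */
unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
			    unsigned long offset);

int main(void)
{
	unsigned long *bitmap = calloc(NWORDS, sizeof(*bitmap));
	struct timespec t0, t1;
	unsigned long bit, found = 0;
	long us;
	int i;

	/* Flip a few bits so the search has long runs of zeros to skip. */
	bitmap[NWORDS / 4] |= 1UL << 7;
	bitmap[NWORDS / 2] |= 1UL << (BITS_PER_LONG / 2);
	bitmap[NWORDS - 1] |= 1UL << (BITS_PER_LONG - 1);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)
		for (bit = find_next_bit(bitmap, NBITS, 0); bit < NBITS;
		     bit = find_next_bit(bitmap, NBITS, bit + 1))
			found++;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	us = (t1.tv_sec - t0.tv_sec) * 1000000L +
	     (t1.tv_nsec - t0.tv_nsec) / 1000L;
	printf("%lu hits in %ld us\n", found, us);

	free(bitmap);
	return 0;
}

Timings from a toy loop like this won't match the kernel numbers above, but
they make the old/new/inline comparison easy to repeat.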
Inspecting the assembler code, I found that GCC wants to keep the helper
separate even when you mark it '__always_inline':

inline <find_next_bit_new>:			current <find_next_bit_new>:			
280:	cmp    %rdx,%rsi                        210:	cmp    %rdx,%rsi
283:	jbe    295 <find_next_bit_new+0x15>     213:	jbe    227 <find_next_bit_new+0x17>
285:	test   %rsi,%rsi                        215:	test   %rsi,%rsi
288:	je     295 <find_next_bit_new+0x15>     218:	je     227 <find_next_bit_new+0x17>
28a:	push   %rbp                             21a:	push   %rbp
28b:	mov    %rsp,%rbp                        21b:	xor    %ecx,%ecx
28e:	callq  0 <find_next_bit_new.part.0>     21d:	mov    %rsp,%rbp
293:	pop    %rbp                             220:	callq  0 <_find_next_bit.part.0>
294:	retq                                    225:	pop    %rbp
295:	mov    %rsi,%rax                        226:	retq
298:	retq                                    227:	mov    %rsi,%rax
299:	nopl   0x0(%rax)                        22a:	retq
       				                22b:	nopl   0x0(%rax,%rax,1)

So it looks like x86_64 gcc (at least the 4.9.2 build shipped with Ubuntu)
ignores the '__always_inline' hint just as it ignores 'inline'. Worse, with
__always_inline the compiler does something not particularly smart: it
introduces the <find_next_bit_new.part.0> and <find_next_zero_bit_new.part.1>
helpers, which grows the text size from 0x250 to 0x2b9 bytes, yet it still
doesn't inline the body, so the push/pop and call/ret overhead remains. I
don't like inlining here, as I already said, but I believe that disabling it
completely is a bad idea. Does anyone know another trick to make inlining
work?
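
For context, here is roughly the shape of the code being discussed: one
common helper parameterized by an 'invert' mask, with thin find_next_bit()
and find_next_zero_bit() wrappers, where the patch marks the helper
__always_inline. This is a simplified userspace sketch of the new
lib/find_bit.c approach, not the exact patch; BITS_PER_LONG and
BITMAP_FIRST_WORD_MASK are reimplemented here, and __builtin_ctzl() stands
in for the kernel's __ffs():

#define BITS_PER_LONG	(8 * sizeof(unsigned long))
/* Mask off the bits below 'start' within its word. */
#define BITMAP_FIRST_WORD_MASK(start)	(~0UL << ((start) % BITS_PER_LONG))

/*
 * The common helper: with invert == 0 it searches for set bits, with
 * invert == ~0UL for clear bits.  This is the function gcc 4.9.2 still
 * splits out as a .part.0 helper in the listings above, despite the
 * always_inline attribute.
 */
static inline __attribute__((always_inline))
unsigned long _find_next_bit(const unsigned long *addr, unsigned long nbits,
			     unsigned long start, unsigned long invert)
{
	unsigned long tmp;

	if (!nbits || start >= nbits)
		return nbits;

	tmp = addr[start / BITS_PER_LONG] ^ invert;

	/* Handle the first, possibly partial, word. */
	tmp &= BITMAP_FIRST_WORD_MASK(start);
	start -= start % BITS_PER_LONG;

	/* Skip whole words that contain no bit we are looking for. */
	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr[start / BITS_PER_LONG] ^ invert;
	}

	start += __builtin_ctzl(tmp);
	return start < nbits ? start : nbits;
}

unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
			    unsigned long offset)
{
	return _find_next_bit(addr, size, offset, 0UL);
}

unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
				 unsigned long offset)
{
	return _find_next_bit(addr, size, offset, ~0UL);
}

Whether the helper really got folded in is easy to check by running
objdump -d on the object file: split-out helpers show up as the '.part.N'
symbols seen in the listings above.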