Date:	Fri, 5 Feb 2016 08:56:51 +0100
From:	Christophe Leroy <christophe.leroy@....fr>
To:	Denis Kirjanov <kda@...ux-powerpc.org>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	Michael Ellerman <mpe@...erman.id.au>,
	Scott Wood <oss@...error.net>, linuxppc-dev@...ts.ozlabs.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 21/23] powerpc: Simplify test in __dma_sync()



On 05/02/2016 08:52, Denis Kirjanov wrote:
> On 2/4/16, Christophe Leroy <christophe.leroy@....fr> wrote:
>>
>> On 04/02/2016 12:37, Denis Kirjanov wrote:
>>> On 2/4/16, Christophe Leroy <christophe.leroy@....fr> wrote:
>>>> This simplification helps the compiler. We now have only one test
>>>> instead of two, so it reduces the number of branches.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
>>>> ---
>>>> v2: new
>>>> v3: no change
>>>> v4: no change
>>>> v5: no change
>>>>
>>>>    arch/powerpc/mm/dma-noncoherent.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c
>>>> index 169aba4..2dc74e5 100644
>>>> --- a/arch/powerpc/mm/dma-noncoherent.c
>>>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>>>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>>>>    		 * invalidate only when cache-line aligned otherwise there is
>>>>    		 * the potential for discarding uncommitted data from the cache
>>>>    		 */
>>>> -		if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>>>> +		if ((start | end) & (L1_CACHE_BYTES - 1))
>>>>    			flush_dcache_range(start, end);
>>>>    		else
>>>>    			invalidate_dcache_range(start, end);
>>> The previous version of the cache-line alignment check reads perfectly
>>> fine.
>>> What's the benefit of this micro-optimization?
>> With this optimisation we avoid one unnecessary test and two associated
>> jumps. Taking into account that __dma_sync() is one of the top ten CPU
>> consumers, I believe it is worth it:
>>
>>
> Yeah, looks better. Did you compile the kernel with default compiler flags?
>
> Thanks!
Yes, I did.

Christophe
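
The equivalence of the old and new tests in the patch can be checked with a
small standalone sketch. It is not kernel code: the value of L1_CACHE_BYTES
(32 here), the helper names needs_flush_old(), needs_flush_new(),
count_mismatches() and self_check(), and the assumption that end is computed
as start + size are all illustrative choices, not taken from the thread. The
idea is that when start is cache-line aligned, the low bits of end equal the
low bits of size, and when start is unaligned both tests are true anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; the real L1_CACHE_BYTES is per-platform. */
#define L1_CACHE_BYTES 32UL

/* Original test: two separate alignment checks, on start and on size. */
static bool needs_flush_old(unsigned long start, size_t size)
{
	return (start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1));
}

/* Patched test: a single check on the OR of both bounds. */
static bool needs_flush_new(unsigned long start, size_t size)
{
	unsigned long end = start + size; /* assumption: end = start + size */

	return ((start | end) & (L1_CACHE_BYTES - 1)) != 0;
}

/* Count the (start, size) pairs on which the two tests disagree. */
static unsigned long count_mismatches(void)
{
	unsigned long mismatches = 0;

	for (unsigned long start = 0; start < 4 * L1_CACHE_BYTES; start++)
		for (size_t size = 0; size < 4 * L1_CACHE_BYTES; size++)
			if (needs_flush_old(start, size) !=
			    needs_flush_new(start, size))
				mismatches++;

	return mismatches;
}

/* Aborts if the two tests ever disagree over the sampled range. */
static void self_check(void)
{
	assert(count_mismatches() == 0);
}
```

Calling self_check() from a main() exercises every start and size below
4 * L1_CACHE_BYTES. This only shows that the two conditions select the same
branch; it says nothing about the generated code or the cycle savings
discussed above.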
