Message-ID: <56B35537.3050708@c-s.fr>
Date: Thu, 4 Feb 2016 14:42:15 +0100
From: Christophe Leroy <christophe.leroy@....fr>
To: Denis Kirjanov <kda@...ux-powerpc.org>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
Scott Wood <oss@...error.net>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 21/23] powerpc: Simplify test in __dma_sync()
On 04/02/2016 12:37, Denis Kirjanov wrote:
> On 2/4/16, Christophe Leroy <christophe.leroy@....fr> wrote:
>> This simplification helps the compiler. We now have only one test
>> instead of two, so it reduces the number of branches.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
>> ---
>> v2: new
>> v3: no change
>> v4: no change
>> v5: no change
>>
>> arch/powerpc/mm/dma-noncoherent.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/dma-noncoherent.c
>> b/arch/powerpc/mm/dma-noncoherent.c
>> index 169aba4..2dc74e5 100644
>> --- a/arch/powerpc/mm/dma-noncoherent.c
>> +++ b/arch/powerpc/mm/dma-noncoherent.c
>> @@ -327,7 +327,7 @@ void __dma_sync(void *vaddr, size_t size, int direction)
>> * invalidate only when cache-line aligned otherwise there is
>> * the potential for discarding uncommitted data from the cache
>> */
>> - if ((start & (L1_CACHE_BYTES - 1)) || (size & (L1_CACHE_BYTES - 1)))
>> + if ((start | end) & (L1_CACHE_BYTES - 1))
>> flush_dcache_range(start, end);
>> else
>> invalidate_dcache_range(start, end);
> The previous version of address cache-line aligned check reads perfectly fine.
> What's the benefit of this micro optimization?
With this optimisation we avoid one unnecessary test and the two jumps
associated with it. Given that __dma_sync() is one of the top ten CPU
consumers, I believe it is worth it:
Without the patch:
c000d894: 70 6a 00 0f andi. r10,r3,15
c000d898: 39 29 00 0f addi r9,r9,15
c000d89c: 54 63 00 36 rlwinm r3,r3,0,0,27
c000d8a0: 7d 23 48 50 subf r9,r3,r9
c000d8a4: 41 82 00 84 beq c000d928 <__dma_sync+0xb8>
[...]
c000d8c0: 7c 00 04 ac sync
c000d8c4: 4e 80 00 20 blr
[...]
c000d928: 70 8a 00 0f andi. r10,r4,15
c000d92c: 40 a2 ff 7c bne c000d8a8 <__dma_sync+0x38>
c000d930: 55 2a e1 3f rlwinm. r10,r9,28,4,31
c000d934: 41 a2 ff 8c beq c000d8c0 <__dma_sync+0x50>
With the patch:
c000d894: 7c 89 1b 78 or r9,r4,r3
c000d898: 71 2a 00 0f andi. r10,r9,15
c000d89c: 54 63 00 36 rlwinm r3,r3,0,0,27
c000d8a0: 38 84 00 0f addi r4,r4,15
c000d8a4: 7c 83 20 50 subf r4,r3,r4
c000d8a8: 41 82 00 84 beq c000d92c <__dma_sync+0xbc>
[...]
c000d8c4: 7c 00 04 ac sync
c000d8c8: 4e 80 00 20 blr
[...]
c000d92c: 54 89 e1 3f rlwinm. r9,r4,28,4,31
c000d930: 41 a2 ff 94 beq c000d8c4 <__dma_sync+0x54>
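The two tests give the same result as long as end is start + size, which is what the patch context suggests: if start is misaligned, both tests are true, and if start is aligned, the low bits of end equal the low bits of size. A minimal sketch that checks this exhaustively over a small range follows. The helper names needs_flush_old(), needs_flush_new() and count_mismatches() are hypothetical and are not part of the kernel; the line size is passed as a parameter in place of L1_CACHE_BYTES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original test: flush if start or size is not a multiple of the line size. */
static bool needs_flush_old(unsigned long start, size_t size, unsigned long line)
{
	return (start & (line - 1)) || (size & (line - 1));
}

/* Patched test: a single mask applied to start and end combined.
 * Assumption: end = start + size. */
static bool needs_flush_new(unsigned long start, size_t size, unsigned long line)
{
	unsigned long end = start + size;

	return ((start | end) & (line - 1)) != 0;
}

/* Count the (start, size) pairs below 'limit' on which the two tests disagree. */
static unsigned long count_mismatches(unsigned long line, unsigned long limit)
{
	unsigned long start, size, mismatches = 0;

	/* The mask trick only works when the line size is a power of two. */
	assert(line != 0 && (line & (line - 1)) == 0);

	for (start = 0; start < limit; start++)
		for (size = 0; size < limit; size++)
			if (needs_flush_old(start, size, line) !=
			    needs_flush_new(start, size, line))
				mismatches++;

	return mismatches;
}
```

With a 16-byte line, matching the mask of 15 in the listings above, count_mismatches(16, 256) is expected to find no disagreement.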
Christophe
>> --
>> 2.1.0