linux-kernel - Re: [PATCH v2 2/2] powerpc32: optimise csum


Date:	Mon, 17 Aug 2015 15:05:40 +0200
From:	leroy christophe <christophe.leroy@....fr>
To:	Segher Boessenkool <segher@...nel.crashing.org>,
	Scott Wood <scottwood@...escale.com>
CC:	linuxppc-dev@...ts.ozlabs.org, Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] powerpc32: optimise csum_partial() loop



On 17/08/2015 13:00, leroy christophe wrote:
>
>
> On 17/08/2015 12:56, leroy christophe wrote:
>>
>>
>> On 07/08/2015 01:25, Segher Boessenkool wrote:
>>> On Thu, Aug 06, 2015 at 05:45:45PM -0500, Scott Wood wrote:
>>>> If this makes performance non-negligibly worse on other 32-bit 
>>>> chips, and is
>>>> an important improvement on 8xx, then we can use an ifdef since 8xx 
>>>> already
>>>> requires its own kernel build.  I'd prefer to see a benchmark 
>>>> showing that it
>>>> actually does make things worse on those chips, though.
>>> And I'd like to see a benchmark that shows it *does not* hurt 
>>> performance
>>> on most chips, and does improve things on 8xx, and by how much. But it
>>> isn't *me* who has to show that, it is not my patch.
>> Ok, following this discussion I made some additional measurement and 
>> it looks like:
>> * There is almost no change on the 885
>> * There is a non-negligible degradation on the 8323 (19.5 tb ticks 
>> instead of 15.3)
>>
>> Thanks for pointing this out, I think my patch is therefore not good.
>>
> Oops, I was talking about my other patch, the one that was to optimise 
> ip_fast_csum.
> I still have to measure csum_partial.
>
Now I have the results for csum_partial(). The time is measured by 
reading mftbl() before and after each call to the function, with IRQs 
off to get a stable measurement. The workload is a transfer of the 
vmlinux file to the target via scp, repeated 3 times. This gives 
approximately 50000 calls to csum_partial().
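
For readers who want to reproduce this kind of measurement, the bookkeeping 
can be sketched as below. This is only an illustration, not the 
instrumentation actually used: struct tb_stats and the tb_* helpers are 
hypothetical names, and the hardware read itself is left out so that the 
code compiles anywhere.

```c
#include <stdint.h>

/* Hypothetical accumulator for per-call timings (not part of the patch). */
struct tb_stats {
	uint64_t total_ticks;	/* sum of all measured deltas */
	uint32_t calls;		/* number of measured calls */
};

/*
 * mftbl() returns the low 32 bits of the time base, so an unsigned
 * 32-bit subtraction gives the right delta even if the counter wraps
 * once between the two reads.
 */
static inline uint32_t tb_delta(uint32_t start, uint32_t end)
{
	return end - start;
}

/* Add one measured call, given the two time base samples around it. */
static inline void tb_stats_record(struct tb_stats *s,
				   uint32_t start, uint32_t end)
{
	s->total_ticks += tb_delta(start, end);
	s->calls++;
}

/* Mean ticks per call, or 0 if nothing has been recorded yet. */
static inline uint64_t tb_stats_mean(const struct tb_stats *s)
{
	return s->calls ? s->total_ticks / s->calls : 0;
}
```

In a kernel build the two samples would come from mftbl() taken just 
before and just after the call to csum_partial(), with interrupts 
disabled around the pair (for example with local_irq_save() and 
local_irq_restore()), and the mean would be read out once the transfers 
are done.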

On MPC885:
1/ Without the patchset, mean time spent in csum_partial() is 167 tb ticks.
2/ With the patchset, mean time is 150 tb ticks

On MPC8323:
1/ Without the patchset, mean time is 287 tb ticks
2/ With the patchset, mean time is 256 tb ticks

The improvement is approximately 10% in both cases.
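
The 10% figure follows directly from the means above. As a small sketch 
of the arithmetic (improvement_percent is a hypothetical helper written 
for this illustration):

```c
/* Relative reduction, in percent, when going from `before` to `after`. */
static double improvement_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the measured means, improvement_percent(167, 150) gives about 10.2 
for the MPC885 and improvement_percent(287, 256) gives about 10.8 for 
the MPC8323.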

So, unlike my patch on ip_fast_csum(), this one is worth it.

Christophe