[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ee86b3b-81d2-790c-f67b-e250f60272fd@loongson.cn>
Date: Wed, 17 Mar 2021 08:26:23 +0800
From: Tiezhu Yang <yangtiezhu@...ngson.cn>
To: David Laight <David.Laight@...LAB.COM>,
"Maciej W. Rozycki" <macro@...am.me.uk>
Cc: Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
"linux-mips@...r.kernel.org" <linux-mips@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Xuefeng Li <lixuefeng@...ngson.cn>
Subject: Re: [PATCH v2] MIPS: Check __clang__ to avoid performance influence
with GCC in csum_tcpudp_nofold()
On 03/15/2021 08:42 PM, David Laight wrote:
> From: Tiezhu Yang <yangtiezhu@...ngson.cn>
>> Sent: 15 March 2021 12:26
>> On 03/15/2021 04:49 AM, Maciej W. Rozycki wrote:
>>> On Tue, 9 Mar 2021, Tiezhu Yang wrote:
>>>
>>>> diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
>>>> index 1e6c135..80eddd4 100644
>>>> --- a/arch/mips/include/asm/checksum.h
>>>> +++ b/arch/mips/include/asm/checksum.h
>>>> @@ -128,9 +128,13 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
>>>>
>>>> static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
>>>> __u32 len, __u8 proto,
>>>> - __wsum sum)
>>>> + __wsum sum_in)
>>>> {
>>>> - unsigned long tmp = (__force unsigned long)sum;
>>>> +#ifdef __clang__
>>>> + unsigned long sum = (__force unsigned long)sum_in;
>>>> +#else
>>>> + __wsum sum = sum_in;
>>>> +#endif
>>> This looks much better to me, but I'd keep the variable names unchanged
>>> as `sum_in' isn't used beyond the initial assignment anyway (you'll have
>>> to update the references with asm operands accordingly of course).
>>>
>>> Have you verified that code produced by GCC remains the same with your
>>> change in place as it used to be up to commit 198688edbf77? I can see no
>>> such information in the commit description whether here or in the said
>>> commit.
>>>
>>> Maciej
>> Hi Maciej,
>>
>> Thanks for your reply.
>>
>> gcc --version
>> gcc (Debian 10.2.1-6) 10.2.1 20210110
>>
>> net/ipv4/tcp_ipv4.c
>> tcp_v4_send_reset()
>> csum_tcpudp_nofold()
>>
>> objdump -d vmlinux > vmlinux.dump
>>
>> (1) Before commit 198688edbf77
>> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
>> with Clang"):
>>
>> ffffffff80aa835c: 00004025 move a4,zero
>> ffffffff80aa8360: 92020012 lbu v0,18(s0)
>> ffffffff80aa8364: de140030 ld s4,48(s0)
>> ffffffff80aa8368: 0064182d daddu v1,v1,a0
>> ffffffff80aa836c: 304200ff andi v0,v0,0xff
>> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
>> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
>> ffffffff80aa8378: afa70038 sw a3,56(sp)
>> ffffffff80aa837c: 24071a00 li a3,6656
>> ffffffff80aa8380: 0086202d daddu a0,a0,a2
>> ffffffff80aa8384: 0087202d daddu a0,a0,a3
>> ffffffff80aa8388: 0088202d daddu a0,a0,a4
>> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
>> ffffffff80aa8390: 0081202d daddu a0,a0,at
>> ffffffff80aa8394: 0081082b sltu at,a0,at
>> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
>> ffffffff80aa839c: 00812021 addu a0,a0,at
>>
>> (2) After commit 198688edbf77
>> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
>> with Clang"):
>>
>> ffffffff80aa836c: 00004025 move a4,zero
>> ffffffff80aa8370: 92040012 lbu a0,18(s0)
>> ffffffff80aa8374: de140030 ld s4,48(s0)
>> ffffffff80aa8378: 0062182d daddu v1,v1,v0
>> ffffffff80aa837c: 308400ff andi a0,a0,0xff
>> ffffffff80aa8380: 9c62000c lwu v0,12(v1)
>> ffffffff80aa8384: 9c660010 lwu a2,16(v1)
>> ffffffff80aa8388: afa70038 sw a3,56(sp)
>> ffffffff80aa838c: 24071a00 li a3,6656
>> ffffffff80aa8390: 0046102d daddu v0,v0,a2
>> ffffffff80aa8394: 0047102d daddu v0,v0,a3
>> ffffffff80aa8398: 0048102d daddu v0,v0,a4
>> ffffffff80aa839c: 0002083c dsll32 at,v0,0x0
>> ffffffff80aa83a0: 0041102d daddu v0,v0,at
>> ffffffff80aa83a4: 0041082b sltu at,v0,at
>> ffffffff80aa83a8: 0002103f dsra32 v0,v0,0x0
>> ffffffff80aa83ac: 00411021 addu v0,v0,at
>>
>> (3) With this patch:
>>
>> ffffffff80aa835c: 00004025 move a4,zero
>> ffffffff80aa8360: 92020012 lbu v0,18(s0)
>> ffffffff80aa8364: de140030 ld s4,48(s0)
>> ffffffff80aa8368: 0064182d daddu v1,v1,a0
>> ffffffff80aa836c: 304200ff andi v0,v0,0xff
>> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
>> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
>> ffffffff80aa8378: afa70038 sw a3,56(sp)
>> ffffffff80aa837c: 24071a00 li a3,6656
>> ffffffff80aa8380: 0086202d daddu a0,a0,a2
>> ffffffff80aa8384: 0087202d daddu a0,a0,a3
>> ffffffff80aa8388: 0088202d daddu a0,a0,a4
>> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
>> ffffffff80aa8390: 0081202d daddu a0,a0,at
>> ffffffff80aa8394: 0081082b sltu at,a0,at
>> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
>> ffffffff80aa839c: 00812021 addu a0,a0,at
>>
>> (4) With the following changes based on commit 198688edbf77
>> ("MIPS: Fix inline asm input/output type mismatch in checksum.h used
>> with Clang"):
>>
>> diff --git a/arch/mips/include/asm/checksum.h
>> b/arch/mips/include/asm/checksum.h
>> index 1e6c135..e1f80407 100644
>> --- a/arch/mips/include/asm/checksum.h
>> +++ b/arch/mips/include/asm/checksum.h
>> @@ -130,7 +130,11 @@ static inline __wsum csum_tcpudp_nofold(__be32
>> saddr, __be32 daddr,
>> __u32 len, __u8 proto,
>> __wsum sum)
>> {
>> +#ifdef __clang__
>> unsigned long tmp = (__force unsigned long)sum;
>> +#else
>> + __wsum tmp = sum;
>> +#endif
>>
>> __asm__(
>> " .set push # csum_tcpudp_nofold\n"
>>
>> ffffffff80aa835c: 00004025 move a4,zero
>> ffffffff80aa8360: 92020012 lbu v0,18(s0)
>> ffffffff80aa8364: de140030 ld s4,48(s0)
>> ffffffff80aa8368: 0064182d daddu v1,v1,a0
>> ffffffff80aa836c: 304200ff andi v0,v0,0xff
>> ffffffff80aa8370: 9c64000c lwu a0,12(v1)
>> ffffffff80aa8374: 9c660010 lwu a2,16(v1)
>> ffffffff80aa8378: afa70038 sw a3,56(sp)
>> ffffffff80aa837c: 24071a00 li a3,6656
>> ffffffff80aa8380: 0086202d daddu a0,a0,a2
>> ffffffff80aa8384: 0087202d daddu a0,a0,a3
>> ffffffff80aa8388: 0088202d daddu a0,a0,a4
>> ffffffff80aa838c: 0004083c dsll32 at,a0,0x0
>> ffffffff80aa8390: 0081202d daddu a0,a0,at
>> ffffffff80aa8394: 0081082b sltu at,a0,at
>> ffffffff80aa8398: 0004203f dsra32 a0,a0,0x0
>> ffffffff80aa839c: 00812021 addu a0,a0,at
>>
>> The code produced by GCC remains the same between (1), (3) and (4),
>> the last changes looks like better (with less changes based on commit
>> 198688edbf77), so I will send v3 later.
> Aren't those all the same - apart from register selection.
> Not that I grok the mips opcodes.
> But that code has horridness on its side.
>
> The only obvious difference is that something else changes the
> code offset from xxxx5c to xxxx6c.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Hi David,
Yes, it seems no much obvious differences.
Let me wait for other feedback.
Hi Thomas and Maciej,
Is this patch necessary? If no, we can ignore it.
If yes, I will send v3 with the above (4) changes.
Thanks,
Tiezhu
Powered by blists - more mailing lists