Message-ID: <97e07f34-a1d6-de42-7e8f-9c71b88c1653@oracle.com>
Date: Thu, 3 Aug 2017 16:08:26 -0500
From: Babu Moger <babu.moger@...cle.com>
To: David Miller <davem@...emloft.net>
Cc: sparclinux@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 3/4] arch/sparc: Optimized memcpy, memset,
copy_to_user, copy_from_user for M7
David, thanks for the comments. I am working on addressing your feedback.
Comments inline below.
On 7/29/2017 4:36 PM, David Miller wrote:
> From: Babu Moger <babu.moger@...cle.com>
> Date: Thu, 27 Jul 2017 15:57:29 -0600
>
>> @@ -600,7 +600,7 @@ niagara_tlb_fixup:
>> be,pt %xcc, niagara4_patch
>> nop
>> cmp %g1, SUN4V_CHIP_SPARC_M7
>> - be,pt %xcc, niagara4_patch
>> + be,pt %xcc, sparc_m7_patch
>> nop
>> cmp %g1, SUN4V_CHIP_SPARC_SN
>> be,pt %xcc, niagara4_patch
> This part will need to be respun now that the M8 patches are in
> as there will be a slight conflict in this hunk.
Actually, these patches have been tested on both M7 and M8. I wanted to
add M8 as well, but the M8 patches were not in the kernel yet. Now that
the M8 patches (from Allen) are in the kernel, I can add it.
Will update it in the second version.
>> + .register %g2,#scratch
>> +
>> + .section ".text"
>> + .global FUNC_NAME
>> + .type FUNC_NAME, #function
>> + .align 16
>> +FUNC_NAME:
>> + srlx %o2, 31, %g2
>> + cmp %g2, 0
>> + tne %xcc, 5
>> + PREAMBLE
>> + mov %o0, %g1 ! save %o0
>> + brz,pn %o2, .Lsmallx
>> +
>> + cmp %o2, 3
>> + ble,pn %icc, .Ltiny_cp
>> + cmp %o2, 19
>> + ble,pn %icc, .Lsmall_cp
>> + or %o0, %o1, %g2
>> + cmp %o2, SMALL_MAX
>> + bl,pn %icc, .Lmedium_cp
>> + nop
> What in world is going on with this indentation?
>
> I can't comprehend how, if anyone actually put their eyes on
> this code and the patch itself, wouldn't notice this.
>
> DO NOT mix all-spaced and TAB+space indentation.
>
> Always, consistently, use as many TABs as you can and
> then when needed add trailing spaces.
Sure, will address these problems. In general, will address all the
formatting issues. Thanks.
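For reference, the size-class dispatch in the quoted preamble boils down to the following logic. This is a hedged C sketch: the thresholds 3 and 19 come straight from the hunk, but the `SMALL_MAX` value and the `pick_path`/`copy_path` names are illustrative stand-ins (the real value lives in the M7 copy header, and zero-length copies short-circuit to `.Lsmallx` before any of these compares).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the real SMALL_MAX macro in the M7 header. */
#define SMALL_MAX 128

enum copy_path { TINY_CP, SMALL_CP, MEDIUM_CP, LARGE_CP };

/* Mirror of the branch ladder in the quoted FUNC_NAME preamble. */
static enum copy_path pick_path(size_t len)
{
	if (len <= 3)
		return TINY_CP;		/* ble .Ltiny_cp   */
	if (len <= 19)
		return SMALL_CP;	/* ble .Lsmall_cp  */
	if (len < SMALL_MAX)
		return MEDIUM_CP;	/* bl  .Lmedium_cp */
	return LARGE_CP;		/* fall into the main loop */
}
```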
>
>> +.Lsrc_dst_aligned_on_8:
>> + ! check if we are copying MED_MAX or more bytes
>> + set MED_MAX, %o3
>> + cmp %o2, %o3 ! limit to store buffer size
>> + bgu,pn %ncc, .Llarge_align8_copy
>> + nop
> Again, same problem here.
>
>> +/*
>> + * Handle all cases where src and dest are aligned on word
>> + * boundaries. Use unrolled loops for better performance.
>> + * This option wins over standard large data move when
>> + * source and destination is in cache for .Lmedium
>> + * to short data moves.
>> + */
>> + set MED_WMAX, %o3
>> + cmp %o2, %o3 ! limit to store buffer size
>> + bge,pt %ncc, .Lunalignrejoin ! otherwise rejoin main loop
>> + nop
> More weird indentation.
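As a plain-C illustration of the strategy the quoted comment describes, an unrolled long-word loop for the case where source and destination are both 8-byte aligned. The 4-way unroll factor and function name are assumptions for the sketch, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the unrolled aligned-copy idea: while the data is in cache,
 * a few straight-line long-word stores per iteration beat the generic
 * large-copy machinery for medium-sized moves. */
static void copy_aligned8(uint64_t *dst, const uint64_t *src, size_t nwords)
{
	size_t i = 0;

	for (; i + 4 <= nwords; i += 4) {	/* unrolled body */
		dst[i]     = src[i];
		dst[i + 1] = src[i + 1];
		dst[i + 2] = src[i + 2];
		dst[i + 3] = src[i + 3];
	}
	for (; i < nwords; i++)			/* remainder tail */
		dst[i] = src[i];
}
```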
>
>> +.dbalign:
>> + andcc %o5, 7, %o3 ! is sp1 aligned on a 8 byte bound?
>> + bz,pt %ncc, .blkalign ! already long word aligned
>> + sub %o3, 8, %o3 ! -(bytes till long word aligned)
>> +
>> + add %o2, %o3, %o2 ! update o2 with new count
>> + ! Set -(%o3) bytes till sp1 long word aligned
>> +1: stb %o1, [%o5] ! there is at least 1 byte to set
>> + inccc %o3 ! byte clearing loop
>> + bl,pt %ncc, 1b
>> + inc %o5
> More weird indentation.
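The `.dbalign` hunk above stores single bytes until the destination reaches an 8-byte boundary, shrinking the remaining count to match. A hedged C sketch of that preamble (the function name is invented, and the clamp against `*count` is a safety net the asm does not need on this path):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time head loop: advance dst to the next 8-byte boundary,
 * writing the fill byte c and debiting the remaining count. */
static unsigned char *align_to_8(unsigned char *dst, int c, size_t *count)
{
	uintptr_t misalign = (uintptr_t)dst & 7;

	if (misalign) {
		size_t head = 8 - misalign;	/* bytes until aligned */
		size_t n = head < *count ? head : *count;

		*count -= n;
		while (n--)
			*dst++ = (unsigned char)c;
	}
	return dst;
}
```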
>
>> + ! Now sp1 is block aligned
>> +.blkwr:
>> + andn %o2, 63, %o4 ! calculate size of blocks in bytes
>> + brz,pn %o1, .wrzero ! special case if c == 0
>> + and %o2, 63, %o3 ! %o3 = bytes left after blk stores.
>> +
>> + set MIN_LOOP, %g1
>> + cmp %o4, %g1 ! check there are enough bytes to set
>> + blu,pn %ncc, .short_set ! to justify cost of membar
>> + ! must be > pre-cleared lines
>> + nop
> Likewise.
>
>> +
>> + ! initial cache-clearing stores
>> + ! get store pipeline moving
>> + rd %asi, %g3 ! save %asi to be restored later
>> + wr %g0, ASI_STBIMRU_P, %asi
> Likewise.
>
>> +.wrzero_small:
>> + stxa %o1, [%o5]ASI_STBI_P
>> + subcc %o4, 64, %o4
>> + bgu,pt %ncc, .wrzero_small
>> + add %o5, 64, %o5
>> + ba,a .bsi_done
> Likewise.
>
>> +.asi_done:
>> + wr %g3, 0x0, %asi ! restore saved %asi
>> +.bsi_done:
>> + membar #StoreStore ! required by use of Block Store Init
> Likewise.
>
>> + .size M7memset,.-M7memset
> It's usually a lot better to use ENTRY() and ENDPROC() instead of
> expanding these kinds of directives out.
Ok. Sure. Will address it.
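For reference, with the macros from linux/linkage.h the open-coded directives collapse to something like this sketch (using the m7_patch_copyops symbol from the patch; ENTRY() supplies the .globl/alignment/label boilerplate and ENDPROC() emits the matching .type and .size directives):

```
#include <linux/linkage.h>

ENTRY(m7_patch_copyops)
	/* function body unchanged */
ENDPROC(m7_patch_copyops)
```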
>> + .globl m7_patch_copyops
>> + .type m7_patch_copyops,#function
>> +m7_patch_copyops:
> ENTRY()
Sure.
>> + .size m7_patch_copyops,.-m7_patch_copyops
> ENDPROC()
Sure.
>> + .globl m7_patch_bzero
>> + .type m7_patch_bzero,#function
>> +m7_patch_bzero:
> Likewise.
Ok
>> + .size m7_patch_bzero,.-m7_patch_bzero
> Likewise.
Ok
>> + .globl m7_patch_pageops
>> + .type m7_patch_pageops,#function
>> +m7_patch_pageops:
> Likewise.
Ok
>> + .size m7_patch_pageops,.-m7_patch_pageops
> Likewise.
Ok