Message-ID: <5c47950a-0102-4555-b056-e85f2abc2011@linux.ibm.com>
Date: Fri, 31 Jan 2025 12:24:31 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: mpe@...erman.id.au, maddy@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
        npiggin@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] powerpc: Enable dynamic preemption



On 1/31/25 11:39, Christophe Leroy wrote:
> 
> 
> On 30/01/2025 at 21:26, Sebastian Andrzej Siewior wrote:
>> On 2025-01-30 22:27:07 [+0530], Shrikanth Hegde wrote:
>>>> | #define need_irq_preemption() \
>>>> |          (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>>>> |
>>>> |     if (need_irq_preemption()) {
>>>>
>>>> be a bit smaller/quicker? This could be a fast path ;)
>>>
>>> I am okay with either way. I did try both [1]; there wasn't any
>>> significant difference, hence I chose the simpler one. Maybe system
>>> size and workload pattern matter.
>>>
>>> Let me do some more testing to see which one wins.
>>> Is there any specific benchmark that might help here?
>>
>> No idea. As per bean counting: preempt_model_preemptible() should
>> resolve into two function calls plus a conditional in the dynamic case.
>> That should be more expensive than a nop/branch ;)
> 
> I did a build comparison on pmac32_defconfig + dynamic preemption; the
> static branch is definitely more optimal:
> 
> Without static branch:
> 
> ...
>   294:    7c 08 02 a6     mflr    r0
>   298:    90 01 00 24     stw     r0,36(r1)
>   29c:    48 00 00 01     bl      29c <interrupt_exit_kernel_prepare+0x50>
>              29c: R_PPC_REL24    preempt_model_full
>   2a0:    2c 03 00 00     cmpwi   r3,0
>   2a4:    41 82 00 a4     beq     348 <interrupt_exit_kernel_prepare+0xfc>
> ...
>   2a8:    81 22 00 4c     lwz     r9,76(r2)
>   2ac:    71 29 00 04     andi.   r9,r9,4
>   2b0:    40 82 00 d4     bne     384 <interrupt_exit_kernel_prepare+0x138>
> ...
>   2b4:    7d 20 00 a6     mfmsr   r9
>   2b8:    55 29 07 fa     rlwinm  r9,r9,0,31,29
> ...
>   348:    48 00 00 01     bl      348 <interrupt_exit_kernel_prepare+0xfc>
>              348: R_PPC_REL24    preempt_model_lazy
>   34c:    2c 03 00 00     cmpwi   r3,0
>   350:    40 a2 ff 58     bne     2a8 <interrupt_exit_kernel_prepare+0x5c>
>   354:    4b ff ff 60     b       2b4 <interrupt_exit_kernel_prepare+0x68>
> 
> With the static branch:
> 
> ...
>   294:    48 00 00 58     b       2ec <interrupt_exit_kernel_prepare+0xa0>
> ...
>   298:    7d 20 00 a6     mfmsr   r9
>   29c:    55 29 07 fa     rlwinm  r9,r9,0,31,29
> ...
>   2ec:    81 22 00 4c     lwz     r9,76(r2)
>   2f0:    71 29 00 04     andi.   r9,r9,4
>   2f4:    41 82 ff a4     beq     298 <interrupt_exit_kernel_prepare+0x4c>
> ...
> 
> With the static branch it is just an unconditional branch that later
> gets patched to a NOP. It must be more efficient.
>
> In the first case we need to get and save LR, call preempt_model_full(),
> compare with 0, call preempt_model_lazy(), compare again, and at some
> point read and restore LR.
> 
> And the preempt_model_*() functions are not tiny:
> 
> 00003620 <preempt_model_full>:
>      3620:    3d 20 00 00     lis     r9,0
>              3622: R_PPC_ADDR16_HA    preempt_dynamic_mode
>      3624:    94 21 ff f0     stwu    r1,-16(r1)
>      3628:    80 69 00 00     lwz     r3,0(r9)
>              362a: R_PPC_ADDR16_LO    preempt_dynamic_mode
>      362c:    2c 03 ff ff     cmpwi   r3,-1
>      3630:    41 82 00 18     beq     3648 <preempt_model_full+0x28>
>      3634:    68 63 00 02     xori    r3,r3,2
>      3638:    38 21 00 10     addi    r1,r1,16
>      363c:    7c 63 00 34     cntlzw  r3,r3
>      3640:    54 63 d9 7e     srwi    r3,r3,5
>      3644:    4e 80 00 20     blr
> ...
>      3648:    0f e0 00 00     twui    r0,0
>      364c:    68 63 00 02     xori    r3,r3,2
>      3650:    38 21 00 10     addi    r1,r1,16
>      3654:    7c 63 00 34     cntlzw  r3,r3
>      3658:    54 63 d9 7e     srwi    r3,r3,5
>      365c:    4e 80 00 20     blr
> 
> 00003660 <preempt_model_lazy>:
>      3660:    3d 20 00 00     lis     r9,0
>              3662: R_PPC_ADDR16_HA    preempt_dynamic_mode
>      3664:    94 21 ff f0     stwu    r1,-16(r1)
>      3668:    80 69 00 00     lwz     r3,0(r9)
>              366a: R_PPC_ADDR16_LO    preempt_dynamic_mode
>      366c:    2c 03 ff ff     cmpwi   r3,-1
>      3670:    41 82 00 18     beq     3688 <preempt_model_lazy+0x28>
>      3674:    68 63 00 03     xori    r3,r3,3
>      3678:    38 21 00 10     addi    r1,r1,16
>      367c:    7c 63 00 34     cntlzw  r3,r3
>      3680:    54 63 d9 7e     srwi    r3,r3,5
>      3684:    4e 80 00 20     blr
> ...
>      3688:    0f e0 00 00     twui    r0,0
>      368c:    68 63 00 03     xori    r3,r3,3
>      3690:    38 21 00 10     addi    r1,r1,16
>      3694:    7c 63 00 34     cntlzw  r3,r3
>      3698:    54 63 d9 7e     srwi    r3,r3,5
>      369c:    4e 80 00 20     blr
> 
> 
> 

That's convincing. Thanks for the pointers. I will move to the static
branch method, roughly along the lines of the sketch below.
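
Untested sketch; the names are taken from the snippet quoted above and
from the equivalent arm64 code, and the exact hook point in
interrupt_exit_kernel_prepare() may end up looking different:

/* needs <linux/jump_label.h> for the static key helpers */
#ifdef CONFIG_PREEMPT_DYNAMIC
DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);

#define need_irq_preemption() \
	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
#else
#define need_irq_preemption()	IS_ENABLED(CONFIG_PREEMPTION)
#endif

	/*
	 * In interrupt_exit_kernel_prepare(), replacing the current
	 * preempt_model_preemptible() check.
	 */
	if (need_irq_preemption()) {
		/* existing preempt_schedule_irq() path, unchanged */
	}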

That makes the code the same for arm64 and powerpc. Maybe it's worth
moving it into sched/core so both architectures can share it. Let me
try that as well; a rough idea of the common part is below.
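
Rough sketch only: the split between include/linux/preempt.h and
kernel/sched/core.c is my assumption, and it would have to co-exist
with (or replace) the definition the generic-entry code already has:

/* include/linux/preempt.h (assumed placement) */
#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
#define need_irq_preemption() \
	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
#else
#define need_irq_preemption()	IS_ENABLED(CONFIG_PREEMPTION)
#endif

/* kernel/sched/core.c (assumed placement) */
#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
#endif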

PS: I am on leave next week, so I will send the patches after I get back.

>> But you would still need preempt_model_preemptible() for the !DYN case.
>>
>>>>> +           preempt_model_voluntary() ? "voluntary" :
>>>>> +           preempt_model_full()      ? "full" :
>>>>> +           preempt_model_lazy()      ? "lazy" :
>>>>> +           "",
>>>>
>>>> So I intend to rework this part. I have patches stashed at
>>>>     https://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git/log/?h=preemption_string
>>>>
>>>> which I didn't send yet due to the merge window. Just a heads up ;)
>>>
>>> Makes sense. I had seen at least two places where this code exists,
>>> ftrace and powerpc.
>>> There were many more places...
>>>
>>> Do you want me to remove this part?
>>
>> No, just be aware of it.
>> I don't know how this will be routed. I guess we merge the sched pieces
>> first and then I submit the other pieces via the relevant maintainer
>> tree. In that case please make sure that all parts get removed/replaced
>> properly.
>>
>> Sebastian
> 

