[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5c47950a-0102-4555-b056-e85f2abc2011@linux.ibm.com>
Date: Fri, 31 Jan 2025 12:24:31 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: mpe@...erman.id.au, maddy@...ux.ibm.com, linuxppc-dev@...ts.ozlabs.org,
npiggin@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] powerpc: Enable dynamic preemption
On 1/31/25 11:39, Christophe Leroy wrote:
>
>
> Le 30/01/2025 à 21:26, Sebastian Andrzej Siewior a écrit :
>> On 2025-01-30 22:27:07 [+0530], Shrikanth Hegde wrote:
>>>> | #DEFINE need_irq_preemption() \
>>>> |
>>>> (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
>>>> |
>>>> | if (need_irq_preemption()) {
>>>>
>>>> be a bit smaller/ quicker? This could be a fast path ;)
>>>
>>> I am okay with either way. I did try both[1], there wasn't any
>>> significant difference,
>>> hence chose a simpler one. May be system size, workload pattern might
>>> matter.
>>>
>>> Let me do some more testing to see which one wins.
>>> Is there any specific benchmark which might help here?
>>
>> No idea. As per bean counting: preempt_model_preemptible() should
>> resolve in two function calls + conditional in the dynamic case. This
>> should be more expensive compared to a nop/ branch ;)
>
> I did a build comparison on pmac32_defconfig + dynamic preemption, the
> static branch looks definitely more optimal:
>
> Without static branch:
>
> ...
> 294: 7c 08 02 a6 mflr r0
> 298: 90 01 00 24 stw r0,36(r1)
> 29c: 48 00 00 01 bl 29c <interrupt_exit_kernel_prepare+0x50>
> 29c: R_PPC_REL24 preempt_model_full
> 2a0: 2c 03 00 00 cmpwi r3,0
> 2a4: 41 82 00 a4 beq 348 <interrupt_exit_kernel_prepare+0xfc>
> ...
> 2a8: 81 22 00 4c lwz r9,76(r2)
> 2ac: 71 29 00 04 andi. r9,r9,4
> 2b0: 40 82 00 d4 bne 384 <interrupt_exit_kernel_prepare+0x138>
> ...
> 2b4: 7d 20 00 a6 mfmsr r9
> 2b8: 55 29 07 fa rlwinm r9,r9,0,31,29
> ...
> 348: 48 00 00 01 bl 348 <interrupt_exit_kernel_prepare+0xfc>
> 348: R_PPC_REL24 preempt_model_lazy
> 34c: 2c 03 00 00 cmpwi r3,0
> 350: 40 a2 ff 58 bne 2a8 <interrupt_exit_kernel_prepare+0x5c>
> 354: 4b ff ff 60 b 2b4 <interrupt_exit_kernel_prepare+0x68>
>
> With the static branch:
>
> ...
> 294: 48 00 00 58 b 2ec <interrupt_exit_kernel_prepare+0xa0>
> ...
> 298: 7d 20 00 a6 mfmsr r9
> 29c: 55 29 07 fa rlwinm r9,r9,0,31,29
> ...
> 2ec: 81 22 00 4c lwz r9,76(r2)
> 2f0: 71 29 00 04 andi. r9,r9,4
> 2f4: 41 82 ff a4 beq 298 <interrupt_exit_kernel_prepare+0x4c>
> ...
>
> With the static branch it is just an unconditional branch being later
> replaced by a NOP. It must be more efficient.
>
> So in first case we need to get and save LR, call preempt_model_full(),
> compare with 0, call preempt_model_lazy() and compare again, and at some
> point read and restore LR.
>
> And the preempt_model_() functions are not tiny functions:
>
> 00003620 <preempt_model_full>:
> 3620: 3d 20 00 00 lis r9,0
> 3622: R_PPC_ADDR16_HA preempt_dynamic_mode
> 3624: 94 21 ff f0 stwu r1,-16(r1)
> 3628: 80 69 00 00 lwz r3,0(r9)
> 362a: R_PPC_ADDR16_LO preempt_dynamic_mode
> 362c: 2c 03 ff ff cmpwi r3,-1
> 3630: 41 82 00 18 beq 3648 <preempt_model_full+0x28>
> 3634: 68 63 00 02 xori r3,r3,2
> 3638: 38 21 00 10 addi r1,r1,16
> 363c: 7c 63 00 34 cntlzw r3,r3
> 3640: 54 63 d9 7e srwi r3,r3,5
> 3644: 4e 80 00 20 blr
> ...
> 3648: 0f e0 00 00 twui r0,0
> 364c: 68 63 00 02 xori r3,r3,2
> 3650: 38 21 00 10 addi r1,r1,16
> 3654: 7c 63 00 34 cntlzw r3,r3
> 3658: 54 63 d9 7e srwi r3,r3,5
> 365c: 4e 80 00 20 blr
>
> 00003660 <preempt_model_lazy>:
> 3660: 3d 20 00 00 lis r9,0
> 3662: R_PPC_ADDR16_HA preempt_dynamic_mode
> 3664: 94 21 ff f0 stwu r1,-16(r1)
> 3668: 80 69 00 00 lwz r3,0(r9)
> 366a: R_PPC_ADDR16_LO preempt_dynamic_mode
> 366c: 2c 03 ff ff cmpwi r3,-1
> 3670: 41 82 00 18 beq 3688 <preempt_model_lazy+0x28>
> 3674: 68 63 00 03 xori r3,r3,3
> 3678: 38 21 00 10 addi r1,r1,16
> 367c: 7c 63 00 34 cntlzw r3,r3
> 3680: 54 63 d9 7e srwi r3,r3,5
> 3684: 4e 80 00 20 blr
> ...
> 3688: 0f e0 00 00 twui r0,0
> 368c: 68 63 00 03 xori r3,r3,3
> 3690: 38 21 00 10 addi r1,r1,16
> 3694: 7c 63 00 34 cntlzw r3,r3
> 3698: 54 63 d9 7e srwi r3,r3,5
> 369c: 4e 80 00 20 blr
>
>
>
That's convincing. Thanks for the pointers. I will move it into static
branch method.
That makes the code same for arm64/powerpc. Maybe its worth moving it
into sched/core. Let me try that as well.
PS: on leave next week, so will be sending the patches after i get back.
>> But you would still need preempt_model_preemptible() for the !DYN case.
>>
>>>>> + preempt_model_voluntary() ? "voluntary" :
>>>>> + preempt_model_full() ? "full" :
>>>>> + preempt_model_lazy() ? "lazy" :
>>>>> + "",
>>>>
>>>> So intend to rework this part. I have patches stashed at
>>>> https://eur01.safelinks.protection.outlook.com/?
>>>> url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fbigeasy%2Fstaging.git%2Flog%2F%3Fh%3Dpreemption_string&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7C553a8f640a9c4514597d08dd416c6534%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638738655988556429%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=ZfVQi1iRYaVCTrZ5vIhOQ7yAKaDnOwFi0NRjycfrI5A%3D&reserved=0
>>>>
>>>> which I didn't sent yet due to the merge window. Just a heads up ;)
>>>
>>> Makes sense. I had seen at-least two places where this code was
>>> there, ftrace/powerpc.
>>> There were way more places..
>>>
>>> You want me to remove this part?
>>
>> No, just be aware.
>> I don't know how this will be routed I guess we merge the sched pieces
>> first and then I submit the other pieces via the relevant maintainer
>> tree. In that case please be aware that all parts get removed/ replaced
>> properly.
>>
>> Sebastian
>
Powered by blists - more mailing lists