Message-ID: <e8efdade-161e-4efe-8bd3-abb12ad45dee@linux.ibm.com>
Date: Thu, 27 Jun 2024 21:14:45 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>,
Ankur Arora <ankur.a.arora@...cle.com>
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
LKML <linux-kernel@...r.kernel.org>,
Nicholas Piggin <npiggin@...il.com>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling
On 6/27/24 11:26 AM, Michael Ellerman wrote:
> Ankur Arora <ankur.a.arora@...cle.com> writes:
>> Shrikanth Hegde <sshegde@...ux.ibm.com> writes:
>>> ...
>>> This is the patch with which I tried to make preempt_count per-CPU on powerpc; it boots and runs the workload.
>>> I implemented a simpler variant instead of folding need_resched into the preempt count, and avoided the
>>> tif_need_resched calls in a hacky way since they didn't affect the throughput, hence kept it simple. The patch
>>> is below for reference. It didn't help fix the regression, unless I implemented it wrongly.
>>>
>>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>>> index 1d58da946739..374642288061 100644
>>> --- a/arch/powerpc/include/asm/paca.h
>>> +++ b/arch/powerpc/include/asm/paca.h
>>> @@ -268,6 +268,7 @@ struct paca_struct {
>>> u16 slb_save_cache_ptr;
>>> #endif
>>> #endif /* CONFIG_PPC_BOOK3S_64 */
>>> + int preempt_count;
>>
>> I don't know powerpc at all. But, would this cacheline be hotter
>> than current_thread_info()::preempt_count?
>>
>>> #ifdef CONFIG_STACKPROTECTOR
>>> unsigned long canary;
>>> #endif
>
> Assuming stack protector is enabled (it is in defconfig), that cache
> line should be quite hot, because the canary is loaded as part of the
> epilogue of many functions.
Thanks Michael for taking a look at it.

Yes, CONFIG_STACKPROTECTOR=y is set.
Which cacheline the count would end up on is still an open question if we are going to pursue this.
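
For context, a minimal sketch of what paca-backed accessors could look like.
This is hypothetical and not part of the patch above: only the preempt_count
field comes from the diff; the helpers themselves are assumed.

/* hypothetical arch/powerpc/include/asm/preempt.h fragment */
#include <linux/compiler.h>
#include <asm/paca.h>

static __always_inline int preempt_count(void)
{
	/* local_paca lives in r13, so this is one r13-relative load */
	return READ_ONCE(local_paca->preempt_count);
}

static __always_inline void preempt_count_set(int pc)
{
	WRITE_ONCE(local_paca->preempt_count, pc);
}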
> Putting preempt_count in the paca also means it's a single load/store to
> access the value, just paca (in r13) + static offset. With the
> preempt_count in thread_info it's two loads, one to load current from
> the paca and then another to get the preempt_count.
>
> It could be worthwhile to move preempt_count into the paca, but I'm not
> convinced preempt_count is accessed enough for it to be a major
> performance issue.
With PREEMPT_COUNT enabled, preempt_count is accessed on every
preempt_enable/preempt_disable, which means on every spin lock/unlock,
get_cpu/put_cpu, etc. Those can be quite frequent, no? But w.r.t. preempt
auto it didn't change the performance per se.
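
Given how frequent those accesses are, the single-load vs. two-load
difference you describe is what could matter. Roughly (an illustrative
sketch only; __current and task_thread_info() are the existing kernel
names, the paca preempt_count field is the hypothetical one from the
patch above):

#include <linux/sched.h>
#include <asm/paca.h>

static inline void preempt_count_paths(void)
{
	/* thread_info path: two dependent loads */
	struct task_struct *tsk = local_paca->__current;    /* load 1: r13 + offset */
	int pc_ti = task_thread_info(tsk)->preempt_count;   /* load 2: tsk + offset */

	/* proposed paca path: a single load */
	int pc_paca = local_paca->preempt_count;            /* r13 + static offset */

	(void)pc_ti;
	(void)pc_paca;
}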
>
> cheers