Date: Thu, 27 Jun 2024 21:14:45 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Michael Ellerman <mpe@...erman.id.au>,
        Ankur Arora <ankur.a.arora@...cle.com>
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
        paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
        juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        LKML <linux-kernel@...r.kernel.org>,
        Nicholas Piggin <npiggin@...il.com>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling



On 6/27/24 11:26 AM, Michael Ellerman wrote:
> Ankur Arora <ankur.a.arora@...cle.com> writes:
>> Shrikanth Hegde <sshegde@...ux.ibm.com> writes:
>>> ...
>>> This is the patch with which I tried making it per-CPU for powerpc. It boots and runs the workload.
>>> I implemented a simpler scheme instead of folding need_resched into the preempt count, and in a
>>> hacky way avoided the tif_need_resched calls since they didn't affect the throughput, so I kept it
>>> simple. The patch is below for reference. It didn't help fix the regression, unless I implemented it wrongly.
>>>
>>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>>> index 1d58da946739..374642288061 100644
>>> --- a/arch/powerpc/include/asm/paca.h
>>> +++ b/arch/powerpc/include/asm/paca.h
>>> @@ -268,6 +268,7 @@ struct paca_struct {
>>>  	u16 slb_save_cache_ptr;
>>>  #endif
>>>  #endif /* CONFIG_PPC_BOOK3S_64 */
>>> +	int preempt_count;
>>
>> I don't know powerpc at all. But would this cacheline be hotter
>> than current_thread_info()::preempt_count?
>>
>>>  #ifdef CONFIG_STACKPROTECTOR
>>>  	unsigned long canary;
>>>  #endif
> 
> Assuming stack protector is enabled (it is in defconfig), that cache
> line should be quite hot, because the canary is loaded as part of the
> epilogue of many functions.

Thanks, Michael, for taking a look at it.

Yes, CONFIG_STACKPROTECTOR=y.
Which cacheline it should go in is still an open question if we are going to pursue this.
> Putting preempt_count in the paca also means it's a single load/store to 
> access the value, just paca (in r13) + static offset. With the
> preempt_count in thread_info it's two loads, one to load current from
> the paca and then another to get the preempt_count.
> 
> It could be worthwhile to move preempt_count into the paca, but I'm not
> convinced preempt_count is accessed enough for it to be a major
> performance issue.

With PREEMPT_COUNT enabled, this access happens on every preempt_enable/preempt_disable,
which means every spin lock/unlock, get/put cpu, etc. Those might be quite frequent, no?
But w.r.t. preempt auto it didn't change the performance per se.
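
As a minimal sketch of why that traffic adds up (hypothetical module code; only
the spin_lock/spin_unlock and get_cpu/put_cpu APIs are real, and this assumes
!PREEMPT_RT, where spinlocks disable preemption):

#include <linux/spinlock.h>
#include <linux/smp.h>

static DEFINE_SPINLOCK(demo_lock);

static void demo_preempt_count_traffic(void)
{
	int cpu;

	spin_lock(&demo_lock);   /* preempt_disable(): preempt_count updated */
	spin_unlock(&demo_lock); /* preempt_enable():  preempt_count updated */

	cpu = get_cpu();         /* preempt_disable() */
	(void)cpu;
	put_cpu();               /* preempt_enable() */
}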

> 
> cheers
