Message-ID: <e7e2126f-40ca-44af-9287-888f4ec34b35@linux.ibm.com>
Date: Tue, 18 Jun 2024 23:57:02 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling
On 6/15/24 8:34 PM, Shrikanth Hegde wrote:
>
>
> On 6/10/24 12:53 PM, Ankur Arora wrote:
>>
> _auto.
>>>
>>> 6.10-rc1:
>>> =========
>>> 10:09:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
>>> 09:45:23 AM all 4.14 0.00 77.57 0.00 16.92 0.00 0.00 0.00 0.00 1.37
>>> 09:45:24 AM all 4.42 0.00 77.62 0.00 16.76 0.00 0.00 0.00 0.00 1.20
>>> 09:45:25 AM all 4.43 0.00 77.45 0.00 16.94 0.00 0.00 0.00 0.00 1.18
>>> 09:45:26 AM all 4.45 0.00 77.87 0.00 16.68 0.00 0.00 0.00 0.00 0.99
>>>
>>> PREEMPT_AUTO:
>>> ===========
>>> 10:09:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
>>> 10:09:56 AM all 3.11 0.00 72.59 0.00 21.34 0.00 0.00 0.00 0.00 2.96
>>> 10:09:57 AM all 3.31 0.00 73.10 0.00 20.99 0.00 0.00 0.00 0.00 2.60
>>> 10:09:58 AM all 3.40 0.00 72.83 0.00 20.85 0.00 0.00 0.00 0.00 2.92
>>> 10:10:00 AM all 3.21 0.00 72.87 0.00 21.19 0.00 0.00 0.00 0.00 2.73
>>> 10:10:01 AM all 3.02 0.00 72.18 0.00 21.08 0.00 0.00 0.00 0.00 3.71
>>>
>>> Used the bcc tools hardirq and softirq to see if irqs are increasing. softirq showed there are more
>>> timer and sched softirqs. Numbers vary between samples, but the trend seems similar.
>>
>> Yeah, the %sys is lower and %irq, higher. Can you also see where the
>> increased %irq is? For instance are the resched IPIs numbers greater?
>
> Hi Ankur,
>
>
> Used mpstat -I ALL to capture this info for 20 seconds.
>
> HARDIRQ per second:
> ===================
> 6.10:
> ===================
> 18 19 22 23 48 49 50 51 LOC BCT LOC2 SPU PMI MCE NMI WDG DBL
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 417956.86 1114642.30 1712683.65 2058664.99 0.00 0.00 18.30 0.39 31978.37 0.00 0.35 351.98 0.00 0.00 0.00 6405.54 329189.45
>
> Preempt_auto:
> ===================
> 18 19 22 23 48 49 50 51 LOC BCT LOC2 SPU PMI MCE NMI WDG DBL
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 609509.69 1910413.99 1923503.52 2061876.33 0.00 0.00 19.14 0.30 31916.59 0.00 0.45 497.88 0.00 0.00 0.00 6825.49 88247.85
>
> 18,19,22,23 are called XIVE interrupts. These are IPI interrupts. I am not sure which type of IPI these are; will have to see why they are increasing.
>
>
> SOFTIRQ per second:
> ===================
> 6.10:
> ===================
> HI TIMER NET_TX NET_RX BLOCK IRQ_POLL TASKLET SCHED HRTIMER RCU
> 0.00 3966.47 0.00 18.25 0.59 0.00 0.34 12811.00 0.00 9693.95
>
> Preempt_auto:
> ===================
> HI TIMER NET_TX NET_RX BLOCK IRQ_POLL TASKLET SCHED HRTIMER RCU
> 0.00 4871.67 0.00 18.94 0.40 0.00 0.25 13518.66 0.00 15732.77
>
> Note: the RCU softirq count increases significantly. Not sure what triggers it; still trying to figure out why.
> It may be irqs raising the softirq, or the softirq causing more IPIs.
>
>
>
> Also, noticed the below config difference: these options get disabled with preempt auto because PREEMPTION forces them to N. Made changes in kernel/Kconfig.locks to get them
> enabled, but I still see the same hackbench regression. These configs may still need attention?
>
> 6.10 | preempt auto
> CONFIG_INLINE_SPIN_UNLOCK_IRQ=y | CONFIG_UNINLINE_SPIN_UNLOCK=y
> CONFIG_INLINE_READ_UNLOCK=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_READ_UNLOCK_IRQ=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_WRITE_UNLOCK=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_WRITE_UNLOCK_IRQ=y | ----------------------------------------------------------------------------
>
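For context, these options come from kernel/Kconfig.locks, where each INLINE_*_UNLOCK* option is gated on !PREEMPTION. The fragment below is paraphrased from memory, not an exact copy; check the tree for the real text:

```
# kernel/Kconfig.locks (abridged, from memory). With PREEMPTION=y the
# !PREEMPTION leg is false, so each option falls back to the per-arch
# ARCH_INLINE_* opt-in, which powerpc does not select.
config INLINE_SPIN_UNLOCK_IRQ
	def_bool y
	depends on !PREEMPTION || ARCH_INLINE_SPIN_UNLOCK_IRQ

config INLINE_READ_UNLOCK
	def_bool y
	depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK
```

That matches the observation that forcing them back on did not change the hackbench numbers: the uninlined unlock paths are a side effect of PREEMPTION, not the root cause.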
>
Did an experiment keeping the number of CPUs constant while changing the number of sockets they span.
When all the CPUs belong to the same socket, there is no regression w.r.t. PREEMPT_AUTO. The regression starts when the CPUs
span multiple sockets.
Since preempt auto enables preempt count by default, I think that may be causing the regression. I see powerpc uses the generic
preempt-count implementation, which may not scale well. Will try to switch to a percpu-based method and see; will get back if I can get that done successfully.
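For reference, the difference I mean, roughly (a sketch based on include/asm-generic/preempt.h and the x86 arch/x86/include/asm/preempt.h; powerpc currently takes the generic path, so this is illustrative, not powerpc code):

```
/* Generic preempt_count (include/asm-generic/preempt.h, roughly):
 * reads through current_thread_info(), i.e. a per-task location,
 * which costs a pointer chase on every preempt_disable/enable. */
static __always_inline int preempt_count(void)
{
	return READ_ONCE(current_thread_info()->preempt_count);
}

/* Per-CPU variant as done on x86 (arch/x86/include/asm/preempt.h,
 * roughly): a single percpu variable, accessed %gs-relative with
 * no pointer chase through thread_info. */
DECLARE_PER_CPU(int, __preempt_count);

static __always_inline int preempt_count(void)
{
	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
}
```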