Message-ID: <e7e2126f-40ca-44af-9287-888f4ec34b35@linux.ibm.com>
Date: Tue, 18 Jun 2024 23:57:02 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
paulmck@...nel.org, rostedt@...dmis.org, mark.rutland@....com,
juri.lelli@...hat.com, joel@...lfernandes.org, raghavendra.kt@....com,
boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 00/35] PREEMPT_AUTO: support lazy rescheduling
On 6/15/24 8:34 PM, Shrikanth Hegde wrote:
>
>
> On 6/10/24 12:53 PM, Ankur Arora wrote:
>>
> _auto.
>>>
>>> 6.10-rc1:
>>> =========
>>> 10:09:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
>>> 09:45:23 AM all 4.14 0.00 77.57 0.00 16.92 0.00 0.00 0.00 0.00 1.37
>>> 09:45:24 AM all 4.42 0.00 77.62 0.00 16.76 0.00 0.00 0.00 0.00 1.20
>>> 09:45:25 AM all 4.43 0.00 77.45 0.00 16.94 0.00 0.00 0.00 0.00 1.18
>>> 09:45:26 AM all 4.45 0.00 77.87 0.00 16.68 0.00 0.00 0.00 0.00 0.99
>>>
>>> PREEMPT_AUTO:
>>> ===========
>>> 10:09:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
>>> 10:09:56 AM all 3.11 0.00 72.59 0.00 21.34 0.00 0.00 0.00 0.00 2.96
>>> 10:09:57 AM all 3.31 0.00 73.10 0.00 20.99 0.00 0.00 0.00 0.00 2.60
>>> 10:09:58 AM all 3.40 0.00 72.83 0.00 20.85 0.00 0.00 0.00 0.00 2.92
>>> 10:10:00 AM all 3.21 0.00 72.87 0.00 21.19 0.00 0.00 0.00 0.00 2.73
>>> 10:10:01 AM all 3.02 0.00 72.18 0.00 21.08 0.00 0.00 0.00 0.00 3.71
>>>
>>> Used the bcc tools hardirq and softirq to see if irqs are increasing. softirq showed there are more
>>> timer and sched softirqs. Numbers vary between samples, but the trend seems similar.
>>
>> Yeah, the %sys is lower and %irq, higher. Can you also see where the
>> increased %irq is? For instance are the resched IPIs numbers greater?
>
> Hi Ankur,
>
>
> Used mpstat -I ALL to capture this info for 20 seconds.
>
> HARDIRQ per second:
> ===================
> 6.10:
> ===================
> 18 19 22 23 48 49 50 51 LOC BCT LOC2 SPU PMI MCE NMI WDG DBL
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 417956.86 1114642.30 1712683.65 2058664.99 0.00 0.00 18.30 0.39 31978.37 0.00 0.35 351.98 0.00 0.00 0.00 6405.54 329189.45
>
> Preempt_auto:
> ===================
> 18 19 22 23 48 49 50 51 LOC BCT LOC2 SPU PMI MCE NMI WDG DBL
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 609509.69 1910413.99 1923503.52 2061876.33 0.00 0.00 19.14 0.30 31916.59 0.00 0.45 497.88 0.00 0.00 0.00 6825.49 88247.85
>
> 18,19,22,23 are called XIVE interrupts. These are IPI interrupts. I am not sure which type of IPI these are; will have to see why they are increasing.
>
>
> SOFTIRQ per second:
> ===================
> 6.10:
> ===================
> HI TIMER NET_TX NET_RX BLOCK IRQ_POLL TASKLET SCHED HRTIMER RCU
> 0.00 3966.47 0.00 18.25 0.59 0.00 0.34 12811.00 0.00 9693.95
>
> Preempt_auto:
> ===================
> HI TIMER NET_TX NET_RX BLOCK IRQ_POLL TASKLET SCHED HRTIMER RCU
> 0.00 4871.67 0.00 18.94 0.40 0.00 0.25 13518.66 0.00 15732.77
>
> Note: the RCU softirq count increases significantly. Not sure what triggers it; still trying to figure out why.
> It may be irqs raising the softirq, or the softirq causing more IPIs.
>
>
>
> Also, noticed the below config difference: these options get disabled with preempt auto because PREEMPTION forces them to N. Made changes in kernel/Kconfig.locks to get them
> enabled, but I still see the same hackbench regression. These configs may still need attention?
>
> 6.10 | preempt auto
> CONFIG_INLINE_SPIN_UNLOCK_IRQ=y | CONFIG_UNINLINE_SPIN_UNLOCK=y
> CONFIG_INLINE_READ_UNLOCK=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_READ_UNLOCK_IRQ=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_WRITE_UNLOCK=y | ----------------------------------------------------------------------------
> CONFIG_INLINE_WRITE_UNLOCK_IRQ=y | ----------------------------------------------------------------------------
>
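For context, these options come from kernel/Kconfig.locks, where each INLINE_*_UNLOCK* option is gated on !PREEMPTION. The fragment below is paraphrased from memory, not an exact copy; check the tree for the real text:

```
# kernel/Kconfig.locks (abridged, from memory). With PREEMPTION=y the
# !PREEMPTION leg is false, so each option falls back to the per-arch
# ARCH_INLINE_* opt-in, which powerpc does not select.
config INLINE_SPIN_UNLOCK_IRQ
	def_bool y
	depends on !PREEMPTION || ARCH_INLINE_SPIN_UNLOCK_IRQ

config INLINE_READ_UNLOCK
	def_bool y
	depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK
```

That matches the observation that forcing them back on did not change the hackbench numbers: the uninlined unlock paths are a side effect of PREEMPTION, not the root cause.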
>
Did an experiment keeping the number of CPUs constant while changing the number of sockets they span.
When all the CPUs belong to the same socket, there is no regression w.r.t. PREEMPT_AUTO. The regression starts when the CPUs
span multiple sockets.
Since preempt auto enables preempt count by default, I think that may be causing the regression. I see powerpc uses the generic
preempt-count implementation, which may not scale well. Will try to switch to a percpu-based method and see; will get back if I can get that done successfully.
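For reference, the difference I mean, roughly (a sketch based on include/asm-generic/preempt.h and the x86 arch/x86/include/asm/preempt.h; powerpc currently takes the generic path, so this is illustrative, not powerpc code):

```
/* Generic preempt_count (include/asm-generic/preempt.h, roughly):
 * reads through current_thread_info(), i.e. a per-task location,
 * which costs a pointer chase on every preempt_disable/enable. */
static __always_inline int preempt_count(void)
{
	return READ_ONCE(current_thread_info()->preempt_count);
}

/* Per-CPU variant as done on x86 (arch/x86/include/asm/preempt.h,
 * roughly): a single percpu variable, accessed %gs-relative with
 * no pointer chase through thread_info. */
DECLARE_PER_CPU(int, __preempt_count);

static __always_inline int preempt_count(void)
{
	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
}
```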