Message-ID: <02195130-3d9a-a206-d931-fab7dc606061@arm.com>
Date: Wed, 5 Aug 2020 10:50:29 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Valentin Schneider <valentin.schneider@....com>,
Thomas Gleixner <tglx@...utronix.de>
Cc: Vladimir Oltean <olteanv@...il.com>,
Kurt Kanzenbach <kurt.kanzenbach@...utronix.de>,
Alison Wang <alison.wang@....com>, catalin.marinas@....com,
will@...nel.org, paulmck@...nel.org, mw@...ihalf.com,
leoyang.li@....com, vladimir.oltean@....com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Anna-Maria Gleixner <anna-maria@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH] arm64: defconfig: Disable fine-grained task level IRQ
time accounting
On 04/08/2020 01:59, Valentin Schneider wrote:
>
> On 03/08/20 20:22, Thomas Gleixner wrote:
>> Valentin,
>>
>> Valentin Schneider <valentin.schneider@....com> writes:
>>> On 03/08/20 16:13, Thomas Gleixner wrote:
>>>> Vladimir Oltean <olteanv@...il.com> writes:
>>>>>> 1) When irq accounting is disabled, RT throttling kicks in as
>>>>>> expected.
>>>>>>
>>>>>> 2) With irq accounting the RT throttler does not kick in and the RCU
>>>>>> stall/lockups happen.
>>>>> What is this telling us?
>>>>
>>>> It seems that the fine grained irq time accounting affects the runtime
>>>> accounting in some way which I haven't figured out yet.
>>>>
>>>
>>> With IRQ_TIME_ACCOUNTING, rq_clock_task() will always be incremented by a
>>> lesser-or-equal value than when not having the option; you start with the
>>> same delta_exec but slice some for the IRQ accounting, and leave the rest
>>> for the rq_clock_task() (+paravirt).
>>>
>>> IIUC this means that if you spend e.g. 10% of the time in IRQ and 90% of
>>> the time running the stress-ng RT tasks, despite having RT tasks hogging
>>> the entirety of the "available time" it is still only 90% runtime, which is
>>> below the 95% default and the throttling doesn't happen.
>>
>> totaltime = irqtime + tasktime
>>
>> Ignoring irqtime and pretending that totaltime is what the scheduler
>> can control and deal with is naive at best.
>>
>
> Agreed, however AFAICT rt_time is only incremented by rq_clock_task()
> deltas, which don't include IRQ time with IRQ_TIME_ACCOUNTING=y. That would
> then be directly compared to the sysctl runtime.
>
> Adding some prints in sched_rt_runtime_exceeded() and running this test
> case on my Juno, I get:
> # IRQ_TIME_ACCOUNTING=y
> cpu=2 rt_time=713455220 runtime=950000000 rq->avg_irq.util_avg=265
> (rt_time oscillates between [70.1e7, 75.1e7]; avg_irq between [220, 270])
>
> # IRQ_TIME_ACCOUNTING=n
> cpu=2 rt_time=963035300 runtime=949951811
> (rt_time oscillates between [94.1e7, 96.1e7])
>
> Throttling happens for IRQ_TIME_ACCOUNTING=n and doesn't for
> IRQ_TIME_ACCOUNTING=y - clearly the accounted rt_time isn't high enough for
> that to happen, and it does look like what is missing in rt_time (or what
> should be subtracted from the available runtime) is there in the avg_irq.
I agree that w/ IRQ_TIME_ACCOUNTING=y rt_rq->rt_time isn't high enough
in this testcase.

  stress-ng-hrtim-1655 [001] 462.897733: bprint: update_curr_rt:
  rt_rq->rt_time=416716900 rt_rq->rt_runtime=950000000
  rt_b->rt_runtime=950000000

The 5% reservation (1 - sched_rt_runtime_us/sched_rt_period_us) for CFS
is massively eclipsed by irqtime.
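The numbers in this thread can be checked with a little arithmetic. What follows is a minimal sketch, not kernel code: it assumes the default 1 s period and 0.95 s runtime, and it models the throttle decision as a plain `rt_time > runtime` comparison. The helper names are made up for illustration.

```python
# Assumed defaults: sched_rt_period_us = 1000000, sched_rt_runtime_us = 950000,
# both expressed here in nanoseconds.
PERIOD_NS = 1_000_000_000
RUNTIME_NS = 950_000_000


def rt_runtime_exceeded(rt_time_ns, runtime_ns=RUNTIME_NS):
    """Simplified model: RT is throttled once rt_time exceeds runtime."""
    return rt_time_ns > runtime_ns


def cfs_reservation_ns(runtime_ns=RUNTIME_NS, period_ns=PERIOD_NS):
    """Time per period left for non-RT tasks (the '5% reservation')."""
    return period_ns - runtime_ns


def max_rt_time_ns(irq_ns, period_ns=PERIOD_NS):
    """Upper bound on rt_time per period when IRQ time is sliced off
    the task clock, as with IRQ_TIME_ACCOUNTING=y."""
    return period_ns - irq_ns


# Values reported earlier in the thread.
print(rt_runtime_exceeded(713455220))             # IRQ_TIME_ACCOUNTING=y
print(rt_runtime_exceeded(963035300, 949951811))  # IRQ_TIME_ACCOUNTING=n
# With 10% of the period spent in IRQ, rt_time can never reach the limit.
print(rt_runtime_exceeded(max_rt_time_ns(100_000_000)))
```

In this model, any IRQ load above the 50 ms reservation is enough to keep rt_time below the limit for the whole period, even when RT tasks consume all remaining time.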
It's true that avg_irq tracks 'irq_delta + steal' time but it is meant
to potentially reduce cpu capacity. It's also cpu and frequency
invariant (your CPU2 is a big CPU so no issue here).
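As a rough cross-check of Valentin's numbers, avg_irq.util_avg can be turned into an approximate IRQ time per period. This is only a sketch: it assumes util_avg is expressed on the 0..1024 capacity scale and that the CPU runs at full capacity and frequency, so that the invariance scaling does not distort the figure. The function name is hypothetical.

```python
CAPACITY_SCALE = 1024       # assumed scale of util_avg
PERIOD_NS = 1_000_000_000   # assumed 1 s RT period, in ns
RUNTIME_NS = 950_000_000    # assumed 0.95 s RT runtime, in ns


def irq_ns_from_util(util_avg, period_ns=PERIOD_NS):
    """Approximate IRQ time per period implied by a util_avg value.

    Only meaningful under the assumption that the invariance scaling
    is neutral (CPU at maximum capacity and frequency).
    """
    return util_avg * period_ns // CAPACITY_SCALE


rt_time_ns = 713455220           # reported with IRQ_TIME_ACCOUNTING=y
irq_ns = irq_ns_from_util(265)   # reported rq->avg_irq.util_avg
print(irq_ns, rt_time_ns + irq_ns, rt_time_ns + irq_ns > RUNTIME_NS)
```

Under these assumptions the sum of rt_time and the estimated IRQ time lands above the 950 ms limit, which is consistent with the observation that the missing time shows up in avg_irq. On a little CPU or at a reduced frequency the same conversion would not hold.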
Could a rq_clock(rq) derived rt_rq signal be used to compare against
rt_runtime?
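To picture what such a signal might look like, here is a purely illustrative model, not a proposed patch. It accumulates RT time twice over one period: once from a task clock that has IRQ time sliced off (the IRQ_TIME_ACCOUNTING=y behaviour described above) and once from the full clock. The function, its arguments and the 25% IRQ load are assumptions chosen for the example.

```python
RUNTIME_NS = 950_000_000   # assumed 0.95 s RT runtime, in ns


def accumulate(updates):
    """Accumulate RT time over a list of (delta_ns, irq_ns) clock updates.

    Returns (task_clock_time, rq_clock_time): the first excludes IRQ
    time, the second is a hypothetical rq_clock()-derived signal that
    keeps it.
    """
    task_clock_time = 0
    rq_clock_time = 0
    for delta_ns, irq_ns in updates:
        irq_ns = min(irq_ns, delta_ns)         # cannot slice off more than elapsed
        task_clock_time += delta_ns - irq_ns   # IRQ time removed
        rq_clock_time += delta_ns              # IRQ time kept
    return task_clock_time, rq_clock_time


# One 1 s period as 1000 updates of 1 ms each, 25% of each spent in IRQ,
# with an RT task running for all of the remaining time.
updates = [(1_000_000, 250_000)] * 1000
task_time, rq_time = accumulate(updates)
print(task_time > RUNTIME_NS)   # comparison as done today
print(rq_time > RUNTIME_NS)     # comparison against the hypothetical signal
```

In this toy model only the second comparison trips, because the first never sees the time spent in IRQ context.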
BTW, DL already influences rt_rq->rt_time.
[...]