Message-ID: <CAKfTPtDxhjh0LsjgTwKhMMtFqhyDW6qtU-=9K1p-fCR6YLjxCQ@mail.gmail.com>
Date: Wed, 15 May 2019 12:18:37 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Song Liu <songliubraving@...com>
Cc: Morten Rasmussen <morten.rasmussen@....com>,
linux-kernel <linux-kernel@...r.kernel.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
Kernel Team <Kernel-team@...com>,
viresh kumar <viresh.kumar@...aro.org>
Subject: Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

Hi Song,

On Tue, 14 May 2019 at 22:58, Song Liu <songliubraving@...com> wrote:
>
> Hi Vincent,
>
[snip]
> >
> > Here are some more results with both Viresh's patch applied and cpu.headroom
> > set. In these tests, the side job runs with SCHED_IDLE, so we get the
> > benefit of Viresh's patch.
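[Aside: running a side job under SCHED_IDLE is typically just 'chrt -i 0 <cmd>',
or, from C, a sched_setscheduler() call before exec'ing the workload. A minimal,
illustrative wrapper -- not Song's actual harness -- could look like this:

	/* sched_idle_run.c: exec a command under the SCHED_IDLE policy */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		struct sched_param sp = { .sched_priority = 0 };

		if (argc < 2) {
			fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]);
			return 1;
		}
		/* pid 0 == calling process; SCHED_IDLE requires priority 0 */
		if (sched_setscheduler(0, SCHED_IDLE, &sp)) {
			perror("sched_setscheduler");
			return 1;
		}
		execvp(argv[1], &argv[1]);	/* the exec'ed image keeps SCHED_IDLE */
		perror("execvp");
		return 1;
	}
]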
> >
> > We collected another metric here, average "cpu time" used by the requests.
> > We also presented "wall time" and "wall - cpu" time. "wall time" is the
> > same as "latency" in previous results. Basically, "wall time" includes cpu
> > time, scheduling latency, and time spent waiting for data (from database,
> > memcache, etc.). We don't have good data that separates scheduling latency
> > and time spent waiting for data, so we present "wall - cpu" time, which is
> > the sum of the two. Time spent waiting for data should not change in these
> > tests, so changes in "wall - cpu" mostly come from scheduling latency.
> > All the latency numbers are normalized based on the "wall time" of the
> > first row.
> >
> > side job | cpu.headroom | cpu-idle | wall time | cpu time | wall - cpu
> > ------------------------------------------------------------------------
> > none     |          n/a |    42.4% |      1.00 |     0.31 |       0.69
> > ffmpeg   |            0 |    10.8% |      1.17 |     0.38 |       0.79
> > ffmpeg   |          25% |    22.8% |      1.08 |     0.35 |       0.73
> >
> > From these results, we can see that Viresh's patch reduces the latency
> > overhead of the side job, from 42% (in previous results) to 17%. And
> > a 25% cpu.headroom further reduces the latency overhead to 8%.
> > cpu.headroom reduces both the "cpu time" and the "wall - cpu" time, which
> > means cpu.headroom yields better IPC and lower scheduling latency.
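[To spell out the arithmetic behind those percentages, using only the numbers
in the table above: with the baseline wall time normalized to 1.00, the side
job's latency overhead is 1.17 / 1.00 - 1 = 17% with cpu.headroom = 0, and
1.08 / 1.00 - 1 = 8% with cpu.headroom = 25%. And since
wall = cpu + (scheduling latency + data wait), the "wall - cpu" column
(0.69 -> 0.79 -> 0.73) is the part of each request that is not cpu time.]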
> >
> > I think these data demonstrate that
> >
> > 1. Viresh's work is helpful in reducing scheduling latency introduced
> > by SCHED_IDLE side jobs.
> > 2. The cpu.headroom work provides a mechanism to further reduce scheduling
> > latency on top of Viresh's work.
> >
> > Therefore, the combination of the two would give us the mechanisms to
> > control the latency overhead of side workloads.
> >
> > @Vincent, do these data and analysis make sense from your point of view?
>
> Do you have further questions/concerns with this set?
Viresh's patchset takes CPUs that are running only sched_idle tasks into
account in the fast wakeup path only. Nothing special is (yet) done for the
slow path or for idle load balancing.

The histogram that you provided for "Fallback to sched-idle CPU for better
performance" shows that, even though the long wakeup latencies have been
significantly reduced, some wakeup latencies remain, evenly distributed in
the range [16us-2msec]. Such values most probably occur because a
sched_other task doesn't always preempt a sched_idle task and sometimes has
to wait for the next tick. This means that there is still margin for
improving the results with sched_idle without adding a new knob.

The headroom knob forces CPUs to be idle from time to time, and in that
case the scheduler falls back to the normal policy of trying to fill idle
CPUs. I'm still not convinced that most of the latency increase is linked
to contention on shared resources.
>
> As the data show, scheduling latency is not the only source of high
> latency here. In fact, with hyperthreading and other shared system
> resources (cache, memory, etc.), a side workload will always negatively
> impact the latency of the main workload. It is impossible to eliminate
> these impacts with scheduler optimizations alone. On the other hand,
> cpu.headroom provides a mechanism to limit such impact.
>
> Optimization and protection are two sides of the problem. While we
> spend a lot of time optimizing the workload (so Viresh's work is really
> interesting for us), cpu.headroom works on the protection side. There
> are multiple reasons behind the high latencies, and cpu.headroom provides
> universal protection against all of them.
>
> With the protection of cpu.headroom, we can actually do optimizations
> more efficiently, as we can safely start with a high headroom, and
> then try to lower it.
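[For concreteness -- this is an assumption based on the cover letter and the
table above, not something taken from the patches themselves: cpu.headroom is
proposed as a per-cgroup file of the cpu controller, so the "start high, then
lower it" workflow would presumably come down to writing a percentage into
that file for the main job's cgroup. A minimal sketch, with both the cgroup
path and the value format assumed:

	/* set_headroom.c: write an assumed percentage into the proposed
	 * cpu.headroom file of a cgroup v2 cpu controller */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char *path = "/sys/fs/cgroup/main-job/cpu.headroom"; /* assumed path */
		const char *val = "25\n";                                   /* assumed format */
		int fd = open(path, O_WRONLY);

		if (fd < 0 || write(fd, val, strlen(val)) < 0) {
			perror(path);
			return 1;
		}
		close(fd);
		return 0;
	}
]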
>
> Please let me know your thoughts on this.
>
> Thanks,
> Song
>
>
>