Message-ID: <625ce99b-8ec3-f807-99ac-1dc32695deca@bytedance.com>
Date: Fri, 28 Oct 2022 18:21:11 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Miaohe Lin <linmiaohe@...wei.com>,
"mingo@...hat.com" <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, juri.lelli@...hat.com,
vincent.guittot@...aro.org, rohit.k.jain@...cle.com
Cc: dietmar.eggemann@....com, Steven Rostedt <rostedt@...dmis.org>,
bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
vschneid@...hat.com, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Regression on vcpu_is_preempted()
Hi Miaohe,
On 10/28/22 4:48 PM, Miaohe Lin wrote:
> Hi all scheduler experts:
> When we run java gc in our 8 vcpus guest *without KVM_FEATURE_STEAL_TIME enabled*, the output looks like below:
> With ParallelGCThreads=4 and ConcGCThreads=4, we have:
> G1 Young Generation: 1 times 1786 ms
> G1 Old Generation: 1 times 1022 ms
> With ParallelGCThreads=5 and ConcGCThreads=5, we have:
> G1 Young Generation: 1 times 1557 ms
> G1 Old Generation: 1 times 1020 ms
>
> This meets our expectation. But *with KVM_FEATURE_STEAL_TIME enabled* in our guest, the output looks like this:
> With ParallelGCThreads=4 and ConcGCThreads=4, we have:
> G1 Young Generation: 1 times 1637 ms
> G1 Old Generation: 1 times 1022 ms
> With ParallelGCThreads=5 and ConcGCThreads=5, we have:
> G1 Young Generation: 1 times 2164 ms
> ^^^^
> G1 Old Generation: 1 times 1024 ms
>
> The duration of G1 Young Generation is far beyond our expectation when gc threads = 5. We found the root cause
> is that when KVM_FEATURE_STEAL_TIME is enabled *there are many more (3k+) cpu migrations for java gc threads*. This is due to
> the commit below:
>
> commit 247f2f6f3c706b40b5f3886646f3eb53671258bf
> Author: Rohit Jain <rohit.k.jain@...cle.com>
> Date: Wed May 2 13:52:10 2018 -0700
>
> sched/core: Don't schedule threads on pre-empted vCPUs
>
> In paravirt configurations today, spinlocks figure out whether a vCPU is
> running to determine whether or not spinlock should bother spinning. We
> can use the same logic to prioritize CPUs when scheduling threads. If a
> vCPU has been pre-empted, it will incur the extra cost of VMENTER and
> the time it actually spends to be running on the host CPU. If we had
> other vCPUs which were actually running on the host CPU and idle we
> should schedule threads there.
>
> When the scheduler tries to select a CPU to run the gc thread, available_idle_cpu() checks vcpu_is_preempted().
> It will choose another vcpu to run the gc thread when the current vcpu is preempted. But the preempted vcpu has no other work
> to do except continuing to do gc. In our guest, there are more vcpus than java gc threads, so there are always some
> available vcpus when the scheduler tries to select an idle vcpu (running on the host). This leads to lots of cpu migrations and results
> in the regression.
So you want the preempted idle cpus to run gc threads to maximize the
gc throughput, but available_idle_cpu() keeps them from being selected.
In theory, load balancing will help spread load to these cpus (and
get them VMENTERed), so have you checked whether the gc threads show a
tendency to stack on the same cpus?
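
For reference, the check under discussion can be modelled in userspace
roughly as below. This is only a sketch: struct vcpu_state, the vcpus[]
array and the model_* names are hypothetical and exist only for this
illustration, and model_select_cpu() is a deliberately simplified stand-in
for the real wakeup placement code. Only the idle-and-not-preempted shape
of the check follows available_idle_cpu().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VCPUS 8

/* Hypothetical per-vCPU state for this model; not a kernel structure. */
struct vcpu_state {
	bool idle;      /* no runnable task on this vCPU */
	bool preempted; /* host has descheduled this vCPU */
};

static struct vcpu_state vcpus[NR_VCPUS];

/* Same shape as available_idle_cpu(): idle AND not preempted. */
static int model_available_idle_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_VCPUS);

	if (!vcpus[cpu].idle)
		return 0;

	if (vcpus[cpu].preempted)
		return 0;

	return 1;
}

/*
 * Simplified placement (an assumption, not the kernel algorithm):
 * stay on prev if it qualifies, otherwise take the first vCPU that
 * does, and fall back to prev when none qualifies.
 */
static int model_select_cpu(int prev)
{
	if (model_available_idle_cpu(prev))
		return prev;

	for (int cpu = 0; cpu < NR_VCPUS; cpu++)
		if (model_available_idle_cpu(cpu))
			return cpu;

	return prev;
}
```

In this model, a thread whose previous vCPU is idle but marked preempted
is moved to another vCPU whenever one is idle and running, which is the
migration pattern described in the report. If the preempted flag is never
set, as would be the case when the guest has no preemption information,
the thread stays where it was.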