Message-ID: <19251b0c-a6b6-ba5b-3b7b-620416165121@redhat.com>
Date: Thu, 2 Nov 2023 15:07:14 -0400
From: Waiman Long <longman@...hat.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Joe Mario <jmario@...hat.com>,
Sebastian Jug <sejug@...hat.com>
Subject: Re: [PATCH v2] cgroup/rstat: Reduce cpu_lock hold time in
cgroup_rstat_flush_locked()

On 11/2/23 00:35, Yosry Ahmed wrote:
> On Wed, Nov 1, 2023 at 5:53 PM Waiman Long <longman@...hat.com> wrote:
>> When cgroup_rstat_updated() isn't being called concurrently with
>> cgroup_rstat_flush_locked(), its run time is pretty short. When
>> both are called concurrently, the cgroup_rstat_updated() run time
>> can spike to a pretty high value due to high cpu_lock hold time in
>> cgroup_rstat_flush_locked(). This can be problematic if the task calling
>> cgroup_rstat_updated() is a realtime task running on an isolated CPU
>> with a strict latency requirement. The cgroup_rstat_updated() call can
>> happens when there is a page fault even though the task is running in
> s/happens/happen
>
>> user space most of the time.
>>
>> The percpu cpu_lock is used to protect the update tree -
>> updated_next and updated_children. This protection is only needed
>> when cgroup_rstat_cpu_pop_updated() is being called. The subsequent
>> flushing operation which can take a much longer time does not need
>> that protection.
> nit: add: as it is already protected by cgroup_rstat_lock.
>
>> To reduce the cpu_lock hold time, we need to perform all the
>> cgroup_rstat_cpu_pop_updated() calls up front with the lock
>> released afterward before doing any flushing. This patch adds a new
>> cgroup_rstat_updated_list() function to return a singly linked list of
>> cgroups to be flushed.
>>
>> By adding some instrumentation code to measure the maximum elapsed times
>> of the new cgroup_rstat_updated_list() function and each cpu iteration of
>> cgroup_rstat_updated_locked() around the old cpu_lock lock/unlock pair
>> on a 2-socket x86-64 server running parallel kernel build, the maximum
>> elapsed times are 27us and 88us respectively. The maximum cpu_lock hold
>> time is now reduced to about 30% of the original.
>>
>> Below is the run time distribution of cgroup_rstat_updated_list()
>> during the same period:
>>
>>       Run time          Count
>>   ----------------   ----------
>>           t <=  1us  12,574,302
>>    1us <  t <=  5us   2,127,482
>>    5us <  t <= 10us       8,445
>>   10us <  t <= 20us       6,425
>>   20us <  t <= 30us          50
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
> LGTM with some nits.
>
> Reviewed-by: Yosry Ahmed <yosryahmed@...gle.com>
>
>> ---
>> include/linux/cgroup-defs.h | 6 +++++
>> kernel/cgroup/rstat.c | 45 ++++++++++++++++++++++++-------------
>> 2 files changed, 36 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 265da00a1a8b..daaf6d4eb8b6 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -491,6 +491,12 @@ struct cgroup {
>> struct cgroup_rstat_cpu __percpu *rstat_cpu;
>> struct list_head rstat_css_list;
>>
>> + /*
>> + * A singly-linked list of cgroup structures to be rstat flushed.
>> + * Protected by cgroup_rstat_lock.
> Do you think we should mention that this is a scratch area for
> cgroup_rstat_flush_locked()? IOW, this field will be invalid or may
> contain garbage otherwise.
I can certainly add that into the comment.
>
> It might be also useful to mention that the scope of usage for this is
> for each percpu flushing iteration. The cgroup_rstat_lock can be
> dropped between percpu flushing iterations, so different flushers can
> reuse this field safely because it is re-initialized in every
> iteration and only used there.
>
>> + */
>> + struct cgroup *rstat_flush_next;
>> +
>> /* cgroup basic resource statistics */
>> struct cgroup_base_stat last_bstat;
>> struct cgroup_base_stat bstat;
>> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
>> index d80d7a608141..a86d40ed8bda 100644
>> --- a/kernel/cgroup/rstat.c
>> +++ b/kernel/cgroup/rstat.c
>> @@ -145,6 +145,34 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
>> return pos;
>> }
>>
>> +/*
>> + * Return a list of updated cgroups to be flushed
>> + */
> Why not just on a single line?
> /* Return a list of updated cgroups to be flushed */

Yes, it can be compressed into a one-liner.

Thanks for the review and suggestion.
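
For anyone following the thread, the idea in the quoted description (pop every updated entry under cpu_lock up front, drop the lock, then flush) can be roughly sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: struct node, mark_updated(), updated_list() and flush() are hypothetical names, a flat list stands in for the updated_next/updated_children tree, pthread mutexes stand in for the per-CPU cpu_lock and for cgroup_rstat_lock, and a C11 atomic counter stands in for the per-CPU stats.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; NOT the kernel's struct cgroup. */
struct node {
	atomic_long pending;        /* delta accumulated by updaters */
	long total;                 /* flushed total, protected by flush_lock */
	bool queued;                /* on the updated list? protected by cpu_lock */
	struct node *updated_next;  /* updated-list link, protected by cpu_lock */
	struct node *flush_next;    /* scratch link, valid only inside flush() */
};

/* Assumed stand-ins for the per-CPU cpu_lock and for cgroup_rstat_lock. */
static pthread_mutex_t cpu_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *updated_head;   /* protected by cpu_lock */

/* Updater side: record a delta, then queue the node if it is not queued. */
static void mark_updated(struct node *n, long delta)
{
	atomic_fetch_add(&n->pending, delta);
	pthread_mutex_lock(&cpu_lock);
	if (!n->queued) {
		n->queued = true;
		n->updated_next = updated_head;
		updated_head = n;
	}
	pthread_mutex_unlock(&cpu_lock);
}

/*
 * Pop every queued node while holding cpu_lock and chain the nodes through
 * flush_next. Only pointer manipulation happens under the lock.
 */
static struct node *updated_list(void)
{
	struct node *head = NULL;

	pthread_mutex_lock(&cpu_lock);
	while (updated_head) {
		struct node *n = updated_head;

		assert(n->queued);
		updated_head = n->updated_next;
		n->updated_next = NULL;
		n->queued = false;
		n->flush_next = head;
		head = n;
	}
	pthread_mutex_unlock(&cpu_lock);
	return head;
}

/* Flusher side: the slow work runs after cpu_lock has been released. */
static int flush(void)
{
	int flushed = 0;
	struct node *n;

	pthread_mutex_lock(&flush_lock);
	n = updated_list();
	while (n) {
		struct node *next = n->flush_next;

		n->flush_next = NULL;
		n->total += atomic_exchange(&n->pending, 0);
		flushed++;
		n = next;
	}
	pthread_mutex_unlock(&flush_lock);
	return flushed;
}
```

In this sketch an updater calling mark_updated() only ever waits for the short pointer-shuffling loop in updated_list(), never for the accumulation loop in flush(). The flush_next link is re-initialized on every pass, so it carries no meaning outside flush().
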
Cheers,
Longman