Message-ID: <5df1fa23-1a17-d2e8-7a3a-0a44478bc1de@redhat.com>
Date: Wed, 1 Nov 2023 18:03:46 -0400
From: Waiman Long <longman@...hat.com>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Joe Mario <jmario@...hat.com>,
Sebastian Jug <sejug@...hat.com>
Subject: Re: [PATCH] cgroup/rstat: Reduce cpu_lock hold time in
cgroup_rstat_flush_locked()

On 11/1/23 15:11, Yosry Ahmed wrote:
> On Wed, Nov 1, 2023 at 9:09 AM Waiman Long <longman@...hat.com> wrote:
>> When cgroup_rstat_updated() isn't being called concurrently with
>> cgroup_rstat_flush_locked(), its run time is pretty short. When
>> both are called concurrently, the cgroup_rstat_updated() run time
>> can spike to a pretty high value due to high cpu_lock hold time in
>> cgroup_rstat_flush_locked(). This can be problematic if the task calling
>> cgroup_rstat_updated() is a realtime task running on an isolated CPU
>> with a strict latency requirement. The cgroup_rstat_updated() call can
>> happen when there is a page fault, even though the task is running in
>> user space most of the time.
>>
>> The percpu cpu_lock is used to protect the update tree -
>> updated_next and updated_children. This protection is only needed
>> when cgroup_rstat_cpu_pop_updated() is being called. The subsequent
>> flushing operation which can take a much longer time does not need
>> that protection.
>>
>> To reduce the cpu_lock hold time, we need to perform all the
>> cgroup_rstat_cpu_pop_updated() calls up front with the lock
>> released afterward before doing any flushing. This patch adds a new
>> cgroup_rstat_flush_list() function to do just that and return a singly
>> linked list of cgroup_rstat_cpu structures to be flushed.
>>
>> By adding some instrumentation code to measure the maximum elapsed times
>> of the new cgroup_rstat_flush_list() function and each cpu iteration
>> of cgroup_rstat_flush_locked() around the old cpu_lock lock/unlock pair
>> on a 2-socket x86-64 server running parallel kernel build, the maximum
>> elapsed times are 31us and 118us respectively. The maximum cpu_lock
>> hold time is now reduced to about 1/4 of the original.
> This sounds promising. It's smart to empty the whole tree while
> holding the lock, then do the flush only under cgroup_rstat_lock.
> Thanks for doing this.
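
To make the two-phase idea concrete, here is a minimal userspace sketch,
not the kernel code. It assumes a single updated list instead of the
per-cpu updated tree, and every name in it (struct node, mark_updated(),
detach_updated(), flush_all(), the toy spin_lock()/spin_unlock() helpers)
is hypothetical. The pending delta is snapshotted under cpu_lock so the
slow part can run without it; the real rstat code keeps its per-cpu
counters differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-cpu rstat node; not the kernel struct. */
struct node {
	struct node *updated_next;	/* protected by cpu_lock */
	int on_list;			/* protected by cpu_lock */
	long pending;			/* protected by cpu_lock */
	struct node *flush_next;	/* protected by flush_lock */
	long snap;			/* delta detached for flushing */
	long total;			/* flushed total */
};

/* Toy spinlocks standing in for cpu_lock and cgroup_rstat_lock. */
static atomic_flag cpu_lock = ATOMIC_FLAG_INIT;
static atomic_flag flush_lock = ATOMIC_FLAG_INIT;
static struct node *updated_head;

static void spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Updater side: only a short critical section under cpu_lock. */
static void mark_updated(struct node *n, long delta)
{
	spin_lock(&cpu_lock);
	n->pending += delta;
	if (!n->on_list) {
		n->on_list = 1;
		n->updated_next = updated_head;
		updated_head = n;
	}
	spin_unlock(&cpu_lock);
}

/* Phase 1: pop everything up front, then drop cpu_lock. */
static struct node *detach_updated(void)
{
	struct node *head = NULL, *n;

	spin_lock(&cpu_lock);
	while ((n = updated_head) != NULL) {
		assert(n->on_list);
		updated_head = n->updated_next;
		n->updated_next = NULL;
		n->on_list = 0;
		n->snap = n->pending;
		n->pending = 0;
		n->flush_next = head;
		head = n;
	}
	spin_unlock(&cpu_lock);
	return head;
}

/* Phase 2: the slow flush runs under flush_lock only. */
static int flush_all(void)
{
	struct node *n;
	int count = 0;

	spin_lock(&flush_lock);
	n = detach_updated();
	while (n) {
		struct node *next = n->flush_next;

		n->flush_next = NULL;
		n->total += n->snap;
		n->snap = 0;
		count++;
		n = next;
	}
	spin_unlock(&flush_lock);
	return count;
}
```

In this sketch an updater can only be delayed by the list walk in
detach_updated(), never by the per-node flush work.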
>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> include/linux/cgroup-defs.h | 7 +++++
>> kernel/cgroup/rstat.c | 57 +++++++++++++++++++++++++++----------
>> 2 files changed, 49 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 265da00a1a8b..22adb94ebb74 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -368,6 +368,13 @@ struct cgroup_rstat_cpu {
>> */
>> struct cgroup *updated_children; /* terminated by self cgroup */
>> struct cgroup *updated_next; /* NULL iff not on the list */
>> +
>> + /*
>> + * A singly-linked list of cgroup_rstat_cpu structures to be flushed.
>> + * Protected by cgroup_rstat_lock.
>> + */
>> + struct cgroup_rstat_cpu *flush_next;
>> + struct cgroup *cgroup; /* Cgroup back pointer */
> Why are we linking struct cgroup_rstat_cpu instead of directly linking
> struct cgroup? AFAICT we only ever use the cgroup back pointer during
> flushing anyway, right?
You are right.
> Given that only one cpu can be flushed at a time, I think it should be
> okay to run the list directly through struct cgroup, and have all cpus
> reuse it. That pointer would essentially be scratch space for the
> flushing code to use. This should also save a bit of memory:
> O(cgroups) vs O(cgroups * cpus). It's not a lot either way though.
>
> I think this may also simplify the code a bit.
Moving the flush_next pointer to struct cgroup does save a bit of
memory. Thanks for the suggestion. I will do that in the next version.
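
As a rough illustration of that layout, here is a userspace sketch in
which the list runs through one scratch pointer per cgroup that every
cpu reuses in turn. The types and names (struct cg, struct cg_cpu,
NR_CPUS_SKETCH, updated_list(), flush_all_cpus()) are made up for the
example and are not the kernel's; the flat array walk stands in for the
updated tree, and locking is left out because the caller is assumed to
hold the equivalent of cgroup_rstat_lock.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* assumed cpu count, for illustration only */

/* Hypothetical per-cpu state: no list pointer needed here. */
struct cg_cpu {
	long pending;
	int updated;
};

/* Hypothetical cgroup: one flush_next shared by all cpus. */
struct cg {
	struct cg_cpu cpu[NR_CPUS_SKETCH];
	struct cg *flush_next;	/* scratch space for the flusher */
	long total;
};

/* Link every cgroup with updates on @cpu through cg->flush_next. */
static struct cg *updated_list(struct cg *cgs[], size_t n, int cpu)
{
	struct cg *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct cg_cpu *c = &cgs[i]->cpu[cpu];

		if (!c->updated)
			continue;
		/* The scratch pointer must be free from the previous cpu. */
		assert(cgs[i]->flush_next == NULL);
		c->updated = 0;
		cgs[i]->flush_next = head;
		head = cgs[i];
	}
	return head;
}

/* Only one cpu is flushed at a time, so the pointer can be reused. */
static void flush_all_cpus(struct cg *cgs[], size_t n)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct cg *pos = updated_list(cgs, n, cpu);

		while (pos) {
			struct cg *next = pos->flush_next;

			pos->flush_next = NULL;
			pos->total += pos->cpu[cpu].pending;
			pos->cpu[cpu].pending = 0;
			pos = next;
		}
	}
}
```

With this shape the link costs one pointer per cgroup rather than one
per cgroup per cpu, and the back pointer is no longer needed.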
>
>> };
>>
>> struct cgroup_freezer_state {
>> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
>> index d80d7a608141..93ef2795a68d 100644
>> --- a/kernel/cgroup/rstat.c
>> +++ b/kernel/cgroup/rstat.c
>> @@ -145,6 +145,42 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
>> return pos;
>> }
>>
>> +/*
>> + * Return a list of cgroup_rstat_cpu structures to be flushed
>> + */
>> +static struct cgroup_rstat_cpu *cgroup_rstat_flush_list(struct cgroup *root,
> nit: the name sounds like the function will flush a list, rather than
> return a list of cgroups to be flushed. What about
> cgroup_rstat_updated_list?
I am not good at naming. cgroup_rstat_updated_list looks good to me.

Cheers,
Longman