Message-ID: <6d7246b6-195e-ee08-06b1-2d1ec722e7b2@bytedance.com>
Date: Mon, 18 Oct 2021 13:49:38 +0800
From: Chengming Zhou <zhouchengming@...edance.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH] bpf: use count for prealloc hashtab too

On 2021/10/16 at 3:58 AM, Alexei Starovoitov wrote:
> On Fri, Oct 15, 2021 at 11:04 AM Chengming Zhou
> <zhouchengming@...edance.com> wrote:
>>
>> We only use count for the kmalloc hashtab, not for the prealloc hashtab,
>> because __pcpu_freelist_pop() returns NULL when there are no more
>> elements in the pcpu freelist.
>>
>> The problem is that __pcpu_freelist_pop() traverses all CPUs, taking
>> each CPU's spinlock, only to find in the end that no element is left.
>>
>> We hit a bad case on a big system with 96 CPUs where alloc_htab_elem()
>> could last for 1ms. This patch uses count for the prealloc hashtab too,
>> avoiding the traversal and spin_lock over all CPUs in this case.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@...edance.com>
>
> It's not clear from the commit log what you're solving.
> The atomic inc/dec in critical path of prealloc maps hurts performance.
> That's why it's not used.
>
Thanks for the explanation. What I'm solving is that when the hash table
has no free elements, we don't need to call __pcpu_freelist_pop() to
traverse and spin_lock all CPUs. The ftrace output of this bad case is
below:
50) | htab_map_update_elem() {
50) 0.329 us | _raw_spin_lock_irqsave();
50) 0.063 us | lookup_elem_raw();
50) | alloc_htab_elem() {
50) | pcpu_freelist_pop() {
50) 0.209 us | _raw_spin_lock();
50) 0.264 us | _raw_spin_lock();
50) 0.231 us | _raw_spin_lock();
50) 0.168 us | _raw_spin_lock();
50) 0.168 us | _raw_spin_lock();
50) 0.300 us | _raw_spin_lock();
50) 0.263 us | _raw_spin_lock();
50) 0.304 us | _raw_spin_lock();
50) 0.168 us | _raw_spin_lock();
50) 0.177 us | _raw_spin_lock();
50) 0.235 us | _raw_spin_lock();
50) 0.162 us | _raw_spin_lock();
50) 0.186 us | _raw_spin_lock();
50) 0.185 us | _raw_spin_lock();
50) 0.315 us | _raw_spin_lock();
50) 0.172 us | _raw_spin_lock();
50) 0.180 us | _raw_spin_lock();
50) 0.173 us | _raw_spin_lock();
50) 0.176 us | _raw_spin_lock();
50) 0.261 us | _raw_spin_lock();
50) 0.364 us | _raw_spin_lock();
50) 0.180 us | _raw_spin_lock();
50) 0.284 us | _raw_spin_lock();
50) 0.226 us | _raw_spin_lock();
50) 0.210 us | _raw_spin_lock();
50) 0.237 us | _raw_spin_lock();
50) 0.333 us | _raw_spin_lock();
50) 0.295 us | _raw_spin_lock();
50) 0.278 us | _raw_spin_lock();
50) 0.260 us | _raw_spin_lock();
50) 0.224 us | _raw_spin_lock();
50) 0.447 us | _raw_spin_lock();
50) 0.221 us | _raw_spin_lock();
50) 0.320 us | _raw_spin_lock();
50) 0.203 us | _raw_spin_lock();
50) 0.213 us | _raw_spin_lock();
50) 0.242 us | _raw_spin_lock();
50) 0.230 us | _raw_spin_lock();
50) 0.216 us | _raw_spin_lock();
50) 0.525 us | _raw_spin_lock();
50) 0.257 us | _raw_spin_lock();
50) 0.235 us | _raw_spin_lock();
50) 0.269 us | _raw_spin_lock();
50) 0.368 us | _raw_spin_lock();
50) 0.249 us | _raw_spin_lock();
50) 0.217 us | _raw_spin_lock();
50) 0.174 us | _raw_spin_lock();
50) 0.173 us | _raw_spin_lock();
50) 0.161 us | _raw_spin_lock();
50) 0.282 us | _raw_spin_lock();
50) 0.264 us | _raw_spin_lock();
50) 0.160 us | _raw_spin_lock();
50) 0.692 us | _raw_spin_lock();
50) 0.185 us | _raw_spin_lock();
50) 0.157 us | _raw_spin_lock();
50) 0.168 us | _raw_spin_lock();
50) 0.205 us | _raw_spin_lock();
50) 0.189 us | _raw_spin_lock();
50) 0.276 us | _raw_spin_lock();
50) 0.171 us | _raw_spin_lock();
50) 0.390 us | _raw_spin_lock();
50) 0.164 us | _raw_spin_lock();
50) 0.170 us | _raw_spin_lock();
50) 0.188 us | _raw_spin_lock();
50) 0.284 us | _raw_spin_lock();
50) 0.191 us | _raw_spin_lock();
50) 0.412 us | _raw_spin_lock();
50) 0.285 us | _raw_spin_lock();
50) 0.296 us | _raw_spin_lock();
50) 0.315 us | _raw_spin_lock();
50) 0.239 us | _raw_spin_lock();
50) 0.225 us | _raw_spin_lock();
50) 0.258 us | _raw_spin_lock();
50) 0.228 us | _raw_spin_lock();
50) 0.240 us | _raw_spin_lock();
50) 0.297 us | _raw_spin_lock();
50) 0.216 us | _raw_spin_lock();
50) 0.213 us | _raw_spin_lock();
50) 0.225 us | _raw_spin_lock();
50) 0.223 us | _raw_spin_lock();
50) 0.287 us | _raw_spin_lock();
50) 0.258 us | _raw_spin_lock();
50) 0.295 us | _raw_spin_lock();
50) 0.262 us | _raw_spin_lock();
50) 0.325 us | _raw_spin_lock();
50) 0.203 us | _raw_spin_lock();
50) 0.325 us | _raw_spin_lock();
50) 0.255 us | _raw_spin_lock();
50) 0.325 us | _raw_spin_lock();
50) 0.216 us | _raw_spin_lock();
50) 0.232 us | _raw_spin_lock();
50) 0.804 us | _raw_spin_lock();
50) 0.262 us | _raw_spin_lock();
50) 0.242 us | _raw_spin_lock();
50) 0.271 us | _raw_spin_lock();
50) 0.175 us | _raw_spin_lock();
50) + 61.026 us | }
50) + 61.575 us | }
50) 0.051 us | _raw_spin_unlock_irqrestore();
50) + 64.863 us | }