Date:   Mon, 18 Oct 2021 13:49:38 +0800
From:   Chengming Zhou <zhouchengming@...edance.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH] bpf: use count for prealloc hashtab too

On 2021/10/16 3:58 AM, Alexei Starovoitov wrote:
> On Fri, Oct 15, 2021 at 11:04 AM Chengming Zhou
> <zhouchengming@...edance.com> wrote:
>>
>> We only use count for the kmalloc hashtab, not for the prealloc hashtab,
>> because __pcpu_freelist_pop() returns NULL when there are no more elements
>> in the pcpu freelist.
>>
>> But the problem is that __pcpu_freelist_pop() traverses all CPUs and takes
>> the spin_lock of each one, only to find in the end that no element is left.
>>
>> We hit this bad case on a big system with 96 CPUs, where alloc_htab_elem()
>> could take as long as 1ms. This patch uses count for the prealloc hashtab
>> too, avoiding the traversal and per-CPU spin_lock in that case.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@...edance.com>
> 
> It's not clear from the commit log what you're solving.
> The atomic inc/dec in the critical path of prealloc maps hurts performance.
> That's why it's not used.
> 
Thanks for the explanation. What I'm solving is that when the hash table has
no free elements left, we don't need to call __pcpu_freelist_pop(), which
traverses all CPUs and takes each one's spin_lock. The ftrace output of this
bad case is below:

 50)               |  htab_map_update_elem() {
 50)   0.329 us    |    _raw_spin_lock_irqsave();
 50)   0.063 us    |    lookup_elem_raw();
 50)               |    alloc_htab_elem() {
 50)               |      pcpu_freelist_pop() {
 50)   0.209 us    |        _raw_spin_lock();
 50)   0.264 us    |        _raw_spin_lock();
 50)   0.231 us    |        _raw_spin_lock();
 50)   0.168 us    |        _raw_spin_lock();
 50)   0.168 us    |        _raw_spin_lock();
 50)   0.300 us    |        _raw_spin_lock();
 50)   0.263 us    |        _raw_spin_lock();
 50)   0.304 us    |        _raw_spin_lock();
 50)   0.168 us    |        _raw_spin_lock();
 50)   0.177 us    |        _raw_spin_lock();
 50)   0.235 us    |        _raw_spin_lock();
 50)   0.162 us    |        _raw_spin_lock();
 50)   0.186 us    |        _raw_spin_lock();
 50)   0.185 us    |        _raw_spin_lock();
 50)   0.315 us    |        _raw_spin_lock();
 50)   0.172 us    |        _raw_spin_lock();
 50)   0.180 us    |        _raw_spin_lock();
 50)   0.173 us    |        _raw_spin_lock();
 50)   0.176 us    |        _raw_spin_lock();
 50)   0.261 us    |        _raw_spin_lock();
 50)   0.364 us    |        _raw_spin_lock();
 50)   0.180 us    |        _raw_spin_lock();
 50)   0.284 us    |        _raw_spin_lock();
 50)   0.226 us    |        _raw_spin_lock();
 50)   0.210 us    |        _raw_spin_lock();
 50)   0.237 us    |        _raw_spin_lock();
 50)   0.333 us    |        _raw_spin_lock();
 50)   0.295 us    |        _raw_spin_lock();
 50)   0.278 us    |        _raw_spin_lock();
 50)   0.260 us    |        _raw_spin_lock();
 50)   0.224 us    |        _raw_spin_lock();
 50)   0.447 us    |        _raw_spin_lock();
 50)   0.221 us    |        _raw_spin_lock();
 50)   0.320 us    |        _raw_spin_lock();
 50)   0.203 us    |        _raw_spin_lock();
 50)   0.213 us    |        _raw_spin_lock();
 50)   0.242 us    |        _raw_spin_lock();
 50)   0.230 us    |        _raw_spin_lock();
 50)   0.216 us    |        _raw_spin_lock();
 50)   0.525 us    |        _raw_spin_lock();
 50)   0.257 us    |        _raw_spin_lock();
 50)   0.235 us    |        _raw_spin_lock();
 50)   0.269 us    |        _raw_spin_lock();
 50)   0.368 us    |        _raw_spin_lock();
 50)   0.249 us    |        _raw_spin_lock();
 50)   0.217 us    |        _raw_spin_lock();
 50)   0.174 us    |        _raw_spin_lock();
 50)   0.173 us    |        _raw_spin_lock();
 50)   0.161 us    |        _raw_spin_lock();
 50)   0.282 us    |        _raw_spin_lock();
 50)   0.264 us    |        _raw_spin_lock();
 50)   0.160 us    |        _raw_spin_lock();
 50)   0.692 us    |        _raw_spin_lock();
 50)   0.185 us    |        _raw_spin_lock();
 50)   0.157 us    |        _raw_spin_lock();
 50)   0.168 us    |        _raw_spin_lock();
 50)   0.205 us    |        _raw_spin_lock();
 50)   0.189 us    |        _raw_spin_lock();
 50)   0.276 us    |        _raw_spin_lock();
 50)   0.171 us    |        _raw_spin_lock();
 50)   0.390 us    |        _raw_spin_lock();
 50)   0.164 us    |        _raw_spin_lock();
 50)   0.170 us    |        _raw_spin_lock();
 50)   0.188 us    |        _raw_spin_lock();
 50)   0.284 us    |        _raw_spin_lock();
 50)   0.191 us    |        _raw_spin_lock();
 50)   0.412 us    |        _raw_spin_lock();
 50)   0.285 us    |        _raw_spin_lock();
 50)   0.296 us    |        _raw_spin_lock();
 50)   0.315 us    |        _raw_spin_lock();
 50)   0.239 us    |        _raw_spin_lock();
 50)   0.225 us    |        _raw_spin_lock();
 50)   0.258 us    |        _raw_spin_lock();
 50)   0.228 us    |        _raw_spin_lock();
 50)   0.240 us    |        _raw_spin_lock();
 50)   0.297 us    |        _raw_spin_lock();
 50)   0.216 us    |        _raw_spin_lock();
 50)   0.213 us    |        _raw_spin_lock();
 50)   0.225 us    |        _raw_spin_lock();
 50)   0.223 us    |        _raw_spin_lock();
 50)   0.287 us    |        _raw_spin_lock();
 50)   0.258 us    |        _raw_spin_lock();
 50)   0.295 us    |        _raw_spin_lock();
 50)   0.262 us    |        _raw_spin_lock();
 50)   0.325 us    |        _raw_spin_lock();
 50)   0.203 us    |        _raw_spin_lock();
 50)   0.325 us    |        _raw_spin_lock();
 50)   0.255 us    |        _raw_spin_lock();
 50)   0.325 us    |        _raw_spin_lock();
 50)   0.216 us    |        _raw_spin_lock();
 50)   0.232 us    |        _raw_spin_lock();
 50)   0.804 us    |        _raw_spin_lock();
 50)   0.262 us    |        _raw_spin_lock();
 50)   0.242 us    |        _raw_spin_lock();
 50)   0.271 us    |        _raw_spin_lock();
 50)   0.175 us    |        _raw_spin_lock();
 50) + 61.026 us   |      }
 50) + 61.575 us   |    }
 50)   0.051 us    |    _raw_spin_unlock_irqrestore();
 50) + 64.863 us   |  }
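
To make the shape of the problem concrete, here is a minimal user-space
sketch (not the kernel code): NR_CPUS_SIM, freelist_pop() and the counter
semantics are simplified stand-ins for the percpu_freelist and htab->count
logic. The plain pop has to take and release one spinlock per CPU just to
learn that nothing is left, while checking a counter first lets the
"map is full" case return immediately.

/*
 * User-space sketch only, build with: gcc -O2 -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define NR_CPUS_SIM 96			/* mimic the 96-CPU machine */

struct node { struct node *next; };

struct percpu_head {
	pthread_spinlock_t lock;
	struct node *first;
};

static struct percpu_head heads[NR_CPUS_SIM];
static atomic_int nr_free;		/* elements still available */

/* Like __pcpu_freelist_pop(): walk every CPU, taking each lock in turn. */
static struct node *freelist_pop(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		struct node *n;

		pthread_spin_lock(&heads[cpu].lock);
		n = heads[cpu].first;
		if (n)
			heads[cpu].first = n->next;
		pthread_spin_unlock(&heads[cpu].lock);
		if (n)
			return n;
	}
	/* "nothing left" is only known after NR_CPUS_SIM lock/unlock pairs */
	return NULL;
}

/* With a counter, the "map is full" case returns before touching any lock. */
static struct node *freelist_pop_counted(void)
{
	struct node *n;

	if (atomic_load(&nr_free) == 0)
		return NULL;		/* fast path when nothing is free */
	n = freelist_pop();
	if (n)
		atomic_fetch_sub(&nr_free, 1);
	return n;
}

int main(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		pthread_spin_init(&heads[cpu].lock, PTHREAD_PROCESS_PRIVATE);

	/* All freelists empty: the plain pop still walks all 96 locks. */
	printf("plain pop:   %p\n", (void *)freelist_pop());
	printf("counted pop: %p\n", (void *)freelist_pop_counted());
	return 0;
}

Of course the counter check adds an atomic op to paths the prealloc map was
designed to keep cheap, which is the cost pointed out above; the sketch only
shows why the empty-map case gets so expensive as the CPU count grows.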
