Date: Sat, 8 Jul 2023 15:00:25 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Tejun Heo <tj@...nel.org>, rcu@...r.kernel.org,
 Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
 Kernel Team <kernel-team@...com>, Daniel Borkmann <daniel@...earbox.net>,
 Andrii Nakryiko <andrii@...nel.org>, David Vernet <void@...ifault.com>,
 "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v4 bpf-next 12/14] bpf: Introduce bpf_mem_free_rcu()
 similar to kfree_rcu().

Hi,

On 7/7/2023 12:05 PM, Hou Tao wrote:
> Hi,
>
> On 7/7/2023 10:10 AM, Alexei Starovoitov wrote:
>> On Thu, Jul 6, 2023 at 6:45 PM Hou Tao <houtao@...weicloud.com> wrote:
>>>
>>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
>>>> From: Alexei Starovoitov <ast@...nel.org>
>>>>
>>>> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
>>>> Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
>>>> per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
>>>> objects into free_by_rcu_ttrace list where they are waiting for RCU
>>>> task trace grace period to be freed into slab.
>>>>
>>>> The life cycle of objects:
>>>> alloc: dequeue free_llist
>>>> free: enqueue free_llist
>>>> free_rcu: enqueue free_by_rcu -> waiting_for_gp
>>>> free_llist above high watermark -> free_by_rcu_ttrace
>>>> after RCU GP waiting_for_gp -> free_by_rcu_ttrace
>>>> free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
>>> Acked-by: Hou Tao <houtao1@...wei.com>
>> Thank you very much for code reviews and feedback.
> You are welcome. I also learned a lot from this great patch set.
>
>> btw I still believe that ABA is a non issue and prefer to keep the code as-is,
>> but for the sake of experiment I've converted it to spin_lock
>> (see attached patch which I think uglifies the code)
>> and performance across bench htab-mem and map_perf_test
>> seems to be about the same.
>> Which was a bit surprising to me.
>> Could you please benchmark it on your system?
> Will do that later. It seems if there is no cross-CPU allocation and
> free, the only possible contention is between __free_rcu() on CPU x and
> alloc_bulk()/free_bulk() on a different CPU.
>
In my local VM setup, the spin-lock also doesn't make much difference
under either htab-mem or map_perf_test, as shown below.

without spin-lock

normal bpf ma
=============
overwrite            per-prod-op: 54.16 ± 0.79k/s, avg mem: 159.99 ± 40.80MiB, peak mem: 251.41MiB
batch_add_batch_del  per-prod-op: 83.87 ± 0.86k/s, avg mem: 70.52 ± 22.73MiB, peak mem: 121.31MiB
add_del_on_diff_cpu  per-prod-op: 25.98 ± 0.13k/s, avg mem: 17.88 ± 1.84MiB, peak mem: 22.86MiB

./map_perf_test 4 8 16384
0:hash_map_perf kmalloc 361532 events per sec
2:hash_map_perf kmalloc 352594 events per sec
6:hash_map_perf kmalloc 356007 events per sec
5:hash_map_perf kmalloc 354184 events per sec
3:hash_map_perf kmalloc 348720 events per sec
1:hash_map_perf kmalloc 346332 events per sec
7:hash_map_perf kmalloc 352126 events per sec
4:hash_map_perf kmalloc 339459 events per sec

with spin-lock

normal bpf ma
=============
overwrite            per-prod-op: 54.72 ± 0.96k/s, avg mem: 133.99 ± 34.04MiB, peak mem: 221.60MiB
batch_add_batch_del  per-prod-op: 82.90 ± 1.86k/s, avg mem: 55.91 ± 11.05MiB, peak mem: 103.82MiB
add_del_on_diff_cpu  per-prod-op: 26.75 ± 0.10k/s, avg mem: 18.55 ± 1.24MiB, peak mem: 23.11MiB

./map_perf_test 4 8 16384
1:hash_map_perf kmalloc 361750 events per sec
2:hash_map_perf kmalloc 360976 events per sec
6:hash_map_perf kmalloc 361745 events per sec
0:hash_map_perf kmalloc 350349 events per sec
7:hash_map_perf kmalloc 359125 events per sec
3:hash_map_perf kmalloc 352683 events per sec
5:hash_map_perf kmalloc 350897 events per sec
4:hash_map_perf kmalloc 331215 events per sec
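
As a rough cross-check of the map_perf_test numbers, here is a minimal
sketch (plain Python, not part of the benchmark tooling) that averages the
per-CPU "events per sec" values from the two runs and reports the relative
change. The helper name relative_change() is made up for illustration. It
compares a single run of each configuration, so it says nothing about
run-to-run noise.

```python
from statistics import mean

# Per-CPU "events per sec" copied from the two
# "./map_perf_test 4 8 16384" runs in this mail.
without_lock = [361532, 352594, 356007, 354184,
                348720, 346332, 352126, 339459]
with_lock = [361750, 360976, 361745, 350349,
             359125, 352683, 350897, 331215]


def relative_change(before, after):
    """Return (mean(after) - mean(before)) / mean(before)."""
    base = mean(before)
    return (mean(after) - base) / base


if __name__ == "__main__":
    print(f"without spin-lock: {mean(without_lock):.2f} events/s")
    print(f"with spin-lock:    {mean(with_lock):.2f} events/s")
    print(f"relative change:   {relative_change(without_lock, with_lock):+.2%}")
```

The averaged rates differ by well under 1%, which is consistent with the
spin-lock not making much difference on this setup.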

