Message-ID: <76995749-1c2e-4f78-9aac-a4bff4b8097f@huawei.com>
Date: Tue, 3 Dec 2024 21:45:09 +0800
From: Kefeng Wang <wangkefeng.wang@...wei.com>
To: Uladzislau Rezki <urezki@...il.com>
CC: zuoze <zuoze1@...wei.com>, Matthew Wilcox <willy@...radead.org>,
	<gustavoars@...nel.org>, <akpm@...ux-foundation.org>,
	<linux-hardening@...r.kernel.org>, <linux-mm@...ck.org>,
	<keescook@...omium.org>
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the
 vmalloc check.



On 2024/12/3 21:39, Uladzislau Rezki wrote:
> On Tue, Dec 03, 2024 at 09:30:09PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2024/12/3 21:10, zuoze wrote:
>>>
>>>
>>> On 2024/12/3 20:39, Uladzislau Rezki wrote:
>>>> On Tue, Dec 03, 2024 at 07:23:44PM +0800, zuoze wrote:
>>>>> We have implemented host-guest communication based on the TUN device
>>>>> using XSK[1]. The hardware is a Kunpeng 920 machine (ARM architecture),
>>>>> and the operating system is based on the 6.6 LTS kernel. The call
>>>>> stack from hotspot profiling is as follows:
>>>>>
>>>>> -  100.00%     0.00%  vhost-12384  [unknown]      [k] 0000000000000000
>>>>>      - ret_from_fork
>>>>>         - 99.99% vhost_task_fn
>>>>>            - 99.98% 0xffffdc59f619876c
>>>>>               - 98.99% handle_rx_kick
>>>>>                  - 98.94% handle_rx
>>>>>                     - 94.92% tun_recvmsg
>>>>>                        - 94.76% tun_do_read
>>>>>                           - 94.62% tun_put_user_xdp_zc
>>>>>                              - 63.53% __check_object_size
>>>>>                                 - 63.49% __check_object_size.part.0
>>>>>                                      find_vmap_area
>>>>>                              - 30.02% _copy_to_iter
>>>>>                                   __arch_copy_to_user
>>>>>                     - 2.27% get_rx_bufs
>>>>>                        - 2.12% vhost_get_vq_desc
>>>>>                             1.49% __arch_copy_from_user
>>>>>                     - 0.89% peek_head_len
>>>>>                          0.54% xsk_tx_peek_desc
>>>>>                     - 0.68% vhost_add_used_and_signal_n
>>>>>                        - 0.53% eventfd_signal
>>>>>                             eventfd_signal_mask
>>>>>               - 0.94% handle_tx_kick
>>>>>                  - 0.94% handle_tx
>>>>>                     - handle_tx_copy
>>>>>                        - 0.59% vhost_tx_batch.constprop.0
>>>>>                             0.52% tun_sendmsg
>>>>>
>>>>> It can be observed that most of the overhead is concentrated in the
>>>>> find_vmap_area function.
>>>>>
>>>> I see. Yes, it is heavily contended, since you run the v6.6 kernel.
>>>> There was work done to mitigate the vmap lock contention.
>>>> See it here: https://lwn.net/Articles/956590/
>>>>
>>>> The work was merged in the v6.9 kernel:
>>>>
>>>> <snip>
>>>> commit 38f6b9af04c4b79f81b3c2a0f76d1de94b78d7bc
>>>> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
>>>> Date:   Tue Jan 2 19:46:23 2024 +0100
>>>>
>>>>       mm: vmalloc: add va_alloc() helper
>>>>
>>>>       Patch series "Mitigate a vmap lock contention", v3.
>>>>
>>>>       1. Motivation
>>>> ...
>>>> <snip>
>>>>
>>>> Could you please try the v6.9 kernel on your setup?
>>>>
>>>> As for how to solve it: the series can probably be back-ported to the
>>>> v6.6 kernel.
>>>
>>> All the vmalloc-related optimizations have already been merged into 6.6,
>>> including the set of optimization patches you suggested. Thank you very
>>> much for your input.
>>>
>>
>> To be clear: we backported the vmalloc optimizations into our 6.6
>> kernel earlier, so the stack above was already collected with those
>> patches applied, and even with those optimizations find_vmap_area() is
>> still the hotspot.
>>
>>
> Could you please check that all of the patches below are in your v6.6 kernel?

Yes,

$ git lg v6.6..HEAD  --oneline mm/vmalloc.c
* 86fee542f145 mm: vmalloc: ensure vmap_block is initialised before adding to queue
* f459a0b59f7c mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
* 0be7a82c2555 mm: vmalloc: fix lockdep warning
* 58b99a00d0a0 mm/vmalloc: eliminated the lock contention from twice to once
* 2c549aa32fa0 mm: vmalloc: check if a hash-index is in cpu_possible_mask
* 0bc6d608b445 mm: fix incorrect vbq reference in purge_fragmented_block
* 450f8c5270df mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
* 2ea2bf4a18c3 mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
* bde74a3e8a71 mm/vmalloc: fix return value of vb_alloc if size is 0
* 8c620d05b7c3 mm: vmalloc: refactor vmalloc_dump_obj() function
* b0c8281703b8 mm: vmalloc: improve description of vmap node layer
* ecc3f0bf5c5a mm: vmalloc: add a shrinker to drain vmap pools
* dd89a137f483 mm: vmalloc: set nr_nodes based on CPUs in a system
* 8e63c98d86f6 mm: vmalloc: support multiple nodes in vmallocinfo
* cc32683cef48 mm: vmalloc: support multiple nodes in vread_iter
* 54d5ce65633d mm: vmalloc: add a scan area of VA only once
* ee9c199fb859 mm: vmalloc: offload free_vmap_area_lock lock
* c2c272d78b5a mm: vmalloc: remove global purge_vmap_area_root rb-tree
* c9b39e3ffa86 mm/vmalloc: remove vmap_area_list
* 091d2493d15f mm: vmalloc: remove global vmap_area_root rb-tree
* 53f06cc34bac mm: vmalloc: move vmap_init_free_space() down in vmalloc.c
* bf24196d9ab9 mm: vmalloc: rename adjust_va_to_fit_type() function
* 6e9c94401e34 mm: vmalloc: add va_alloc() helper
* ae528eb14e9a mm: Introduce vmap_page_range() to map pages in PCI address space
* e1dbcfaa1854 mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
* d3a24e7a01c4 mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
* fc9813220585 mm/vmalloc: fix the unchecked dereference warning in vread_iter()
* a52e0157837e ascend: export interfaces required by ascend drivers
* 9b1283f2bec2 mm/vmalloc: Extend vmalloc usage about hugepage
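
Backported commits carry different hashes from the upstream ones, so a
comparison like the one above has to go by subject line rather than by
commit ID. A minimal sketch of that check in Python, assuming the log is
unwrapped `--oneline` output (with or without the leading `* ` graph
marker). The helper names parse_oneline() and missing_subjects() are made
up for illustration, and the last entry in `wanted` is a deliberately fake
subject to show what a miss looks like:

```python
import re


def parse_oneline(log_text):
    """Collect commit subjects from `git log --oneline` style output.

    Leading graph characters such as '* ' are skipped; lines that do not
    start with an abbreviated or full hex commit ID are ignored.
    """
    subjects = set()
    for line in log_text.splitlines():
        m = re.match(r"^[*|\\/ ]*([0-9a-f]{7,40})\s+(.*\S)\s*$", line)
        if m:
            subjects.add(m.group(2))
    return subjects


def missing_subjects(wanted, log_text):
    """Return the subjects from `wanted` that do not appear in the log."""
    have = parse_oneline(log_text)
    return [s for s in wanted if s not in have]


# Two subjects taken from the list above, plus one hypothetical entry.
wanted = [
    "mm: vmalloc: add va_alloc() helper",
    "mm: vmalloc: remove global vmap_area_root rb-tree",
    "mm: vmalloc: hypothetical patch that was never applied",
]

log_text = """\
* 091d2493d15f mm: vmalloc: remove global vmap_area_root rb-tree
* 6e9c94401e34 mm: vmalloc: add va_alloc() helper
"""

print(missing_subjects(wanted, log_text))
```

An exact subject match is the simplest criterion; it would report a false
miss if a backport had its subject edited, so any subject it flags still
needs a manual look.
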
> 
> 
> <snip>
> commit 8be4d46e12af32342569840d958272dbb3be3f4c
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Wed Jan 24 19:09:20 2024 +0100
> 
>      mm: vmalloc: refactor vmalloc_dump_obj() function
> 
> commit 15e02a39fb6b43f37100563c6a246252d5d1e6da
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Wed Jan 24 19:09:19 2024 +0100
> 
>      mm: vmalloc: improve description of vmap node layer
> 
> commit 7679ba6b36dbb300b757b672d6a32a606499e14b
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:33 2024 +0100
> 
>      mm: vmalloc: add a shrinker to drain vmap pools
> 
> commit 8f33a2ff307248c3e55a7696f60b3658b28edb57
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:32 2024 +0100
> 
>      mm: vmalloc: set nr_nodes based on CPUs in a system
> 
> commit 8e1d743f2c2671aa54f6f91a2b33823f92512870
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:31 2024 +0100
> 
>      mm: vmalloc: support multiple nodes in vmallocinfo
> 
> commit 53becf32aec1c8049b854f0c31a11df5ed75df6f
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:30 2024 +0100
> 
>      mm: vmalloc: support multiple nodes in vread_iter
> 
> commit 96aa8437d169b8e030a98e2b74fd9a8ee9d3be7e
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Fri Feb 2 20:06:28 2024 +0100
> 
>      mm: vmalloc: add a scan area of VA only once
> 
> commit 72210662c5a2b6005f6daea7fe293a0dc573e1a5
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:29 2024 +0100
> 
>      mm: vmalloc: offload free_vmap_area_lock lock
> 
> commit 282631cb2447318e2a55b41a665dbe8571c46d70
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:28 2024 +0100
> 
>      mm: vmalloc: remove global purge_vmap_area_root rb-tree
> 
> commit 55c49fee57af99f3c663e69dedc5b85e691bbe50
> Author: Baoquan He <bhe@...hat.com>
> Date:   Tue Jan 2 19:46:27 2024 +0100
> 
>      mm/vmalloc: remove vmap_area_list
> 
> commit d093602919ad5908532142a048539800fa94a0d1
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:26 2024 +0100
> 
>      mm: vmalloc: remove global vmap_area_root rb-tree
> 
> commit 7fa8cee003166ef6db0bba70d610dbf173543811
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:25 2024 +0100
> 
>      mm: vmalloc: move vmap_init_free_space() down in vmalloc.c
> 
> 
> commit 5b75b8e1b9040b34f43809b1948eaa4e83e39112
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:24 2024 +0100
> 
>      mm: vmalloc: rename adjust_va_to_fit_type() function
> 
> 
> commit 38f6b9af04c4b79f81b3c2a0f76d1de94b78d7bc
> Author: Uladzislau Rezki (Sony) <urezki@...il.com>
> Date:   Tue Jan 2 19:46:23 2024 +0100
> 
>      mm: vmalloc: add va_alloc() helper
> <snip>
> 
> Thank you!
> 
> --
> Uladzislau Rezki

