linux-kernel - Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area

Message-ID: <ZZmBh-g_evLcNHT1@pc636>
Date: Sat, 6 Jan 2024 17:36:23 +0100
From: Uladzislau Rezki <urezki@...il.com>
To: Wen Gu <guwen@...ux.alibaba.com>
Cc: Uladzislau Rezki <urezki@...il.com>,
	shaozhengchao <shaozhengchao@...wei.com>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root
 rb-tree

> 
> On 2024/1/5 18:50, Uladzislau Rezki wrote:
> 
> > Hello, Wen Gu.
> > 
> > > 
> > > Hi Uladzislau Rezki,
> > > 
> 
> <...>
> 
> > > Fortunately, thanks to this patch set, the global vmap_area_lock was
> > > removed and the per-node lock vn->busy.lock was introduced. It is really helpful:
> > > 
> > > In a 48-CPU qemu environment, Requests/sec increased about 5 times:
> > > - nginx
> > > - wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
> > > 
> > >                  vzalloced shmem      vzalloced shmem(with this patch set)
> > > Requests/sec          113536.56            583729.93
> > > 
> > > 
> > Thank you for the confirmation that your workload is improved. The "nginx"
> > is 5 times better!
> > 
> 
> Yes, thank you very much for the improvement!
> 
> > > But it also has some overhead, compared to using kzalloced shared memory
> > > or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:
> > > 
> > >                  kzalloced shmem      vzalloced shmem(unset CONFIG_HARDENED_USERCOPY)
> > > Requests/sec          831950.39            805164.78
> > > 
> > > 
> > CONFIG_HARDENED_USERCOPY prevents copying "wrong" memory regions. That is
> > why, if the memory is vmalloced, it wants to make sure that this is really
> > the case; if not, the user copy is aborted.
> > 
> > So there is extra work that involves finding the VA associated with an address.
> > 
> 
> Yes, and lock contention in finding VA is likely to be a performance bottleneck,
> which is mitigated a lot by your work.
> 
> > > So, as a newbie in Linux-mm, I would like to ask for some suggestions:
> > > 
> > > Is it possible to further eliminate the overhead caused by lock contention
> > > in find_vmap_area() in this scenario (maybe this is asking too much), or is
> > > the only way out to not set CONFIG_HARDENED_USERCOPY or to not use a vzalloced
> > > buffer in situations where concurrent kernel-userspace copies happen?
> > > 
> > Could you please try the patch below, to see if it improves this series further?
> > Just in case:
> > 
> 
> Thank you! I tried the patch, and it seems that the wait on rwlock_t
> also exists, as much as with spinlock_t. (The flamegraph is attached.
> Not sure why read_lock waits so long, given that there is no frequent
> write_lock contention.)
> 
>                vzalloced shmem(spinlock_t)   vzalloced shmem(rwlock_t)
> Requests/sec         583729.93                     460007.44
> 
> So I guess the overhead in finding vmap area is inevitable here and the
> original spin_lock is fine in this series.
> 
I have also noticed a performance difference between rwlock and spinlock.
So, yes. This is the extra work we need to do if CONFIG_HARDENED_USERCOPY is
set, i.e. find a VA.
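
To illustrate where that extra work comes from, here is a minimal user-space
sketch, not the kernel code. It assumes a small fixed table scanned linearly
in place of the rb-tree, and a single C11 spinlock in place of the per-node
vn->busy.lock. The names struct area, area_add() and copy_range_ok() are
hypothetical and only stand in for the VA lookup that find_vmap_area()
performs before a user copy is allowed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vmap area: the range [start, end). */
struct area {
	uintptr_t start;	/* first byte of the mapping */
	uintptr_t end;		/* one past the last byte */
};

#define MAX_AREAS 16

/* Assumption: one spinlock guards the whole table (a "global lock" model). */
static atomic_flag area_lock = ATOMIC_FLAG_INIT;
static struct area areas[MAX_AREAS];
static size_t nr_areas;

static void area_lock_acquire(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&area_lock, memory_order_acquire))
		;
}

static void area_lock_release(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/* Register a mapping; returns false when the table is full. */
static bool area_add(uintptr_t start, uintptr_t end)
{
	bool ok = false;

	assert(start < end);
	area_lock_acquire();
	if (nr_areas < MAX_AREAS) {
		areas[nr_areas].start = start;
		areas[nr_areas].end = end;
		nr_areas++;
		ok = true;
	}
	area_lock_release();
	return ok;
}

/* Is [addr, addr + n) fully inside the area that contains addr? */
static bool copy_range_ok(uintptr_t addr, size_t n)
{
	bool ok = false;

	area_lock_acquire();	/* every checked copy pays for this lookup */
	for (size_t i = 0; i < nr_areas; i++) {
		if (addr >= areas[i].start && addr < areas[i].end) {
			ok = n <= areas[i].end - addr;
			break;
		}
	}
	area_lock_release();
	return ok;
}
```

With one area registered as area_add(0x1000, 0x3000), a call such as
copy_range_ok(0x2ff0, 0x10) is accepted, while copy_range_ok(0x2ff0, 0x11)
runs past the end of the area and is rejected, as is any address outside all
areas. Since each call takes the lock, many CPUs copying concurrently would
contend on it in this model.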

--
Uladzislau Rezki