linux-kernel - Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock and vmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fadd8558-8917-4012-b5ea-c6376c835cc8@lucifer.local>
Date:   Sun, 19 Mar 2023 20:29:16 +0000
From:   Lorenzo Stoakes <lstoakes@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, Baoquan He <bhe@...hat.com>,
        Uladzislau Rezki <urezki@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        Liu Shixin <liushixin2@...wei.com>,
        Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH v2 2/4] mm: vmalloc: use rwsem, mutex for vmap_area_lock
 and vmap_block->lock

On Sun, Mar 19, 2023 at 01:10:47PM -0700, Andrew Morton wrote:
> On Sun, 19 Mar 2023 07:09:31 +0000 Lorenzo Stoakes <lstoakes@...il.com> wrote:
>
> > vmalloc() is, by design, not permitted to be used in atomic context and
> > already contains components which may sleep, so avoiding spin locks is not
> > a problem from the perspective of atomic context.
> >
> > The global vmap_area_lock is held when the red/black tree rooted in
> > vmap_are_root is accessed and thus is rather long-held and under
> > potentially high contention. It is likely to be under contention for reads
> > rather than write, so replace it with a rwsem.
> >
> > Each individual vmap_block->lock is likely to be held for less time but
> > under low contention, so a mutex is not an outrageous choice here.
> >
> > A subset of test_vmalloc.sh performance results:-
> >
> > fix_size_alloc_test             0.40%
> > full_fit_alloc_test		2.08%
> > long_busy_list_alloc_test	0.34%
> > random_size_alloc_test		-0.25%
> > random_size_align_alloc_test	0.06%
> > ...
> > all tests cycles                0.2%
> >
> > This represents a tiny reduction in performance that sits barely above
> > noise.
> >
> > The reason for making this change is to build a basis for vread() to be
> > usable asynchronously, this eliminating the need for a bounce buffer when
> > copying data to userland in read_kcore() and allowing that to be converted
> > to an iterator form.
> >
>
> I'm not understanding the final paragraph.  How and where is vread()
> used "asynchronously"?


The basis for saying asynchronous was based on Documentation/filesystems/vfs.rst
describing read_iter() as 'possibly asynchronous read with iov_iter as
destination', and read_iter() is what is (now) invoked when accessing
/proc/kcore.

However I agree this is vague and it is clearer to refer to the fact that we are
now directly writing to user memory and thus wish to avoid spinlocks as we may
need to fault in user memory in doing so.

Would it be ok for you to go ahead and replace that final paragraph with the
below?:-

The reason for making this change is to build a basis for vread() to write
to user memory directly via an iterator; as a result we may cause page
faults during which we must not hold a spinlock. Doing this eliminates the
need for a bounce buffer in read_kcore() and thus permits that to be
converted to also use an iterator, as a read_iter() handler.