Date:   Mon, 22 Mar 2021 16:44:53 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Chuck Lever <chuck.lever@...cle.com>,
        Christoph Hellwig <hch@...radead.org>,
        Alexander Duyck <alexander.duyck@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-Net <netdev@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Linux-NFS <linux-nfs@...r.kernel.org>
Subject: Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator

On Mon, Mar 22, 2021 at 01:04:46PM +0100, Jesper Dangaard Brouer wrote:
> On Mon, 22 Mar 2021 09:18:42 +0000
> Mel Gorman <mgorman@...hsingularity.net> wrote:
> 
> > This series is based on top of Matthew Wilcox's series "Rationalise
> > __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
> > test and are not using Andrew's tree as a baseline, I suggest using the
> > following git tree:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
> 
> page_bench04_bulk[1] micro-benchmark on branch: mm-bulk-rebase-v5r9
>  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench04_bulk.c
> 
> BASELINE
>  single_page alloc+put: Per elem: 199 cycles(tsc) 55.472 ns
> 
> LIST variant: time_bulk_page_alloc_free_list: step=bulk size
> 
>  Per elem: 206 cycles(tsc) 57.478 ns (step:1)
>  Per elem: 154 cycles(tsc) 42.861 ns (step:2)
>  Per elem: 145 cycles(tsc) 40.536 ns (step:3)
>  Per elem: 142 cycles(tsc) 39.477 ns (step:4)
>  Per elem: 142 cycles(tsc) 39.610 ns (step:8)
>  Per elem: 137 cycles(tsc) 38.155 ns (step:16)
>  Per elem: 135 cycles(tsc) 37.739 ns (step:32)
>  Per elem: 134 cycles(tsc) 37.282 ns (step:64)
>  Per elem: 133 cycles(tsc) 36.993 ns (step:128)
> 
> ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size
> 
>  Per elem: 202 cycles(tsc) 56.383 ns (step:1)
>  Per elem: 144 cycles(tsc) 40.047 ns (step:2)
>  Per elem: 134 cycles(tsc) 37.339 ns (step:3)
>  Per elem: 128 cycles(tsc) 35.578 ns (step:4)
>  Per elem: 120 cycles(tsc) 33.592 ns (step:8)
>  Per elem: 116 cycles(tsc) 32.362 ns (step:16)
>  Per elem: 113 cycles(tsc) 31.476 ns (step:32)
>  Per elem: 110 cycles(tsc) 30.633 ns (step:64)
>  Per elem: 110 cycles(tsc) 30.596 ns (step:128)
> 
> Compared to the previous results, the list variant got faster, but the
> array variant is still faster. The array variant lost a little
> performance. I think this may be related to the stats counters that
> were added/moved inside the loop in this patchset.
> 

If you are feeling particularly brave, take a look at
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-percpu-local_lock-v1r10

It's a prototype series rebased on top of the bulk allocator, and this
version has not even been boot tested. While it'll get rough treatment
during review, it should reduce the cost of the stat updates in the
bulk allocator as a side effect.

-- 
Mel Gorman
SUSE Labs
