Date:   Thu, 6 Apr 2023 11:22:38 +0100
From:   Mel Gorman <mgorman@...e.de>
To:     Qi Zheng <zhengqi.arch@...edance.com>
Cc:     akpm@...ux-foundation.org, willy@...radead.org, lstoakes@...il.com,
        vbabka@...e.cz, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm: swap: fix performance regression on
 sparsetruncate-tiny

On Thu, Apr 06, 2023 at 12:18:53AM +0800, Qi Zheng wrote:
> The ->percpu_pvec_drained flag was originally introduced by
> commit d9ed0d08b6c6 ("mm: only drain per-cpu pagevecs once per
> pagevec usage") to drain per-cpu pagevecs only once per pagevec
> usage. But in the conversion of the swap code to be more
> folio-based, commit c2bc16817aa0 ("mm/swap: add
> folio_batch_move_lru()") broke this logic: ->percpu_pvec_drained
> is now reset to false on every batch move, so per-cpu pagevecs
> are drained multiple times per pagevec usage.
> 
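For context, the difference between the two helpers is small; at the
time of writing they look roughly like this (a sketch of
include/linux/pagevec.h, documentation and unrelated details omitted):

	static inline void folio_batch_init(struct folio_batch *fbatch)
	{
		fbatch->nr = 0;
		/* forgets that the per-cpu pagevecs were already drained */
		fbatch->percpu_pvec_drained = false;
	}

	static inline void folio_batch_reinit(struct folio_batch *fbatch)
	{
		fbatch->nr = 0;	/* leaves ->percpu_pvec_drained untouched */
	}

So reusing a batch via folio_batch_init() discards the drain hint,
while folio_batch_reinit() preserves it.
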
> In theory, converting code to be more folio-based should cause no
> functional changes, so folio_batch_move_lru() should call
> folio_batch_reinit() instead of folio_batch_init(). To verify
> that ->percpu_pvec_drained is still needed, I ran
> mmtests/sparsetruncate-tiny and got the following data:
> 
>                             baseline              with patch
> Min       Time      326.00 (   0.00%)      328.00 (  -0.61%)
> 1st-qrtle Time      334.00 (   0.00%)      336.00 (  -0.60%)
> 2nd-qrtle Time      338.00 (   0.00%)      341.00 (  -0.89%)
> 3rd-qrtle Time      343.00 (   0.00%)      347.00 (  -1.17%)
> Max-1     Time      326.00 (   0.00%)      328.00 (  -0.61%)
> Max-5     Time      327.00 (   0.00%)      330.00 (  -0.92%)
> Max-10    Time      328.00 (   0.00%)      331.00 (  -0.91%)
> Max-90    Time      350.00 (   0.00%)      357.00 (  -2.00%)
> Max-95    Time      395.00 (   0.00%)      390.00 (   1.27%)
> Max-99    Time      508.00 (   0.00%)      434.00 (  14.57%)
> Max       Time      547.00 (   0.00%)      476.00 (  12.98%)
> Amean     Time      344.61 (   0.00%)      345.56 *  -0.28%*
> Stddev    Time       30.34 (   0.00%)       19.51 (  35.69%)
> CoeffVar  Time        8.81 (   0.00%)        5.65 (  35.87%)
> BAmean-99 Time      342.38 (   0.00%)      344.27 (  -0.55%)
> BAmean-95 Time      338.58 (   0.00%)      341.87 (  -0.97%)
> BAmean-90 Time      336.89 (   0.00%)      340.26 (  -1.00%)
> BAmean-75 Time      335.18 (   0.00%)      338.40 (  -0.96%)
> BAmean-50 Time      332.54 (   0.00%)      335.42 (  -0.87%)
> BAmean-25 Time      329.30 (   0.00%)      332.00 (  -0.82%)
> 
> The data above is similar to what was seen when
> ->percpu_pvec_drained was introduced, so the flag is still
> needed. Let's call folio_batch_reinit() in folio_batch_move_lru()
> to restore the original logic.
> 
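Concretely, the proposed change amounts to a one-line substitution at
the end of folio_batch_move_lru(); a minimal sketch of the diff
against mm/swap.c (context lines abbreviated):

	--- a/mm/swap.c
	+++ b/mm/swap.c
	@@ static void folio_batch_move_lru(struct folio_batch *fbatch, ...)
	-	folio_batch_init(fbatch);
	+	folio_batch_reinit(fbatch);
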
> Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()")
> Signed-off-by: Qi Zheng <zhengqi.arch@...edance.com>

Well spotted,

Acked-by: Mel Gorman <mgorman@...e.de>

-- 
Mel Gorman
SUSE Labs
