linux-kernel - Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression


Message-ID: <dee823ca-7100-4289-8670-95047463c09d@intel.com>
Date: Mon, 4 Mar 2024 13:35:10 +0800
From: "Yin, Fengwei" <fengwei.yin@...el.com>
To: Yujie Liu <yujie.liu@...el.com>, Jan Kara <jack@...e.cz>
CC: Oliver Sang <oliver.sang@...el.com>, <oe-lkp@...ts.linux.dev>,
	<lkp@...el.com>, <linux-kernel@...r.kernel.org>, Andrew Morton
	<akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>, Guo Xuenan
	<guoxuenan@...wei.com>, <linux-fsdevel@...r.kernel.org>,
	<ying.huang@...el.com>, <feng.tang@...el.com>
Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput
 -21.4% regression

Hi Jan,

On 3/4/2024 12:59 PM, Yujie Liu wrote:
>  From the perf profile, we can see that the contention of folio lru lock
> becomes more intense. We also did a simple one-file "dd" test. Looks
> like it is more likely that low-order folios are allocated after commit
> ab4443fe3c (Fengwei will help provide the data soon). Therefore, the
> average folio size decreases while the total folio amount increases,
> which leads to touching lru lock more often.

I did the following testing:
   Create an xfs image in tmpfs, mount it at /mnt, and create a 12G test
   file (sparse-file). Use one process to read it on an Ice Lake machine
   with 256G of system memory. This ensures that we are doing a
   sequential file read with no page reclaim triggered.

   At the same time, profile the distribution of the order parameter of
   filemap_alloc_folio() calls to understand how the large folio order
   for the page cache is generated.

Here is what we got:

- Commit f0b7a0d1d46625db:
$ dd bs=4k if=/mnt/sparse-file of=/dev/null
3145728+0 records in
3145728+0 records out
12884901888 bytes (13 GB, 12 GiB) copied, 2.52208 s, 5.1 GB/s

filemap_alloc_folio
      page order    : count     distribution
         0          : 57       |                                        |
         1          : 0        |                                        |
         2          : 20       |                                        |
         3          : 2        |                                        |
         4          : 4        |                                        |
         5          : 98300    |****************************************|
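
As a sanity check on the order-5 count above, here is a minimal sketch
that computes how many folios of a given order are needed to cover the
test file. It assumes 4 KiB base pages (not stated in this mail), and
the helper name folios_needed() is made up for illustration:

```python
# Assumption: 4 KiB base pages (typical for x86-64, not stated in the mail).
PAGE_SIZE = 4096
# File size in bytes, taken from the dd output.
FILE_SIZE = 12884901888


def folios_needed(file_size, order, page_size=PAGE_SIZE):
    """Number of folios of the given order needed to cover file_size bytes."""
    folio_bytes = page_size << order       # an order-n folio spans 2**n pages
    return -(-file_size // folio_bytes)    # ceiling division


expected_order5 = folios_needed(FILE_SIZE, 5)
print("order-5 folios needed for the whole file:", expected_order5)
```

Under that assumption the result is 98304, which is close to the 98300
order-5 allocations observed, so nearly the whole file is covered by
order-5 folios on this commit.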

- Commit ab4443fe3ca6:
$ dd bs=4k if=/mnt/sparse-file of=/dev/null
3145728+0 records in
3145728+0 records out
12884901888 bytes (13 GB, 12 GiB) copied, 2.51469 s, 5.1 GB/s

filemap_alloc_folio
      page order    : count     distribution
         0          : 21       |                                        |
         1          : 0        |                                        |
         2          : 196615   |****************************************|
         3          : 98303    |*******************                     |
         4          : 98303    |*******************                     |
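
To put numbers on the difference between the two histograms, the
following sketch sums the folio count and the pages covered for each
commit. The counts are copied from the output above; the dictionary
names and summarize() are only illustrative, and the page math assumes
an order-n folio spans 2**n base pages:

```python
# Histogram data copied from the two filemap_alloc_folio outputs (order: count).
parent = {0: 57, 1: 0, 2: 20, 3: 2, 4: 4, 5: 98300}       # f0b7a0d1d46625db
patched = {0: 21, 1: 0, 2: 196615, 3: 98303, 4: 98303}    # ab4443fe3ca6


def summarize(hist):
    """Return (folio count, pages covered, average pages per folio)."""
    folios = sum(hist.values())
    pages = sum(count << order for order, count in hist.items())
    return folios, pages, pages / folios


for name, hist in (("f0b7a0d1d46625db", parent), ("ab4443fe3ca6", patched)):
    folios, pages, avg = summarize(hist)
    print(f"{name}: {folios} folios, {pages} pages, {avg:.2f} pages/folio")

ratio = summarize(patched)[0] / summarize(parent)[0]
print(f"folio count ratio: {ratio:.2f}x")
```

With these inputs, both commits cover about the same number of pages,
but ab4443fe3ca6 allocates roughly 4x as many folios, with an average
of about 8 pages per folio instead of about 32.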


The file read throughput is almost the same, but the order distribution
looks like a regression with ab4443fe3ca6 (more small-order page cache
folios are allocated than with the parent commit). Thanks.


Regards
Yin, Fengwei