linux-kernel - Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression


Message-ID: <561465df-1370-4519-abe3-3998bd78233f@intel.com>
Date: Sun, 10 Mar 2024 14:40:00 +0800
From: "Yin, Fengwei" <fengwei.yin@...el.com>
To: Jan Kara <jack@...e.cz>
CC: Yujie Liu <yujie.liu@...el.com>, Oliver Sang <oliver.sang@...el.com>,
	<oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox
	<willy@...radead.org>, Guo Xuenan <guoxuenan@...wei.com>,
	<linux-fsdevel@...r.kernel.org>, <ying.huang@...el.com>,
	<feng.tang@...el.com>
Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput
 -21.4% regression

On 3/7/2024 5:23 PM, Jan Kara wrote:
> Thanks for testing! This is an interesting result and certainly unexpected
> for me. The readahead code allocates naturally aligned pages so based on
> the distribution of allocations it seems that before commit ab4443fe3ca6
> readahead window was at least 32 pages (128KB) aligned and so we allocated
> order 5 pages. After the commit, the readahead window somehow ended up only
> aligned to 20 modulo 32. To follow natural alignment and fill 128KB
> readahead window we allocated order 2 page (got us to offset 24 modulo 32),
> then order 3 page (got us to offset 0 modulo 32), order 4 page (larger
> would not fit in 128KB readahead window now), and order 2 page to finish
> filling the readahead window.
> 
> Now I'm not 100% sure why the readahead window alignment changed with
> different rounding when placing readahead mark - probably that's some
> artifact when readahead window is tiny in the beginning before we scale it
> up (I'll verify by tracing whether everything ends up looking correctly
> with the current code). So I don't expect this is a problem in ab4443fe3ca6
> as such but it exposes the issue that readahead page insertion code should
> perhaps strive to achieve better readahead window alignment with logical
> file offset even at the cost of occasionally performing somewhat shorter
> readahead. I'll look into this once I dig out of the huge heap of email
> after vacation...
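
The alignment arithmetic in the quoted paragraph can be sketched as below. This is a simplified model, not the kernel's page_cache_ra_order(): the helper names pick_order() and fill_window() are hypothetical, and the only rules assumed are that each folio must be naturally aligned to its own size, must fit in what is left of the window, and may not exceed a maximum order.

```c
#include <assert.h>

/*
 * Hypothetical helper: largest order (<= max_order) such that `index` is
 * aligned to 1 << order and a folio of that order still fits in the
 * `remaining` pages of the window.
 */
static unsigned int pick_order(unsigned long index, unsigned long remaining,
                               unsigned int max_order)
{
    unsigned int order = max_order;

    while (order > 0 &&
           ((index & ((1UL << order) - 1)) != 0 ||
            (1UL << order) > remaining))
        order--;
    return order;
}

/*
 * Hypothetical helper: fill a window of `size` pages starting at page
 * index `start` with naturally aligned folios. Stores the chosen orders
 * in `orders` (at most `cap` entries) and returns how many were stored.
 */
int fill_window(unsigned long start, unsigned long size,
                unsigned int max_order, unsigned int *orders, int cap)
{
    unsigned long index = start;
    unsigned long end = start + size;
    int n = 0;

    assert(size > 0);
    while (index < end && n < cap) {
        unsigned int order = pick_order(index, end - index, max_order);

        orders[n++] = order;
        index += 1UL << order;   /* advance past the folio just placed */
    }
    return n;
}
```

Under these assumptions, a 32-page window starting at offset 20 with a maximum order of 5 gives the order sequence 2, 3, 4, 2 (offsets 20, 24, 32, 48), while the same window starting at offset 0 is covered by a single order 5 folio.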
Hi Jan,
I am also curious about this behavior and tried adding logs to understand
it. Here are some differences with and without ab4443fe3ca6:
  - with ab4443fe3ca6:
  You are right about the folio order, as the readahead window is 0x20.
  The folio order sequence looks like order 2, order 4, order 3, order 2.

  The difference is that it is always the first order 2 folio that gets
  marked readahead. So the max order is boosted to 4 in
  page_cache_ra_order(). The code path always hits
     if (index == expected || index == (ra->start + ra->size))
  in ondemand_readahead().

  If I just change the round_down() to round_up() in ra_alloc_folio(),
  the dominant folio order is restored to 5.

  - without ab4443fe3ca6:
  At the beginning, the folio order sequence is the same: 2, 4, 3, 2.
  But besides the first order 2 folio, the order 4 folio is also marked
  readahead, so the order can be boosted to 5.
  Also, it is not just the path
     if (index == expected || index == (ra->start + ra->size))
  that is hit; the path
      if (folio) {
  can be hit as well (I didn't check the other paths as this test is a
  sequential read).

  There is some back and forth between order 5 and the 2, 4, 3, 2
  sequence, and then the order stabilizes at 5.

  I don't fully understand the whole thing yet and will dig deeper. The
  above is just what the log showed.
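
The effect of rounding the readahead mark down versus up can be illustrated with a small model. It is a sketch under stated assumptions, not the ra_alloc_folio() code itself: it assumes a folio gets the readahead flag when its index equals the mark rounded to that folio's order, it assumes the mark sits at the start of the window (offset 20), and the names folio_desc and count_marked() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one folio placed in the readahead window. */
struct folio_desc {
    unsigned long index;   /* page index of the folio's first page */
    unsigned int order;    /* folio covers 1 << order pages */
};

static unsigned long round_down_order(unsigned long x, unsigned int order)
{
    return x & ~((1UL << order) - 1);
}

static unsigned long round_up_order(unsigned long x, unsigned int order)
{
    return round_down_order(x + (1UL << order) - 1, order);
}

/*
 * Count how many folios would be flagged, assuming a folio is flagged
 * when its index equals the mark rounded to the folio's own order.
 * use_round_up selects round-up (nonzero) or round-down (zero).
 */
size_t count_marked(const struct folio_desc *folios, size_t n,
                    unsigned long mark, int use_round_up)
{
    size_t marked = 0;

    assert(folios != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long m = use_round_up
            ? round_up_order(mark, folios[i].order)
            : round_down_order(mark, folios[i].order);

        if (folios[i].index == m)
            marked++;
    }
    return marked;
}
```

With folios at offsets 20, 24, 32, 48 of orders 2, 3, 4, 2 and an assumed mark of 20, rounding down flags only the first order 2 folio, while rounding up flags three folios, including the order 4 one. In this model that is why larger folios can carry the readahead flag only with round_up().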


Hi Matthew,
I noticed one thing: while the readahead folio order is being pushed up,
readahead tries several times to allocate folios and add them to the
page cache, but fails because a folio already inserted in the page cache
covers the requested index. Once the folio order is correct, this no
longer happens. I suppose this is expected.


Regards
Yin, Fengwei