[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <df09f0f4-fd23-469e-94d7-864b3bdb17c6@arm.com>
Date: Tue, 6 May 2025 10:53:07 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Jan Kara <jack@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, David Hildenbrand
<david@...hat.com>, Dave Chinner <david@...morbit.com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Kalesh Singh <kaleshsingh@...gle.com>, Zi Yan <ziy@...dia.com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH v4 4/5] mm/readahead: Store folio order in struct
file_ra_state
On 05/05/2025 10:52, Jan Kara wrote:
> On Wed 30-04-25 15:59:17, Ryan Roberts wrote:
>> Previously the folio order of the previous readahead request was
>> inferred from the folio who's readahead marker was hit. But due to the
>> way we have to round to non-natural boundaries sometimes, this first
>> folio in the readahead block is often smaller than the preferred order
>> for that request. This means that for cases where the initial sync
>> readahead is poorly aligned, the folio order will ramp up much more
>> slowly.
>>
>> So instead, let's store the order in struct file_ra_state so we are not
>> affected by any required alignment. We previously made enough room in
>> the struct for a 16 order field. This should be plenty big enough since
>> we are limited to MAX_PAGECACHE_ORDER anyway, which is certainly never
>> larger than ~20.
>>
>> Since we now pass order in struct file_ra_state, page_cache_ra_order()
>> no longer needs it's new_order parameter, so let's remove that.
>>
>> Worked example:
>>
>> Here we are touching pages 17-256 sequentially just as we did in the
>> previous commit, but now that we are remembering the preferred order
>> explicitly, we no longer have the slow ramp up problem. Note
>> specifically that we no longer have 2 rounds (2x ~128K) of order-2
>> folios:
>>
>> TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA
>> ----- ---------- ---------- ---------- ------- ------- ----- ----- --
>> HOLE 0x00000000 0x00001000 4096 0 1 1
>> FOLIO 0x00001000 0x00002000 4096 1 2 1 0
>> FOLIO 0x00002000 0x00003000 4096 2 3 1 0
>> FOLIO 0x00003000 0x00004000 4096 3 4 1 0
>> FOLIO 0x00004000 0x00005000 4096 4 5 1 0
>> FOLIO 0x00005000 0x00006000 4096 5 6 1 0
>> FOLIO 0x00006000 0x00007000 4096 6 7 1 0
>> FOLIO 0x00007000 0x00008000 4096 7 8 1 0
>> FOLIO 0x00008000 0x00009000 4096 8 9 1 0
>> FOLIO 0x00009000 0x0000a000 4096 9 10 1 0
>> FOLIO 0x0000a000 0x0000b000 4096 10 11 1 0
>> FOLIO 0x0000b000 0x0000c000 4096 11 12 1 0
>> FOLIO 0x0000c000 0x0000d000 4096 12 13 1 0
>> FOLIO 0x0000d000 0x0000e000 4096 13 14 1 0
>> FOLIO 0x0000e000 0x0000f000 4096 14 15 1 0
>> FOLIO 0x0000f000 0x00010000 4096 15 16 1 0
>> FOLIO 0x00010000 0x00011000 4096 16 17 1 0
>> FOLIO 0x00011000 0x00012000 4096 17 18 1 0
>> FOLIO 0x00012000 0x00013000 4096 18 19 1 0
>> FOLIO 0x00013000 0x00014000 4096 19 20 1 0
>> FOLIO 0x00014000 0x00015000 4096 20 21 1 0
>> FOLIO 0x00015000 0x00016000 4096 21 22 1 0
>> FOLIO 0x00016000 0x00017000 4096 22 23 1 0
>> FOLIO 0x00017000 0x00018000 4096 23 24 1 0
>> FOLIO 0x00018000 0x00019000 4096 24 25 1 0
>> FOLIO 0x00019000 0x0001a000 4096 25 26 1 0
>> FOLIO 0x0001a000 0x0001b000 4096 26 27 1 0
>> FOLIO 0x0001b000 0x0001c000 4096 27 28 1 0
>> FOLIO 0x0001c000 0x0001d000 4096 28 29 1 0
>> FOLIO 0x0001d000 0x0001e000 4096 29 30 1 0
>> FOLIO 0x0001e000 0x0001f000 4096 30 31 1 0
>> FOLIO 0x0001f000 0x00020000 4096 31 32 1 0
>> FOLIO 0x00020000 0x00021000 4096 32 33 1 0
>> FOLIO 0x00021000 0x00022000 4096 33 34 1 0
>> FOLIO 0x00022000 0x00024000 8192 34 36 2 1
>> FOLIO 0x00024000 0x00028000 16384 36 40 4 2
>> FOLIO 0x00028000 0x0002c000 16384 40 44 4 2
>> FOLIO 0x0002c000 0x00030000 16384 44 48 4 2
>> FOLIO 0x00030000 0x00034000 16384 48 52 4 2
>> FOLIO 0x00034000 0x00038000 16384 52 56 4 2
>> FOLIO 0x00038000 0x0003c000 16384 56 60 4 2
>> FOLIO 0x0003c000 0x00040000 16384 60 64 4 2
>> FOLIO 0x00040000 0x00050000 65536 64 80 16 4
>> FOLIO 0x00050000 0x00060000 65536 80 96 16 4
>> FOLIO 0x00060000 0x00080000 131072 96 128 32 5
>> FOLIO 0x00080000 0x000a0000 131072 128 160 32 5
>> FOLIO 0x000a0000 0x000c0000 131072 160 192 32 5
>> FOLIO 0x000c0000 0x000e0000 131072 192 224 32 5
>> FOLIO 0x000e0000 0x00100000 131072 224 256 32 5
>> FOLIO 0x00100000 0x00120000 131072 256 288 32 5
>> FOLIO 0x00120000 0x00140000 131072 288 320 32 5 Y
>> HOLE 0x00140000 0x00800000 7077888 320 2048 1728
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>
> ...
>
>> @@ -469,6 +469,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>> int err = 0;
>> gfp_t gfp = readahead_gfp_mask(mapping);
>> unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));
>> + unsigned int new_order = ra->order;
>>
>> /*
>> * Fallback when size < min_nrpages as each folio should be
>> @@ -483,6 +484,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
>> new_order = min_t(unsigned int, new_order, ilog2(ra->size));
>> new_order = max(new_order, min_order);
>>
>> + ra->order = new_order;
>> +
>> /* See comment in page_cache_ra_unbounded() */
>> nofs = memalloc_nofs_save();
>> filemap_invalidate_lock_shared(mapping);
>> @@ -525,6 +528,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>> * ->readahead() may have updated readahead window size so we have to
>> * check there's still something to read.
>> */
>> + ra->order = 0;
>
> Hum, so you reset desired folio order if readahead hit some pre-existing
> pages in the page cache. Is this really desirable? Why not leave the
> desired order as it was for the next request?
My aim was to not let order grow unbounded. When the filesystem doesn't support
large folios we end up here (from the "goto fallback") and without this, order
will just grow and grow (perhaps it doesn't matter though). I think we should
keep this.
But I guess your point is that we can also end up here when the filesystem does
support large folios but there is an error. In thta case, yes, I'll change to
not reset order to 0; it has already been fixed up earlier in this path.
How's this:
---8<---
diff --git a/mm/readahead.c b/mm/readahead.c
index 18972bc34861..0054ca18a815 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -475,8 +475,10 @@ void page_cache_ra_order(struct readahead_control *ractl,
* Fallback when size < min_nrpages as each folio should be
* at least min_nrpages anyway.
*/
- if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
+ if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size) {
+ ra->order = 0;
goto fallback;
+ }
limit = min(limit, index + ra->size - 1);
@@ -528,7 +530,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
* ->readahead() may have updated readahead window size so we have to
* check there's still something to read.
*/
- ra->order = 0;
if (ra->size > index - start)
do_page_cache_ra(ractl, ra->size - (index - start),
ra->async_size);
---8<---
Thanks,
Ryan
>
>> if (ra->size > index - start)
>> do_page_cache_ra(ractl, ra->size - (index - start),
>> ra->async_size);
>
> Honza
Powered by blists - more mailing lists