Message-ID: <7ftwasufn2w3bgesfbp66vlchhpiuctxkhdxp24y5nzzgz2oip@pi4kdyqkl5ss>
Date: Fri, 15 Aug 2025 14:51:49 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...e.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>,
Suleiman Souhlal <suleiman@...gle.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: mm: swapin read-ahead and zram
Hello,
We are seeing an unexpected behavior under standard memory pressure
test with zram being configured as a swap device (I tested on several
LTS kernels: 6.12, 6.6, 5.4). Namely, we observe multiple, repetitive
reads of (compressed) zram entries, sometimes in a very short time span:
...
[ 1523.345784] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:188
[ 1523.365401] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:189
[ 1523.385934] zram: decompress entry idx:1307291 zsmalloc handle:ffffa28c70100b50 obj_size:788 num_reads:227
[ 1523.405098] zram: decompress entry idx:150916 zsmalloc handle:ffffa28c70114fc0 obj_size:436 num_reads:230
[ 1523.475162] zram: decompress entry idx:266372 zsmalloc handle:ffffa28c4566e5e0 obj_size:437 num_reads:192
[ 1523.476785] zram: decompress entry idx:1615262 zsmalloc handle:ffffa28c8be3efe0 obj_size:518 num_reads:99
[ 1523.476899] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:97
[ 1523.477323] zram: decompress entry idx:266373 zsmalloc handle:ffffa28c4566e828 obj_size:434 num_reads:111
[ 1523.478081] zram: decompress entry idx:1638538 zsmalloc handle:ffffa28c70100c40 obj_size:930 num_reads:40
[ 1523.478631] zram: decompress entry idx:1307301 zsmalloc handle:ffffa28c70100348 obj_size:0 num_reads:87
[ 1523.507349] zram: decompress entry idx:1307293 zsmalloc handle:ffffa28c701007c8 obj_size:989 num_reads:98
[ 1523.540930] zram: decompress entry idx:1294528 zsmalloc handle:ffffa28c47582e60 obj_size:441 num_reads:386
[ 1523.540930] zram: decompress entry idx:266372 zsmalloc handle:ffffa28c4566e5e0 obj_size:437 num_reads:193
[ 1523.540958] zram: decompress entry idx:1294534 zsmalloc handle:ffffa28c47582b30 obj_size:520 num_reads:176
[ 1523.540998] zram: decompress entry idx:1615262 zsmalloc handle:ffffa28c8be3efe0 obj_size:518 num_reads:100
[ 1523.541063] zram: decompress entry idx:1615259 zsmalloc handle:ffffa28c8be3e970 obj_size:428 num_reads:171
[ 1523.541101] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:98
[ 1523.541212] zram: decompress entry idx:150916 zsmalloc handle:ffffa28c70114fc0 obj_size:436 num_reads:231
[ 1523.541379] zram: decompress entry idx:1638538 zsmalloc handle:ffffa28c70100c40 obj_size:930 num_reads:41
[ 1523.541412] zram: decompress entry idx:1294521 zsmalloc handle:ffffa28c47582548 obj_size:936 num_reads:70
[ 1523.541771] zram: decompress entry idx:1592754 zsmalloc handle:ffffa28c43a94738 obj_size:0 num_reads:72
[ 1523.541840] zram: decompress entry idx:1615265 zsmalloc handle:ffffa28c8be3ee70 obj_size:986 num_reads:190
[ 1523.547630] zram: decompress entry idx:1307298 zsmalloc handle:ffffa28c70100940 obj_size:797 num_reads:112
[ 1523.547771] zram: decompress entry idx:1307291 zsmalloc handle:ffffa28c70100b50 obj_size:788 num_reads:228
[ 1523.550138] zram: decompress entry idx:1307296 zsmalloc handle:ffffa28c70100f20 obj_size:682 num_reads:61
[ 1523.555016] zram: decompress entry idx:266385 zsmalloc handle:ffffa28c4566e7c0 obj_size:679 num_reads:103
[ 1523.566361] zram: decompress entry idx:1294524 zsmalloc handle:ffffa28c475825d0 obj_size:436 num_reads:99
[ 1523.566428] zram: decompress entry idx:1294528 zsmalloc handle:ffffa28c47582e60 obj_size:441 num_reads:387
...
For instance, notice how entry 1615265 is read and decompressed, then
presumably evicted from memory, and read/decompressed again almost
immediately. Also notice that this entry has already gone through the
same cycle 189 times. It's not entirely clear why this happens.
As far as I can tell, it seems that these extra zram reads are coming from
the swapin read-ahead:
handle_mm_fault
do_swap_page
swapin_readahead
swap_read_folio
submit_bio_wait
submit_bio_noacct_nocheck
__submit_bio
zram_submit_bio
zram_read_page
zram_read_from_zspool
There are several issues with this.
First, on systems with zram-powered swap devices, these extra reads result
in extra decompressions, which translate into excessive CPU (s/w
decompression) and battery usage. On top of that, each decompression first
requires a zsmalloc map() call, which may involve a memcpy() (if the
compressed object spans two physical pages).
Second, the read-ahead pages are likely to increase memory pressure, as
each read-ahead object decompresses into a PAGE_SIZE page, while we
also keep holding the compressed object in the zsmalloc pool (until the
slot-free notification).
Setting `sysctl -w vm.page-cluster=0` doesn't seem to help: page-cluster 0
merely limits the read-ahead window to 1 page, so we still read ahead.
Can swapin read-ahead be entirely disabled for zram swap devices?