Message-ID: <20260119065027.918085-1-zhiguo.zhou@intel.com>
Date: Mon, 19 Jan 2026 14:50:23 +0800
From: Zhiguo Zhou <zhiguo.zhou@...el.com>
To: linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org
Cc: willy@...radead.org,
akpm@...ux-foundation.org,
david@...nel.org,
lorenzo.stoakes@...cle.com,
Liam.Howlett@...cle.com,
vbabka@...e.cz,
rppt@...nel.org,
surenb@...gle.com,
mhocko@...e.com,
muchun.song@...ux.dev,
osalvador@...e.de,
linux-kernel@...r.kernel.org,
tianyou.li@...el.com,
tim.c.chen@...ux.intel.com,
gang.deng@...el.com,
Zhiguo Zhou <zhiguo.zhou@...el.com>
Subject: [PATCH 0/2] mm/readahead: batch folio insertion to improve performance

This patch series improves readahead performance by batching folio
insertions into the page cache's xarray, which reduces cacheline
transfers and cuts the time spent in the critical section.

PROBLEM
=======

When the `readahead` syscall is invoked, `page_cache_ra_unbounded`
currently inserts folios into the page cache individually. Each insertion
requires acquiring and releasing the `xa_lock`, which can lead to:
1. Significant lock contention when running on multi-core systems
2. Cross-core cacheline transfers for the lock and associated data
3. Increased execution time due to frequent lock operations
These overheads become particularly noticeable in high-throughput storage
workloads where readahead is frequently used.
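
For reference, the per-folio pattern boils down to roughly the
following. This is a deliberately simplified sketch, not the upstream
code: the helper name is invented, and existing-entry handling, LRU
batching and readpage dispatch are omitted.

#include <linux/pagemap.h>

/*
 * Simplified illustration of the current behaviour: each iteration
 * calls filemap_add_folio(), which takes and drops the mapping's
 * xa_lock internally, so a large readahead window acquires the lock
 * once per folio and bounces its cacheline between CPUs.
 */
static void ra_insert_one_by_one(struct address_space *mapping,
                                 pgoff_t index, unsigned long nr_to_read,
                                 gfp_t gfp)
{
        unsigned long i;

        for (i = 0; i < nr_to_read; i++) {
                struct folio *folio = filemap_alloc_folio(gfp, 0);

                if (!folio)
                        break;
                /* Takes and drops mapping->i_pages xa_lock every time. */
                if (filemap_add_folio(mapping, folio, index + i, gfp) < 0)
                        folio_put(folio);
        }
}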

SOLUTION
========

This series introduces batched folio insertion for contiguous ranges in
the page cache. The key changes are:
Patch 1/2: Refactor __filemap_add_folio to separate critical section
- Extract the core xarray insertion logic into
__filemap_add_folio_xa_locked()
- Allow callers to control locking granularity via a 'xa_locked' parameter
- Maintain existing functionality while preparing for batch insertion
  (a rough sketch of the split follows below)
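
For illustration, one plausible shape of that split is sketched
below. Only __filemap_add_folio_xa_locked() and the 'xa_locked'
parameter come from the changelog; the exact signatures and where the
flag lives are assumptions, and accounting, shadow-entry and memcg
handling are elided.

#include <linux/pagemap.h>
#include <linux/xarray.h>

/* Core insertion; the caller must already hold the i_pages xa_lock. */
static int __filemap_add_folio_xa_locked(struct address_space *mapping,
                                         struct folio *folio, pgoff_t index,
                                         gfp_t gfp, void **shadowp)
{
        /* xarray store, statistics and shadow-entry handling go here. */
        return 0;
}

/* Existing entry point: current callers pass xa_locked == false and
 * get the lock taken for them, exactly as before the refactoring. */
noinline int __filemap_add_folio(struct address_space *mapping,
                                 struct folio *folio, pgoff_t index,
                                 gfp_t gfp, void **shadowp, bool xa_locked)
{
        int err;

        if (xa_locked)
                return __filemap_add_folio_xa_locked(mapping, folio, index,
                                                     gfp, shadowp);

        xa_lock_irq(&mapping->i_pages);
        err = __filemap_add_folio_xa_locked(mapping, folio, index,
                                            gfp, shadowp);
        xa_unlock_irq(&mapping->i_pages);
        return err;
}
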
Patch 2/2: Batch folio insertion in page_cache_ra_unbounded
- Introduce filemap_add_folio_range() for batch insertion of folios
  (sketched after this list)
- Pre-allocate folios before entering the critical section
- Insert multiple folios while holding the xa_lock only once
- Update page_cache_ra_unbounded to use the new batching interface
- Fall back to individual folio insertion when memory is under pressure
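
A minimal sketch of the single-lock batch store, assuming a signature
along these lines for filemap_add_folio_range(); reference counting,
memcg charging, LRU additions and the partial-failure cleanup the real
patch has to handle are left out:

#include <linux/pagemap.h>
#include <linux/xarray.h>

/*
 * Store a contiguous run of pre-allocated folios while taking the
 * mapping's xa_lock only once.  xas_nomem() retries outside the lock
 * when the xarray needs more nodes; checks for entries already
 * present in the range are omitted from this sketch.
 */
static int filemap_add_folio_range(struct address_space *mapping,
                                   struct folio **folios, unsigned int nr,
                                   pgoff_t index, gfp_t gfp)
{
        XA_STATE(xas, &mapping->i_pages, index);
        unsigned int i;

        do {
                xas_lock_irq(&xas);
                for (i = 0; i < nr; i++) {
                        xas_set(&xas, index + i);
                        xas_store(&xas, folios[i]);
                        if (xas_error(&xas))
                                break;
                }
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, gfp));

        return xas_error(&xas);
}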

PERFORMANCE RESULTS
===================

Testing was performed using RocksDB's `db_bench` (readseq workload) on a
32-vCPU Intel Ice Lake server with 256GB memory:
1. Throughput improved by 1.51x (ops/sec)
2. Latency:
- P50: 63.9% reduction (6.15 usec → 2.22 usec)
- P75: 42.1% reduction (13.38 usec → 7.75 usec)
- P99: 31.4% reduction (507.95 usec → 348.54 usec)
3. IPC of page_cache_ra_unbounded (excluding lock overhead) improved by
2.18x

TESTING DETAILS
===============

- Kernel: v6.19-rc5 (0f61b1, tip of mm.git:mm-stable on Jan 14, 2026)
- Hardware: Intel Ice Lake server, 32 vCPUs, 256GB RAM
- Workload: RocksDB db_bench readseq
- Command: ./db_bench --benchmarks=readseq,stats --use_existing_db=1
--num_multi_db=32 --threads=32 --num=1600000 --value_size=8192
--cache_size=16GB

IMPLEMENTATION NOTES
====================

- The existing single-folio insertion API remains unchanged for
compatibility
- Hugetlb folio handling is preserved through the refactoring
- Error injection (BPF) support is maintained for __filemap_add_folio
  (see below)
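
For context on the error-injection point: the annotation sits next to
the function definition in mm/filemap.c, so keeping
__filemap_add_folio() as an external, noinline symbol preserves the
hook regardless of how the locked core is factored out. Shown with
only the header declaration for brevity:

#include <linux/error-injection.h>
#include <linux/pagemap.h>

/* Existing annotation in mm/filemap.c: BPF fault injection targets the
 * external entry point, not any internal static helper. */
ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
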
Zhiguo Zhou (2):
mm/filemap: refactor __filemap_add_folio to separate critical section
mm/readahead: batch folio insertion to improve performance
include/linux/pagemap.h | 4 +-
mm/filemap.c | 238 ++++++++++++++++++++++++++++------------
mm/hugetlb.c | 3 +-
mm/readahead.c | 196 ++++++++++++++++++++++++++-------
4 files changed, 325 insertions(+), 116 deletions(-)
--
2.43.0