Message-ID: <CAOR27cSr9yxodkctfp-Yjybh1NsKBeSkhdbZYeK7O5M87PfEYw@mail.gmail.com>
Date: Thu, 20 Feb 2025 13:48:18 +0100
From: David Frank <david@...idfrank.ch>
To: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Efficient mapping of sparse file holes to zero-pages
Hi all,
I'd like to efficiently mmap a large sparse file (ext4), 95% of which
is holes. I was unsatisfied with the performance, and profiling showed
that most of the time is spent in filemap_add_folio and
filemap_alloc_folio, much more than in my own algorithm:
- 97.87% filemap_fault
   - 97.57% do_sync_mmap_readahead
      - page_cache_ra_order
         - 97.28% page_cache_ra_unbounded
            - 40.80% filemap_add_folio
               + 21.93% __filemap_add_folio
               + 8.88% folio_add_lru
               + 7.56% workingset_refault
            + 28.73% filemap_alloc_folio
            + 22.34% read_pages
            + 3.29% xa_load
As a workaround, I started using lseek with SEEK_HOLE and SEEK_DATA to
locate the holes, and changed the algorithm to use a static zero-filled
array instead of reading from them. This runs ~30x faster, but it
adds substantial complexity to the implementation. Is mapping holes to
zero pages with COW in the kernel being considered?
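
For illustration, the userspace workaround can be sketched roughly as
below. This is not the code actually used: the helper names
data_extents and read_sparse are hypothetical, and it uses pread
rather than mmap to keep the example short. It only shows the idea of
enumerating data extents with SEEK_DATA/SEEK_HOLE and serving
everything else from a zero-filled buffer.

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges that hold data; the rest are holes."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after offset
                break
            raise
        # There is an implicit hole at end of file, so this always succeeds.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def read_sparse(fd, offset, length, extents):
    """Return length bytes at offset, reading only the data extents."""
    out = bytearray(length)  # zero-filled, stands in for the holes
    for start, end in extents:
        lo = max(start, offset)
        hi = min(end, offset + length)
        pos = lo
        while pos < hi:  # loop to tolerate short reads
            chunk = os.pread(fd, hi - pos, pos)
            if not chunk:
                break
            out[pos - offset:pos - offset + len(chunk)] = chunk
            pos += len(chunk)
    return bytes(out)
```

The extent list is computed once and reused, so reads that fall
entirely inside a hole never touch the file or the page cache. On a
filesystem without hole support, the whole file is reported as a
single data extent and the sketch degrades to plain reads.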
I found [a related thread][1] from early 2022 that mentions mapping
to zero pages for shared memory objects. There seemed to be some
concerns about the complexity; I wonder whether it would be different
for mmap (even if only for private or read-only mappings).
[1]: https://lore.kernel.org/lkml/4b1885b8-eb95-c50-2965-11e7c8efbf36@google.com/T/
Thanks,
David