Message-ID: <Z7cygtpjGDJadgg0@casper.infradead.org>
Date: Thu, 20 Feb 2025 13:47:46 +0000
From: Matthew Wilcox <willy@...radead.org>
To: David Frank <david@...idfrank.ch>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: Efficient mapping of sparse file holes to zero-pages

On Thu, Feb 20, 2025 at 01:48:18PM +0100, David Frank wrote:
> I'd like to efficiently mmap a large sparse file (ext4), 95% of which
> is holes. I was unsatisfied with the performance and after profiling,
> I found that most of the time is spent in filemap_add_folio and
> filemap_alloc_folio - much more than in my algorithm:
> 
>  - 97.87% filemap_fault
>     - 97.57% do_sync_mmap_readahead
>        - page_cache_ra_order
>           - 97.28% page_cache_ra_unbounded
>              - 40.80% filemap_add_folio
>                 + 21.93% __filemap_add_folio
>                 + 8.88% folio_add_lru
>                 + 7.56% workingset_refault
>              + 28.73% filemap_alloc_folio
>              + 22.34% read_pages
>              + 3.29% xa_load

Yes, this is expected.

The fundamental problem is that we don't have the sparseness information
at the right point.  So the read request (or pagefault) comes in, the
VFS allocates a page, puts it in the pagecache, then asks the filesystem
to fill it.  The filesystem knows, so could theoretically tell the VFS
"Oh, this is a hole", but by this point the "damage" is done -- the page
has been allocated and added to the page cache.

Of course, this is a soluble problem.  The VFS could ask the filesystem
for its sparseness information (as you do in userspace), but unlike your
particular use case, the kernel must handle attackers who are trying to
make it do the wrong thing as well as ill-timed writes.  So the VFS has
to ensure it does not use stale data from the filesystem.
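
For reference, the userspace side of "asking the filesystem for its
sparseness information" can be sketched with lseek() and SEEK_DATA /
SEEK_HOLE.  This is only an illustration of the userspace query, not
anything the VFS does today; the helper name data_extents is made up
for this example, and it assumes Linux and a filesystem that reports
holes (one that does not will simply report the whole file as data).

```python
import errno
import os


def data_extents(fd, size):
    """Yield (start, end) byte ranges the filesystem reports as data.

    Hypothetical helper for illustration. The answer is a snapshot:
    a concurrent write can make it stale right after it is returned.
    """
    offset = 0
    while offset < size:
        try:
            # Next offset >= `offset` that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return  # only a hole remains up to end of file
            raise
        # Next hole after that data; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end
```

A caller that trusts this list (say, to skip touching the holes in a
mapping) is relying on nobody writing to the file in the meantime,
which is exactly the assumption the kernel cannot make.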

This is a problem I'm somewhat interested in solving, but I'm a bit
busy with folios right now.  And once that project is done, improving
the page cache for reflinked files is next on my list, so I'm not likely
to get to this problem for a few years.

