Message-ID: <a6225180-9983-4a0a-8898-435b014b8ebe@huaweicloud.com>
Date: Thu, 3 Jul 2025 22:13:51 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Theodore Ts'o <tytso@....edu>, "D, Suneeth" <Suneeth.D@....com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, willy@...radead.org,
 adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com,
 libaokun1@...wei.com, yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 8/8] ext4: enable large folio for regular file

On 2025/6/26 22:56, Theodore Ts'o wrote:
> On Thu, Jun 26, 2025 at 09:26:41PM +0800, Zhang Yi wrote:
>>
>> Thanks for the report, I will try to reproduce this performance regression on
>> my machine and find out what caused this regression.
>
> I took a quick look at this, and I *think* it's because lmbench is
> measuring the latency of mmap read's --- I'm going to guess 4k random
> page faults, but I'm not sure. If that's the case, this may just be a
> natural result of using large folios, and the tradeoff of optimizing
> for large reads versus small page faults.
>
> But if you could take a closer look, that would be great, thanks!
>

After analyzing what the lmbench mmap test actually does, I found that
the regression is related to mmap writes, not mmap reads. In other
words, the latency of ext4_page_mkwrite() increases after we enable
large folios.

The lmbench mmap test performs the following two sub-tests:

1. mmap a range with PROT_READ|PROT_WRITE and MAP_SHARED, then write
   one byte every 16KB, sequentially.
2. mmap a range with PROT_READ and MAP_SHARED, then read the bytes one
   by one, sequentially.

(A minimal sketch of the write pattern and one possible way to observe
the fault latency are included at the end of this mail.)

For the mmap read test, the average page fault latency on my machine
improves from 3,634 ns to 2,005 ns. This improvement comes from saving
iterations of the folio readahead loop in page_cache_async_ra() and of
the PTE-setting loop in filemap_map_pages() after implementing support
for large folios.

For the mmap write test, the number of page faults does not decrease
with large folios (the maximum order is 5); each page still incurs one
page fault. However, ext4_page_mkwrite() iterates over the buffer_heads
in the folio multiple times, so its time consumption grows: the latency
of ext4_page_mkwrite() increases from 958 ns to 1,596 ns.

This per-page fault behavior is consistent with the comments in
finish_fault() and commit 43e027e414232 ("mm: memory: extend
finish_fault() to support large folio"):

vm_fault_t finish_fault(struct vm_fault *vmf)
{
	...
	nr_pages = folio_nr_pages(folio);

	/*
	 * Using per-page fault to maintain the uffd semantics, and same
	 * approach also applies to non-anonymous-shmem faults to avoid
	 * inflating the RSS of the process.
	 */
	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
	    unlikely(needs_fallback)) {
		nr_pages = 1;
	...
	set_pte_range(vmf, folio, page, nr_pages, addr);
}

I believe this regression could be resolved if finish_fault() supported
file-backed large folios, but I'm not sure whether we are planning to
implement this. As for ext4_page_mkwrite(), I think it could also be
optimized by reducing the number of folio iterations, but that would
make it impossible to use the existing generic helpers and could make
the code very messy.
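For reference, below is a minimal userspace sketch of the write-fault
pattern described above. This is my own simplified illustration, not
the actual lmbench source; the file name, mapping size, and the crude
average timing are arbitrary choices. Each store dirties a clean page
in a MAP_SHARED file mapping, so every touched page takes one write
fault through ->page_mkwrite():

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define STRIDE	(16 * 1024)	/* write one byte every 16KB, as lmbench does */

int main(void)
{
	size_t len = 256 * 1024 * 1024;		/* arbitrary 256 MiB mapping */
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);
	struct timespec t0, t1;
	long touches = len / STRIDE;
	char *p;

	if (fd < 0 || ftruncate(fd, len) < 0)
		return 1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t off = 0; off < len; off += STRIDE)
		p[off] = 1;		/* one write fault per touched page */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("avg ns per touch: %ld\n",
	       (long)(((t1.tv_sec - t0.tv_sec) * 1000000000L +
		       (t1.tv_nsec - t0.tv_nsec)) / touches));

	munmap(p, len);
	close(fd);
	return 0;
}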
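And one convenient way to observe the per-call ext4_page_mkwrite()
latency while the test runs (just an example, assuming the symbol is
not inlined and can be kprobed on your kernel) is a bpftrace histogram:

bpftrace -e '
kprobe:ext4_page_mkwrite { @start[tid] = nsecs; }
kretprobe:ext4_page_mkwrite /@start[tid]/ {
	@ns = hist(nsecs - @start[tid]);
	delete(@start[tid]);
}'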
Best regards,
Yi.