Message-ID: <ZmsP36zmg2-hgtak@bombadil.infradead.org>
Date: Thu, 13 Jun 2024 08:27:27 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: David Hildenbrand <david@...hat.com>
Cc: Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>,
yang@...amperecomputing.com, linmiaohe@...wei.com,
muchun.song@...ux.dev, osalvador@...e.de,
"Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>,
david@...morbit.com, djwong@...nel.org, chandan.babu@...cle.com,
brauner@...nel.org, akpm@...ux-foundation.org, linux-mm@...ck.org,
hare@...e.de, linux-kernel@...r.kernel.org,
Zi Yan <zi.yan@...t.com>, linux-xfs@...r.kernel.org,
p.raghav@...sung.com, linux-fsdevel@...r.kernel.org, hch@....de,
gost.dev@...sung.com, cl@...amperecomputing.com,
john.g.garry@...cle.com
Subject: Re: [PATCH v7 06/11] filemap: cap PTE range to be created to allowed
zero fill in folio_map_range()
On Thu, Jun 13, 2024 at 10:16:10AM +0200, David Hildenbrand wrote:
> On 13.06.24 10:13, Luis Chamberlain wrote:
> > On Thu, Jun 13, 2024 at 10:07:15AM +0200, David Hildenbrand wrote:
> > > On 13.06.24 09:57, Luis Chamberlain wrote:
> > > > On Wed, Jun 12, 2024 at 08:08:15PM +0100, Matthew Wilcox wrote:
> > > > > On Fri, Jun 07, 2024 at 02:58:57PM +0000, Pankaj Raghav (Samsung) wrote:
> > > > > > From: Pankaj Raghav <p.raghav@...sung.com>
> > > > > >
> > > > > > Usually the page cache does not extend beyond the size of the inode,
> > > > > > therefore, no PTEs are created for folios that extend beyond the size.
> > > > > >
> > > > > > But with LBS support, we might extend page cache beyond the size of the
> > > > > > inode as we need to guarantee folios of minimum order. Cap the PTE range
> > > > > > to be created for the page cache up to the max allowed zero-fill file
> > > > > > end, which is aligned to the PAGE_SIZE.
> > > > >
> > > > > I think this is slightly misleading because we might well zero-fill
> > > > > to the end of the folio. The issue is that we're supposed to SIGBUS
> > > > > if userspace accesses pages which lie entirely beyond the end of this
> > > > > file. Can you rephrase this?
> > > > >
> > > > > (from mmap(2))
> > > > > SIGBUS Attempted access to a page of the buffer that lies beyond the end
> > > > > of the mapped file. For an explanation of the treatment of the
> > > > > bytes in the page that corresponds to the end of a mapped file
> > > > > that is not a multiple of the page size, see NOTES.
> > > > >
> > > > >
> > > > > The code is good though.
> > > > >
> > > > > Reviewed-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> > > >
> > > > Since I've been curating the respective fstests test to test for this
> > > > POSIX corner case [0] I wanted to enable the test for tmpfs instead of
> > > > skipping it as I originally had it, and that meant also realizing mmap(2)
> > > > specifically says this now:
> > > >
> > > > Huge page (Huge TLB) mappings
> > >
> > > Confusion alert: this likely talks about hugetlb (MAP_HUGETLB), not THP and
> > > friends.
> > >
> > > So it might not be required for below changes.
> >
> > Thanks, I had to ask as we're dusting off this little obscure corner of
> > the universe. The reason I ask is that the test fails for tmpfs with
> > huge pages, and this patch fixes it, but it got me wondering whether
> > the above also applies to tmpfs with huge pages.
>
> Is it tmpfs with THP/large folios or shmem with hugetlb? I assume the tmpfs
> with THP. There are not really mmap/munmap restrictions to THP and friends
> (because it's supposed to be "transparent" :) ).
The case that failed the test was tmpfs with huge pages (not large
folios). So should we then have this:
diff --git a/mm/filemap.c b/mm/filemap.c
index ea78963f0956..649beb9bbc6b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3617,6 +3617,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	vm_fault_t ret = 0;
 	unsigned long rss = 0;
 	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
+	unsigned int align = PAGE_SIZE;
 
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
@@ -3636,7 +3637,16 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
-	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	/*
+	 * As per mmap(2), the offset must be a multiple of the underlying
+	 * huge page size, and the system automatically aligns length to be
+	 * a multiple of the underlying huge page size.
+	 */
+	if (folio_test_pmd_mappable(folio) &&
+	    (shmem_mapping(mapping) || folio_test_hugetlb(folio)))
+		align = PAGE_SIZE << folio_order(folio);
+
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), align) - 1;
 	if (end_pgoff > file_end)
 		end_pgoff = file_end;