Message-ID: <xhbru4aekyfl25552le5tvifwonyuwoyioxrqxy6zkm2xlyhc5@oqxnudb4bope>
Date: Fri, 28 Feb 2025 14:42:40 +1100
From: Alistair Popple <apopple@...dia.com>
To: akpm@...ux-foundation.org, dan.j.williams@...el.com,
linux-mm@...ck.org
Cc: Alison Schofield <alison.schofield@...el.com>, lina@...hilina.net,
zhang.lyra@...il.com, gerald.schaefer@...ux.ibm.com, vishal.l.verma@...el.com,
dave.jiang@...el.com, logang@...tatee.com, bhelgaas@...gle.com, jack@...e.cz,
jgg@...pe.ca, catalin.marinas@....com, will@...nel.org, mpe@...erman.id.au,
npiggin@...il.com, dave.hansen@...ux.intel.com, ira.weiny@...el.com,
willy@...radead.org, djwong@...nel.org, tytso@....edu, linmiaohe@...wei.com,
david@...hat.com, peterx@...hat.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
jhubbard@...dia.com, hch@....de, david@...morbit.com, chenhuacai@...nel.org,
kernel@...0n.name, loongarch@...ts.linux.dev
Subject: Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts
Andrew,
This is essentially the same as what's currently in mm-unstable aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").
- Alistair
On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
>
> - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
> by Balbir.
>
> - Fixed spurious warnings generated in free_zone_device_folio() when
> pgmap->ops isn't defined, even though it isn't required to be. Reported
> by Gerald.
>
> Main updates since v7:
>
> - Rebased on current akpm/mm-unstable in order to fix conflicts with
> https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@infradead.org/
> as requested by Andrew.
>
> - Collected Acked-by/Reviewed-by tags.
>
> - Cleaned up an unnecessary and confusing assignment to pgtable.
>
> - Other minor reworks suggested by David Hildenbrand
>
> Main updates since v6:
>
> - Clean ups and fixes based on feedback from David and Dan.
>
> - Rebased from next-20241216 to v6.14-rc1. No conflicts.
>
> - Dropped the PTE bit removals and clean-ups - will post this as a
> separate series to be merged after this one as Dan wanted it split
> up more and this series is already too big.
>
> Main updates since v5:
>
> - Reworked patch 1 based on Dan's feedback.
>
> - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
> is not defined.
>
> - Minor comment formatting and documentation fixes.
>
> - Removed PTE_DEVMAP definitions from LoongArch that were added after
> this series was initially written.
>
> Main updates since v4:
>
> - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
> means smaps/pagemap may contain DAX pages.
>
> - Fixed rmap accounting of PUD mapped pages.
>
> - Minor code clean-ups.
>
> Main updates since v3:
>
> - Rebased onto next-20241216. The rebase wasn't too difficult, but in
> the interests of getting this out sooner for Andrew to look at, as he
> requested, I have not yet extensively build/run tested this version
> of the series.
>
> - Fixed a bunch of build breakages reported by John Hubbard and the
> kernel test robot due to various combinations of CONFIG options.
>
> - Split the rmap changes into a separate patch as suggested by David H.
>
> - Reworded the description for the P2PDMA change.
>
> Main updates since v2:
>
> - Renamed the DAX specific dax_insert_XXX functions to vmf_insert_XXX
> and made them take the vmf struct.
>
> - Separated out the device DAX changes.
>
> - Restored the page share mapping counting and associated warnings.
>
> - Reworked truncate to require filesystems to have previously called
> dax_break_layout() to remove the address space mapping for a
> page. This found several bugs which are fixed by the first half of
> the series. The motivation for this was initially to allow the FS
> DAX page-cache mappings to hold a reference on the page.
>
> That turned out to be a dead end (see the comments on patch 21).
> However, it found several bugs and I think it is an overall
> improvement, so I have left it in.
>
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular, pages are considered free when the refcount
> hits one rather than zero, and no reference is taken when mapping the
> page.
>
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However, there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
>
> By treating the refcounts on these pages the same way as normal pages,
> we can remove a lot of special checks. In particular, pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here. It also frees up a valuable software-defined PTE bit on
> architectures that have devmap PTE bits defined.
>
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages, which is one of my
> primary motivations for doing this work.
>
> Signed-off-by: Alistair Popple <apopple@...dia.com>
> Tested-by: Alison Schofield <alison.schofield@...el.com>
>
> ---
>
> Cc: lina@...hilina.net
> Cc: zhang.lyra@...il.com
> Cc: gerald.schaefer@...ux.ibm.com
> Cc: dan.j.williams@...el.com
> Cc: vishal.l.verma@...el.com
> Cc: dave.jiang@...el.com
> Cc: logang@...tatee.com
> Cc: bhelgaas@...gle.com
> Cc: jack@...e.cz
> Cc: jgg@...pe.ca
> Cc: catalin.marinas@....com
> Cc: will@...nel.org
> Cc: mpe@...erman.id.au
> Cc: npiggin@...il.com
> Cc: dave.hansen@...ux.intel.com
> Cc: ira.weiny@...el.com
> Cc: willy@...radead.org
> Cc: djwong@...nel.org
> Cc: tytso@....edu
> Cc: linmiaohe@...wei.com
> Cc: david@...hat.com
> Cc: peterx@...hat.com
> Cc: linux-doc@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Cc: linux-arm-kernel@...ts.infradead.org
> Cc: linuxppc-dev@...ts.ozlabs.org
> Cc: nvdimm@...ts.linux.dev
> Cc: linux-cxl@...r.kernel.org
> Cc: linux-fsdevel@...r.kernel.org
> Cc: linux-mm@...ck.org
> Cc: linux-ext4@...r.kernel.org
> Cc: linux-xfs@...r.kernel.org
> Cc: jhubbard@...dia.com
> Cc: hch@....de
> Cc: david@...morbit.com
> Cc: chenhuacai@...nel.org
> Cc: kernel@...0n.name
> Cc: loongarch@...ts.linux.dev
>
> Alistair Popple (19):
> fuse: Fix dax truncate/punch_hole fault path
> fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
> fs/dax: Don't skip locked entries when scanning entries
> fs/dax: Refactor wait for dax idle page
> fs/dax: Create a common implementation to break DAX layouts
> fs/dax: Always remove DAX page-cache entries when breaking layouts
> fs/dax: Ensure all pages are idle prior to filesystem unmount
> fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
> mm/gup: Remove redundant check for PCI P2PDMA page
> mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
> mm: Allow compound zone device pages
> mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
> mm/memory: Add vmf_insert_page_mkwrite()
> mm/rmap: Add support for PUD sized mappings to rmap
> mm/huge_memory: Add vmf_insert_folio_pud()
> mm/huge_memory: Add vmf_insert_folio_pmd()
> mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
> fs/dax: Properly refcount fs dax pages
> device/dax: Properly refcount device dax pages when mapping
>
> Dan Williams (1):
> dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
>
> Documentation/filesystems/dax.rst | 1 +-
> drivers/dax/device.c | 15 +-
> drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 +-
> drivers/nvdimm/pmem.c | 4 +-
> drivers/pci/p2pdma.c | 19 +-
> drivers/s390/block/Kconfig | 12 +-
> drivers/s390/block/dcssblk.c | 27 +-
> fs/dax.c | 365 +++++++++++++++++++-------
> fs/ext4/inode.c | 18 +-
> fs/fuse/dax.c | 30 +--
> fs/fuse/dir.c | 2 +-
> fs/fuse/file.c | 4 +-
> fs/fuse/virtio_fs.c | 3 +-
> fs/xfs/xfs_inode.c | 31 +--
> fs/xfs/xfs_inode.h | 2 +-
> fs/xfs/xfs_super.c | 12 +-
> include/linux/dax.h | 28 ++-
> include/linux/huge_mm.h | 4 +-
> include/linux/memremap.h | 17 +-
> include/linux/migrate.h | 4 +-
> include/linux/mm.h | 36 +---
> include/linux/mm_types.h | 16 +-
> include/linux/mmzone.h | 12 +-
> include/linux/page-flags.h | 6 +-
> include/linux/rmap.h | 15 +-
> lib/test_hmm.c | 3 +-
> mm/gup.c | 14 +-
> mm/hmm.c | 2 +-
> mm/huge_memory.c | 170 ++++++++++--
> mm/internal.h | 2 +-
> mm/memory-failure.c | 6 +-
> mm/memory.c | 69 ++++-
> mm/memremap.c | 60 ++--
> mm/migrate_device.c | 18 +-
> mm/mlock.c | 2 +-
> mm/mm_init.c | 23 +-
> mm/rmap.c | 67 ++++-
> mm/swap.c | 2 +-
> mm/truncate.c | 16 +-
> 39 files changed, 810 insertions(+), 330 deletions(-)
>
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> --
> git-series 0.9.1