Message-ID: <xhbru4aekyfl25552le5tvifwonyuwoyioxrqxy6zkm2xlyhc5@oqxnudb4bope>
Date: Fri, 28 Feb 2025 14:42:40 +1100
From: Alistair Popple <apopple@...dia.com>
To: akpm@...ux-foundation.org, dan.j.williams@...el.com, 
	linux-mm@...ck.org
Cc: Alison Schofield <alison.schofield@...el.com>, lina@...hilina.net, 
	zhang.lyra@...il.com, gerald.schaefer@...ux.ibm.com, vishal.l.verma@...el.com, 
	dave.jiang@...el.com, logang@...tatee.com, bhelgaas@...gle.com, jack@...e.cz, 
	jgg@...pe.ca, catalin.marinas@....com, will@...nel.org, mpe@...erman.id.au, 
	npiggin@...il.com, dave.hansen@...ux.intel.com, ira.weiny@...el.com, 
	willy@...radead.org, djwong@...nel.org, tytso@....edu, linmiaohe@...wei.com, 
	david@...hat.com, peterx@...hat.com, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linuxppc-dev@...ts.ozlabs.org, nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, 
	jhubbard@...dia.com, hch@....de, david@...morbit.com, chenhuacai@...nel.org, 
	kernel@...0n.name, loongarch@...ts.linux.dev
Subject: Re: [PATCH v9 00/20] fs/dax: Fix ZONE_DEVICE page reference counts

Andrew,

This is essentially the same as what's currently in mm-unstable, aside from
the two updates listed below. The main thing to note is that it incorporates
Balbir's fixup, which is currently in mm-unstable as c98612955016
("mm-allow-compound-zone-device-pages-fix-fix").

 - Alistair

On Fri, Feb 28, 2025 at 02:30:55PM +1100, Alistair Popple wrote:
> Main updates since v8:
> 
>  - Fixed reading of bad pgmap in migrate_vma_collect_pmd() as reported/fixed
>    by Balbir.
> 
>  - Fixed bad warnings generated in free_zone_device_folio() when pgmap->ops
>    isn't defined in cases where it isn't required to be, as reported by
>    Gerald.
> 
> Main updates since v7:
> 
>  - Rebased on current akpm/mm-unstable in order to fix conflicts with
>    https://lore.kernel.org/linux-mm/20241216155408.8102-1-willy@infradead.org/
>    as requested by Andrew.
> 
>  - Collected Acked-by/Reviewed-by tags.
> 
>  - Cleaned up an unnecessary and confusing assignment to pgtable.
> 
>  - Other minor reworks suggested by David Hildenbrand
> 
> Main updates since v6:
> 
>  - Clean ups and fixes based on feedback from David and Dan.
> 
>  - Rebased from next-20241216 to v6.14-rc1. No conflicts.
> 
>  - Dropped the PTE bit removals and clean-ups - will post this as a
>    separate series to be merged after this one as Dan wanted it split
>    up more and this series is already too big.
> 
> Main updates since v5:
> 
>  - Reworked patch 1 based on Dan's feedback.
> 
>  - Fixed build issues on PPC and when CONFIG_PGTABLE_HAS_HUGE_LEAVES
>    is not defined.
> 
>  - Minor comment formatting and documentation fixes.
> 
>  - Remove PTE_DEVMAP definitions from Loongarch which were added since
>    this series was initially written.
> 
> Main updates since v4:
> 
>  - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This
>    means smaps/pagemap may contain DAX pages.
> 
>  - Fixed rmap accounting of PUD mapped pages.
> 
>  - Minor code clean-ups.
> 
> Main updates since v3:
> 
>  - Rebased onto next-20241216. The rebase wasn't too difficult, but in
>    the interests of getting this out sooner for Andrew to look at as
>    requested by him I have yet to extensively build/run test this
>    version of the series.
> 
>  - Fixed a bunch of build breakages reported by John Hubbard and the
>    kernel test robot due to various combinations of CONFIG options.
> 
>  - Split the rmap changes into a separate patch as suggested by David H.
> 
>  - Reworded the description for the P2PDMA change.
> 
> Main updates since v2:
> 
>  - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX
>    and have them pass the vmf struct.
> 
>  - Separate out the device DAX changes.
> 
>  - Restore the page share mapping counting and associated warnings.
> 
>  - Rework truncate to require file-systems to have previously called
>    dax_break_layout() to remove the address space mapping for a
>    page. This found several bugs which are fixed by the first half of
>    the series. The motivation for this was initially to allow the FS
>    DAX page-cache mappings to hold a reference on the page.
> 
>    However that turned out to be a dead-end (see the comments on patch
>    21), but it found several bugs and I think overall it is an
>    improvement so I have left it here.
> 
> Device and FS DAX pages have always maintained their own page
> reference counts without following the normal rules for page reference
> counting. In particular pages are considered free when the refcount
> hits one rather than zero and refcounts are not added when mapping the
> page.
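
To make the difference between the two conventions concrete, here is a small
userspace toy model. It is only a sketch: struct toy_page and every function
name below are hypothetical and invented for illustration; none of them are
kernel APIs, and the real kernel code is considerably more involved.

```c
#include <stdbool.h>

/* Hypothetical toy page: just a reference count, nothing kernel-specific. */
struct toy_page {
	int refcount;
};

/* Shared helpers: take and drop one reference (e.g. a GUP-style pin). */
static void toy_get(struct toy_page *p) { p->refcount++; }
static void toy_put(struct toy_page *p) { p->refcount--; }

/*
 * Legacy DAX-style convention as described above: an idle page rests at
 * a refcount of one, and mapping the page does not take a reference.
 */
static void legacy_init(struct toy_page *p) { p->refcount = 1; }
static void legacy_map(struct toy_page *p) { (void)p; /* no refcount change */ }
static bool legacy_is_free(const struct toy_page *p) { return p->refcount == 1; }

/*
 * Normal convention: an idle page rests at zero, and every mapping holds
 * a reference for as long as it exists.
 */
static void normal_init(struct toy_page *p) { p->refcount = 0; }
static void normal_map(struct toy_page *p) { toy_get(p); }
static void normal_unmap(struct toy_page *p) { toy_put(p); }
static bool normal_is_free(const struct toy_page *p) { return p->refcount == 0; }
```

In this model a page that is mapped under the legacy convention still looks
free if you only inspect the refcount, so something other than the refcount
has to record that it is in use. Under the normal convention the mapping
itself keeps the page busy until it is unmapped.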
> 
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
> 
> By treating the refcounts on these pages the same way as normal pages
> we can remove a lot of special checks. In particular pXd_trans_huge()
> becomes the same as pXd_leaf(), although I haven't made that change
> here. It also frees up a valuable SW-defined PTE bit on architectures
> that have devmap PTE bits defined.
> 
> It also almost certainly allows further clean-up of the devmap managed
> functions, but I have left that as a future improvement. It also
> enables support for compound ZONE_DEVICE pages which is one of my
> primary motivators for doing this work.
> 
> Signed-off-by: Alistair Popple <apopple@...dia.com>
> Tested-by: Alison Schofield <alison.schofield@...el.com>
> 
> ---
> 
> Cc: lina@...hilina.net
> Cc: zhang.lyra@...il.com
> Cc: gerald.schaefer@...ux.ibm.com
> Cc: dan.j.williams@...el.com
> Cc: vishal.l.verma@...el.com
> Cc: dave.jiang@...el.com
> Cc: logang@...tatee.com
> Cc: bhelgaas@...gle.com
> Cc: jack@...e.cz
> Cc: jgg@...pe.ca
> Cc: catalin.marinas@....com
> Cc: will@...nel.org
> Cc: mpe@...erman.id.au
> Cc: npiggin@...il.com
> Cc: dave.hansen@...ux.intel.com
> Cc: ira.weiny@...el.com
> Cc: willy@...radead.org
> Cc: djwong@...nel.org
> Cc: tytso@....edu
> Cc: linmiaohe@...wei.com
> Cc: david@...hat.com
> Cc: peterx@...hat.com
> Cc: linux-doc@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Cc: linux-arm-kernel@...ts.infradead.org
> Cc: linuxppc-dev@...ts.ozlabs.org
> Cc: nvdimm@...ts.linux.dev
> Cc: linux-cxl@...r.kernel.org
> Cc: linux-fsdevel@...r.kernel.org
> Cc: linux-mm@...ck.org
> Cc: linux-ext4@...r.kernel.org
> Cc: linux-xfs@...r.kernel.org
> Cc: jhubbard@...dia.com
> Cc: hch@....de
> Cc: david@...morbit.com
> Cc: chenhuacai@...nel.org
> Cc: kernel@...0n.name
> Cc: loongarch@...ts.linux.dev
> 
> Alistair Popple (19):
>   fuse: Fix dax truncate/punch_hole fault path
>   fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
>   fs/dax: Don't skip locked entries when scanning entries
>   fs/dax: Refactor wait for dax idle page
>   fs/dax: Create a common implementation to break DAX layouts
>   fs/dax: Always remove DAX page-cache entries when breaking layouts
>   fs/dax: Ensure all pages are idle prior to filesystem unmount
>   fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag
>   mm/gup: Remove redundant check for PCI P2PDMA page
>   mm/mm_init: Move p2pdma page refcount initialisation to p2pdma
>   mm: Allow compound zone device pages
>   mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
>   mm/memory: Add vmf_insert_page_mkwrite()
>   mm/rmap: Add support for PUD sized mappings to rmap
>   mm/huge_memory: Add vmf_insert_folio_pud()
>   mm/huge_memory: Add vmf_insert_folio_pmd()
>   mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages
>   fs/dax: Properly refcount fs dax pages
>   device/dax: Properly refcount device dax pages when mapping
> 
> Dan Williams (1):
>   dcssblk: Mark DAX broken, remove FS_DAX_LIMITED support
> 
>  Documentation/filesystems/dax.rst      |   1 +-
>  drivers/dax/device.c                   |  15 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |   3 +-
>  drivers/nvdimm/pmem.c                  |   4 +-
>  drivers/pci/p2pdma.c                   |  19 +-
>  drivers/s390/block/Kconfig             |  12 +-
>  drivers/s390/block/dcssblk.c           |  27 +-
>  fs/dax.c                               | 365 +++++++++++++++++++-------
>  fs/ext4/inode.c                        |  18 +-
>  fs/fuse/dax.c                          |  30 +--
>  fs/fuse/dir.c                          |   2 +-
>  fs/fuse/file.c                         |   4 +-
>  fs/fuse/virtio_fs.c                    |   3 +-
>  fs/xfs/xfs_inode.c                     |  31 +--
>  fs/xfs/xfs_inode.h                     |   2 +-
>  fs/xfs/xfs_super.c                     |  12 +-
>  include/linux/dax.h                    |  28 ++-
>  include/linux/huge_mm.h                |   4 +-
>  include/linux/memremap.h               |  17 +-
>  include/linux/migrate.h                |   4 +-
>  include/linux/mm.h                     |  36 +---
>  include/linux/mm_types.h               |  16 +-
>  include/linux/mmzone.h                 |  12 +-
>  include/linux/page-flags.h             |   6 +-
>  include/linux/rmap.h                   |  15 +-
>  lib/test_hmm.c                         |   3 +-
>  mm/gup.c                               |  14 +-
>  mm/hmm.c                               |   2 +-
>  mm/huge_memory.c                       | 170 ++++++++++--
>  mm/internal.h                          |   2 +-
>  mm/memory-failure.c                    |   6 +-
>  mm/memory.c                            |  69 ++++-
>  mm/memremap.c                          |  60 ++--
>  mm/migrate_device.c                    |  18 +-
>  mm/mlock.c                             |   2 +-
>  mm/mm_init.c                           |  23 +-
>  mm/rmap.c                              |  67 ++++-
>  mm/swap.c                              |   2 +-
>  mm/truncate.c                          |  16 +-
>  39 files changed, 810 insertions(+), 330 deletions(-)
> 
> base-commit: b2a64caeafad6e37df1c68f878bfdd06ff14f4ec
> -- 
> git-series 0.9.1
