lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <wysuus23bqmjtwkfu3zutqtmkse3ki3erf45x32yezlrl24qto@xlqt7qducyld> Date: Mon, 16 Dec 2024 11:55:30 +1100 From: Alistair Popple <apopple@...dia.com> To: David Hildenbrand <david@...hat.com> Cc: Dan Williams <dan.j.williams@...el.com>, linux-mm@...ck.org, lina@...hilina.net, zhang.lyra@...il.com, gerald.schaefer@...ux.ibm.com, vishal.l.verma@...el.com, dave.jiang@...el.com, logang@...tatee.com, bhelgaas@...gle.com, jack@...e.cz, jgg@...pe.ca, catalin.marinas@....com, will@...nel.org, mpe@...erman.id.au, npiggin@...il.com, dave.hansen@...ux.intel.com, ira.weiny@...el.com, willy@...radead.org, djwong@...nel.org, tytso@....edu, linmiaohe@...wei.com, peterx@...hat.com, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org, nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, jhubbard@...dia.com, hch@....de, david@...morbit.com, akpm@...ux-foundation.org, sfr@...b.auug.org.au Subject: Re: [PATCH v3 00/25] fs/dax: Fix ZONE_DEVICE page reference counts On Sat, Dec 14, 2024 at 04:22:58PM +0100, David Hildenbrand wrote: > On 14.12.24 02:39, Dan Williams wrote: > > [ add akpm and sfr for next steps ] > > > > Alistair Popple wrote: > > > Main updates since v2: > > > > > > - Rename the DAX specific dax_insert_XXX functions to vmf_insert_XXX > > > and have them pass the vmf struct. > > > > > > - Seperate out the device DAX changes. > > > > > > - Restore the page share mapping counting and associated warnings. > > > > > > - Rework truncate to require file-systems to have previously called > > > dax_break_layout() to remove the address space mapping for a > > > page. This found several bugs which are fixed by the first half of > > > the series. The motivation for this was initially to allow the FS > > > DAX page-cache mappings to hold a reference on the page. > > > > > > However that turned out to be a dead-end (see the comments on patch > > > 21), but it found several bugs and I think overall it is an > > > improvement so I have left it here. > > > > > > Device and FS DAX pages have always maintained their own page > > > reference counts without following the normal rules for page reference > > > counting. In particular pages are considered free when the refcount > > > hits one rather than zero and refcounts are not added when mapping the > > > page. > > > > > > Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary > > > mechanism for allowing GUP to hold references on the page (see > > > get_dev_pagemap). However there doesn't seem to be any reason why FS > > > DAX pages need their own reference counting scheme. > > > > > > By treating the refcounts on these pages the same way as normal pages > > > we can remove a lot of special checks. In particular pXd_trans_huge() > > > becomes the same as pXd_leaf(), although I haven't made that change > > > here. It also frees up a valuable SW define PTE bit on architectures > > > that have devmap PTE bits defined. > > > > > > It also almost certainly allows further clean-up of the devmap managed > > > functions, but I have left that as a future improvment. It also > > > enables support for compound ZONE_DEVICE pages which is one of my > > > primary motivators for doing this work. > > > > So this is feeling ready for -next exposure, and ideally merged for v6.14. I > > see the comments from John and Bjorn and that you were going to respin for > > that, but if it's just those details things they can probably be handled > > incrementally. > > > > Alistair, are you ready for this to hit -next? Yeah, I'm pretty happy with the series now. It "feels" right. There's a couple of dumb build bot errors, so I was going to respin to fix those as well. I got caught up with a few other things so was just letting this sit awaiting feedback, but I should be able to post a respin early this week. > > As for which tree... > > > > Andrew, we could take this through -mm, but my first instinct would be to try > > to take it through nvdimm.git mainly to offload any conflict wrangling work and > > small fixups which are likely to be an ongoing trickle. > > > > However, I am not going to put up much of a fight if others prefer this go > > through -mm. > > > > Thoughts? > > I'm in the process of preparing v2 of [1] that will result in conflicts with > this series in the rmap code (in particular [PATCH v3 14/25] huge_memory: > Allow mappings of PUD sized pages). > > I'll be away for 2 weeks over Christmas, but I assume I'll manage to post v2 > shortly. > > Which reminds me that I still have to take a closer look at some things in > this series :) Especially also #14 regarding accounting. > > I wonder if we could split out the rmap changes in #14, and have that patch > simply in two trees? No idea. I could split out the first half (patches 1 - 8) into a series to go via nvdimm.git, because they are actually standalone clean ups that I think are worthwhile anyway. The remainder are more -mm focussed. However they do depend on the fs/dax cleanups in the first half so the trick would be making sure Andrew only takes them if the nvdimm.git changes have made it into -next. I'm happy with either approach, so let me know if I should split the series or not. - Alistair > [1] > https://lore.kernel.org/all/20240829165627.2256514-1-david@redhat.com/T/#u > > -- > Cheers, > > David / dhildenb >
Powered by blists - more mailing lists