Message-ID: <66181dd83f74e_15786294e8@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Thu, 11 Apr 2024 10:28:56 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Alistair Popple <apopple@...dia.com>, <linux-mm@...ck.org>
CC: <david@...morbit.com>, <dan.j.williams@...el.com>, <jhubbard@...dia.com>,
<rcampbell@...dia.com>, <willy@...radead.org>, <jgg@...dia.com>,
<linux-fsdevel@...r.kernel.org>, <jack@...e.cz>, <djwong@...nel.org>,
<hch@....de>, <david@...hat.com>, <ruansy.fnst@...itsu.com>,
<nvdimm@...ts.linux.dev>, <linux-xfs@...r.kernel.org>,
<linux-ext4@...r.kernel.org>, <jglisse@...hat.com>, Alistair Popple
<apopple@...dia.com>
Subject: Re: [RFC 00/10] fs/dax: Fix FS DAX page reference counts
Alistair Popple wrote:
> FS DAX pages have always maintained their own page reference counts
> without following the normal rules for page reference counting. In
> particular pages are considered free when the refcount hits one rather
> than zero and refcounts are not added when mapping the page.
> Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary
> mechanism for allowing GUP to hold references on the page (see
> get_dev_pagemap). However there doesn't seem to be any reason why FS
> DAX pages need their own reference counting scheme.
This is fair. However, for anyone coming in fresh to this situation
maybe some more "how we got here" history helps. That longer story is
here:
http://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com/
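For readers new to the thread, the difference between the two conventions described in the quoted text can be sketched with a userspace toy model. This is only an illustration under stated assumptions: `struct toy_page` and every helper below are hypothetical names, not kernel APIs, and the model ignores atomics, folios and dev_pagemap references entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; only the reference count is modelled. */
struct toy_page {
	int refcount;
};

static void toy_get(struct toy_page *p) { p->refcount++; }
static void toy_put(struct toy_page *p) { p->refcount--; }

/* Legacy FS DAX convention as described above: the baseline count is one,
 * so a page with no users is the one whose count has dropped back to one. */
static bool legacy_dax_page_idle(const struct toy_page *p)
{
	return p->refcount == 1;
}

/* Normal convention: a page is free once its count reaches zero. */
static bool normal_page_free(const struct toy_page *p)
{
	return p->refcount == 0;
}

/* Small self-check that walks one page through both conventions. */
static void toy_demo(void)
{
	struct toy_page p = { .refcount = 1 };

	assert(legacy_dax_page_idle(&p));   /* baseline of one: idle */
	toy_get(&p);                        /* e.g. a GUP-style user */
	assert(!legacy_dax_page_idle(&p));
	toy_put(&p);
	assert(legacy_dax_page_idle(&p));   /* back to one: idle again */
	assert(!normal_page_free(&p));      /* but not free by the normal rule */
}
```

The last assertion is the mismatch the RFC is aiming to remove: the same count of one means "idle" under the legacy rule and "still referenced" under the normal one.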
> This RFC is an initial attempt at removing the special reference
> counting and instead refcount FS DAX pages the same as normal pages.
>
> There are still a couple of rough edges - in particular I haven't
> completely removed the devmap PTE bit references from arch specific
> code and there is probably some more cleanup of dev_pagemap reference
> counting that could be done, particularly in mm/gup.c. I also haven't
> yet compiled on anything other than x86_64.
>
> Before continuing further with this clean-up though I would appreciate
> some feedback on the viability of this approach and any issues I may
> have overlooked, as I am not intimately familiar with FS DAX code (or
> for that matter the FS layer in general).
>
> I have of course run some basic testing which didn't reveal any
> problems.
FWIW I see the following with the ndctl/dax test-suite (double-checked
that vanilla v6.6 passes). I will take a look at the patches, but in the
meantime...
# meson test -C build --suite ndctl:dax
ninja: no work to do.
ninja: Entering directory `/root/git/ndctl/build'
[1/70] Generating version.h with a custom command
1/13 ndctl:dax / daxdev-errors.sh OK 14.46s
2/13 ndctl:dax / multi-dax.sh OK 2.70s
3/13 ndctl:dax / sub-section.sh OK 7.21s
4/13 ndctl:dax / dax-dev OK 0.08s
[5/13] 🌖 ndctl:dax / dax-ext4.sh 0/600s
...that last test crashed with:
EXT4-fs (pmem0): mounted filesystem 2adea02a-a791-4714-be40-125afd16634b r/w with ordered
ota mode: none.
page:ffffea0005f00000 refcount:0 mapcount:0 mapping:ffff8882a8a6be10 index:0x5800 pfn:0x1
head:ffffea0005f00000 order:9 entire_mapcount:0 nr_pages_mapped:0 pincount:0
aops:ext4_dax_aops ino:c dentry name:"image"
flags: 0x4ffff800004040(reserved|head|node=0|zone=4|lastcpupid=0x1ffff)
page_type: 0xffffffff()
raw: 004ffff800004040 ffff888202681520 0000000000000000 ffff8882a8a6be10
raw: 0000000000005800 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1419!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1415 Comm: dax-pmd Tainted: G OE N 6.6.0+ #209
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
RIP: 0010:dax_insert_pfn_pmd+0x41c/0x430
Code: 89 c1 41 b8 01 00 00 00 48 89 ea 4c 89 e6 4c 89 f7 e8 18 8a c7 ff e9 e0 fc ff ff 48
c b3 48 89 c7 e8 a4 53 f7 ff <0f> 0b e8 0d ba a8 00 48 8b 15 86 8a 62 01 e9 89 fc ff ff 90
RSP: 0000:ffffc90001d57b68 EFLAGS: 00010246
RAX: 000000000000005c RBX: ffffea0005f00000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffffb3749a15 RDI: 00000000ffffffff
RBP: ffff8882982c07e0 R08: 00000000ffffdfff R09: 0000000000000001
R10: 00000000ffffdfff R11: ffffffffb3a771c0 R12: 800000017c0008e7
R13: 8000000000000025 R14: ffff888202a395f8 R15: ffffea0005f00000
FS: 00007fdaa00e3d80(0000) GS:ffff888477000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fda9f800000 CR3: 0000000296224000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? die+0x32/0x80
? do_trap+0xd6/0x100
? dax_insert_pfn_pmd+0x41c/0x430
? dax_insert_pfn_pmd+0x41c/0x430
? do_error_trap+0x81/0x110
? dax_insert_pfn_pmd+0x41c/0x430
? exc_invalid_op+0x4c/0x60
? dax_insert_pfn_pmd+0x41c/0x430
? asm_exc_invalid_op+0x16/0x20
? dax_insert_pfn_pmd+0x41c/0x430
? dax_insert_pfn_pmd+0x41c/0x430
dax_fault_iter+0x5d0/0x700
dax_iomap_pmd_fault+0x212/0x450
ext4_dax_huge_fault+0x1dc/0x470
__handle_mm_fault+0x808/0x13e0
handle_mm_fault+0x178/0x3e0
do_user_addr_fault+0x186/0x830
exc_page_fault+0x6f/0x1d0
asm_exc_page_fault+0x22/0x30
RIP: 0033:0x7fdaa072d009
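As a reading aid for the dump above: the VM_BUG_ON_FOLIO condition is cut off in the log, but assuming it is the usual mainline check `(unsigned int) folio_ref_count(folio) + 127u <= 127u` (what `folio_ref_zero_or_close_to_overflow()` expands to, as far as I recall), it fires when the refcount is zero or has wrapped to within 127 of overflow. The page dump reports refcount:0, which matches the zero case. A minimal userspace sketch of that arithmetic, where the function name is hypothetical and a plain int stands in for the folio refcount:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the assumed check: true when the count is zero or a
 * small negative value (-127..0), i.e. taking another reference would be
 * a bug. Unsigned wraparound is what folds both cases into one compare. */
static bool ref_zero_or_close_to_overflow(int refcount)
{
	return (unsigned int)refcount + 127u <= 127u;
}

/* Self-check covering the boundaries of the range. */
static void ref_check_demo(void)
{
	assert(ref_zero_or_close_to_overflow(0));     /* the case in the dump */
	assert(!ref_zero_or_close_to_overflow(1));    /* a normally held page */
	assert(ref_zero_or_close_to_overflow(-1));    /* underflowed by one */
	assert(ref_zero_or_close_to_overflow(-127));  /* last value caught */
	assert(!ref_zero_or_close_to_overflow(-128)); /* just outside */
}
```

So, under that assumption, the trace says dax_insert_pfn_pmd() tried to take a reference on a folio whose count was still zero.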