Message-ID: <65d8fa6736a18_2509b29410@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Fri, 23 Feb 2024 12:04:55 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: John Groves <John@...ves.net>, Dave Hansen <dave.hansen@...el.com>
CC: John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>, "Dan
Williams" <dan.j.williams@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>, Alexander Viro
<viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara
<jack@...e.cz>, "Matthew Wilcox" <willy@...radead.org>,
<linux-cxl@...r.kernel.org>, <linux-fsdevel@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<nvdimm@...ts.linux.dev>, <john@...alactic.com>, Dave Chinner
<david@...morbit.com>, Christoph Hellwig <hch@...radead.org>,
<dave.hansen@...ux.intel.com>, <gregory.price@...verge.com>
Subject: Re: [RFC PATCH 16/20] famfs: Add fault counters
John Groves wrote:
> On 24/02/23 10:23AM, Dave Hansen wrote:
> > On 2/23/24 09:42, John Groves wrote:
> > > One of the key requirements for famfs is that it service vma faults
> > > efficiently. Our metadata helps - extent lookup is O(n) for n extents,
> > > and n is usually 1. But we can still observe gnarly lock contention
> > > in mm if PTE faults are happening. This commit introduces fault counters
> > > that can be enabled and read via /sys/fs/famfs/...
> > >
> > > These counters have proved useful in troubleshooting situations where
> > > PTE faults were happening instead of PMD. No performance impact when
> > > disabled.
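
One way counters like these can be genuinely free when disabled is a
static key in front of per-CPU variables, so the off case is a
patched-out branch. A minimal sketch with hypothetical names - not the
actual famfs code, whose sysfs layout is elided above:

#include <linux/jump_label.h>
#include <linux/percpu.h>

static DEFINE_STATIC_KEY_FALSE(famfs_fault_counters_on); /* hypothetical */
static DEFINE_PER_CPU(u64, famfs_pte_faults);            /* hypothetical */
static DEFINE_PER_CPU(u64, famfs_pmd_faults);            /* hypothetical */

static inline void famfs_count_fault(bool pmd)
{
	/* Compiles to a patched-out branch unless counting is enabled */
	if (static_branch_unlikely(&famfs_fault_counters_on)) {
		if (pmd)
			this_cpu_inc(famfs_pmd_faults);
		else
			this_cpu_inc(famfs_pte_faults);
	}
}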
> >
> > This seems kinda wonky. Why does _this_ specific filesystem need its
> > own fault counters? Seems like something we'd want to do much more
> > generically, if it is needed at all.
> >
> > Was the issue here just that vm_ops->fault() was getting called instead
> > of ->huge_fault()? Or something more subtle?
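
For reference, the distinction being asked about lives in
vm_operations_struct: without a ->huge_fault handler the core mm can
only call ->fault, and every access is serviced with a 4KiB PTE. A
sketch of the wiring, with hypothetical famfs handler names (huge_fault
signature as of recent kernels):

#include <linux/mm.h>

/* Hypothetical handler names, for illustration only */
static vm_fault_t famfs_fault(struct vm_fault *vmf);
static vm_fault_t famfs_huge_fault(struct vm_fault *vmf, unsigned int order);

static const struct vm_operations_struct famfs_vm_ops = {
	.fault		= famfs_fault,      /* 4KiB PTE faults */
	.huge_fault	= famfs_huge_fault, /* 2MiB PMD (and 1GiB PUD) faults */
};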
>
> Thanks for your reply, Dave!
>
> First, I'm willing to pull the fault counters out if the brain trust doesn't
> like them.
>
> I put them in because we were running benchmarks of computational data
> analytics and noted that jobs took 3x as long on famfs as on raw dax -
> which indicated I was doing something wrong, because it should be
> equivalent or very close.
>
> The solution was to call thp_get_unmapped_area() in
> famfs_file_operations, and performance doesn't vary significantly from
> raw dax now. Prior to that I wasn't making sure the mmap address was
> PMD-aligned.
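
Concretely, that fix is one line of wiring - the same hook ext4 and xfs
use to hand out PMD-aligned mmap addresses. A sketch with other members
elided and a hypothetical mmap handler name:

#include <linux/fs.h>
#include <linux/huge_mm.h>

static int famfs_file_mmap(struct file *file, struct vm_area_struct *vma); /* hypothetical */

static const struct file_operations famfs_file_operations = {
	.mmap		   = famfs_file_mmap,
	/* PMD-aligned mmap addresses make PMD faults possible at all */
	.get_unmapped_area = thp_get_unmapped_area,
	/* ... */
};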
>
> After that I wanted a way to be double-secret-certain that it was servicing
> PMD faults as intended. Which it basically always is, so far. (The smoke
> tests in user space check this.)
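
Such a smoke test can be as simple as diffing the counters around a
PMD-stride touch of the mapping. A sketch, with made-up counter paths
since the real /sys/fs/famfs layout is elided above:

#include <inttypes.h>
#include <stdio.h>

/* Made-up sysfs paths, for illustration only */
#define PMD_CTR "/sys/fs/famfs/fault_counters/pmd"
#define PTE_CTR "/sys/fs/famfs/fault_counters/pte"

static uint64_t read_u64(const char *path)
{
	uint64_t v = 0;
	FILE *f = fopen(path, "r");

	if (f) {
		fscanf(f, "%" SCNu64, &v);
		fclose(f);
	}
	return v;
}

int main(void)
{
	uint64_t pmd0 = read_u64(PMD_CTR), pte0 = read_u64(PTE_CTR);

	/* ... mmap a famfs file and touch one byte per 2MiB here ... */

	printf("pmd faults: %" PRIu64 ", pte faults: %" PRIu64 "\n",
	       read_u64(PMD_CTR) - pmd0, read_u64(PTE_CTR) - pte0);
	return 0;
}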
We had similar unit-test regression concerns with fsdax, where some
upstream change silently broke PMD faults. The solution there was
tracepoints in the fault handlers and a basic test that knows a priori
that it *should* be triggering a certain number of huge faults:
https://github.com/pmem/ndctl/blob/main/test/dax.sh#L31
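
A rough C equivalent of the counting that script does with trace-cmd,
assuming the fs_dax:dax_pmd_fault_done tracepoint and tracefs mounted
at /sys/kernel/tracing (the fsdax events; famfs would need its own):

#include <stdio.h>
#include <string.h>

#define EVT   "/sys/kernel/tracing/events/fs_dax/dax_pmd_fault_done/enable"
#define TRACE "/sys/kernel/tracing/trace"

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fputs(val, f);
		fclose(f);
	}
}

int main(void)
{
	char line[4096];
	int events = 0;
	FILE *trace;

	write_file(EVT, "1");

	/* ... run the mmap workload expected to take PMD faults ... */

	trace = fopen(TRACE, "r");
	if (!trace)
		return 1;
	while (fgets(line, sizeof(line), trace))
		if (strstr(line, "dax_pmd_fault_done"))
			events++;
	fclose(trace);

	printf("huge faults observed: %d\n", events);
	return events > 0 ? 0 : 1;
}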