Message-ID: <65d92f49ee454_1711029468@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Fri, 23 Feb 2024 15:50:33 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Dave Hansen <dave.hansen@...el.com>, John Groves <John@...ves.net>, "Dan
Williams" <dan.j.williams@...el.com>
CC: John Groves <jgroves@...ron.com>, Jonathan Corbet <corbet@....net>,
"Vishal Verma" <vishal.l.verma@...el.com>, Dave Jiang <dave.jiang@...el.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner
<brauner@...nel.org>, Jan Kara <jack@...e.cz>, Matthew Wilcox
<willy@...radead.org>, <linux-cxl@...r.kernel.org>,
<linux-fsdevel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <nvdimm@...ts.linux.dev>,
<john@...alactic.com>, Dave Chinner <david@...morbit.com>, Christoph Hellwig
<hch@...radead.org>, <dave.hansen@...ux.intel.com>,
<gregory.price@...verge.com>
Subject: Re: [RFC PATCH 16/20] famfs: Add fault counters
Dave Hansen wrote:
> On 2/23/24 12:39, John Groves wrote:
> >> We had similar unit test regression concerns with fsdax where some
> >> upstream change silently broke PMD faults. The solution there was trace
> >> points in the fault handlers and a basic test that knows a priori that it
> >> *should* be triggering a certain number of huge faults:
> >>
> >> https://github.com/pmem/ndctl/blob/main/test/dax.sh#L31
> > Good approach, thanks Dan! My working assumption is that we'll be able to make
> > that approach work in the famfs tests. So the fault counters should go away
> > in the next version.
>
> I do really suspect there's something more generic that should be done
> here. Maybe we need a generic 'huge_faults' perf event to pair up with
> the good ol' faults that we already have:
>
> # perf stat -e faults /bin/ls
>
> Performance counter stats for '/bin/ls':
>
> 104 faults
>
>
> 0.001499862 seconds time elapsed
>
> 0.001490000 seconds user
> 0.000000000 seconds sys

Certainly something like that would have satisfied this sanity test use
case. I will note that mm_account_fault() would need some help to figure
out the size of the page table entry that got installed. Maybe
extensions to vm_fault_reason to add VM_FAULT_P*D? That would complement
VM_FAULT_FALLBACK to indicate whether, for example, the fallback went
from PUD to PMD, or all the way back to PTE.
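
Something like the below is what I have in mind (completely untested
sketch: VM_FAULT_PMD, VM_FAULT_PUD, and PERF_COUNT_SW_HUGE_FAULTS do not
exist today, and the huge fault paths that would need to set the new
bits are not shown):

/* include/linux/mm_types.h: hypothetical new vm_fault_reason bits */
	/* values illustrative only; real bits must stay clear of VM_FAULT_HINDEX_MASK */
	VM_FAULT_PMD	= (__force vm_fault_t)0x008000, /* installed a PMD-sized entry */
	VM_FAULT_PUD	= (__force vm_fault_t)0x010000, /* installed a PUD-sized entry */

/* mm/memory.c: mm_account_fault() could then feed a new software event,
 * next to the existing PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} accounting */
	if (ret & (VM_FAULT_PMD | VM_FAULT_PUD))
		perf_sw_event(PERF_COUNT_SW_HUGE_FAULTS, 1, regs, address);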
Then use cases like this could just add a dynamic probe in
mm_account_fault(). No real need for a new tracepoint unless there was a
use case for this outside of regression testing fault handlers, right?