linux-kernel - Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped


Message-ID: <20240604015348.GB26609@system.software.com>
Date: Tue, 4 Jun 2024 10:53:48 +0900
From: Byungchul Park <byungchul@...com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: David Hildenbrand <david@...hat.com>,
	Byungchul Park <lkml.byungchul.park@...il.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	kernel_team@...ynix.com, akpm@...ux-foundation.org,
	ying.huang@...el.com, vernhao@...cent.com,
	mgorman@...hsingularity.net, hughd@...gle.com, willy@...radead.org,
	peterz@...radead.org, luto@...nel.org, tglx@...utronix.de,
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
	rjgolo@...il.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering
 tlb flush when folios get unmapped

On Mon, Jun 03, 2024 at 06:23:46AM -0700, Dave Hansen wrote:
> On 6/3/24 02:35, Byungchul Park wrote:
> ...
> > In luf's point of view, the points where the deferred flush should be
> > performed are simply:
> > 
> > 	1. when changing the vma maps, that might be luf'ed.
> > 	2. when updating data of the pages, that might be luf'ed.
> 
> It's simple, but the devil is in the details as always.

Agree with that.

> > All we need to do is to identify the points:
> > 
> > 	1. when changing the vma maps, that might be luf'ed.
> > 
> > 	   a) mmap and munmap, i.e. fault handler or unmap_region().
> > 	   b) permission to writable, i.e. mprotect or fault handler.
> > 	   c) what I'm missing.
> 
> I'd say it even more generally: anything that installs a PTE which is
> inconsistent with the original PTE.  That, of course, includes writes.
> But it also includes crazy things that we do like uprobes.  Take a look
> at __replace_page().
> 
> I think the page_vma_mapped_walk() checks plus the ptl keep LUF at bay
> there.  But it needs some really thorough review.
> 
> But the bigger concern is that, if there was a problem, I can't think of
> a systematic way to find it.
> 
> > 	2. when updating data of the pages, that might be luf'ed.
> > 
> > 	   a) updating files through vfs e.g. file_end_write().
> > 	   b) updating files through writable maps, i.e. 1-a) or 1-b).
> > 	   c) what I'm missing.
> 
> Filesystems or block devices that change content without a "write" from
> the local system.  Network filesystems and block devices come to mind.

AFAIK, every network filesystem eventually "updates" its connected local
filesystem.  It could still be handled at the point where the local
filesystem is updated.
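
To make the two points listed above concrete, here is a minimal
user-space sketch, not code from the patch series.  All names
(struct luf_model, model_*) are hypothetical and only illustrate the
idea: unmapping defers the flush, and the flush is performed lazily
right before a mapping change (point 1) or a page data update (point 2).

```c
#include <stdbool.h>

/* Hypothetical user-space model; names are illustrative, not from the patch. */
struct luf_model {
	bool pending;         /* a TLB flush has been deferred */
	unsigned int flushes; /* number of flushes actually performed */
};

/* Unmap a folio but defer the TLB flush. */
static void model_unmap_deferred(struct luf_model *m)
{
	m->pending = true;
}

/* Perform the deferred flush, if there is one. */
static void model_luf_flush(struct luf_model *m)
{
	if (m->pending) {
		m->flushes++;
		m->pending = false;
	}
}

/* Point 1: the vma maps change (mmap/munmap, permission to writable). */
static void model_change_mapping(struct luf_model *m)
{
	model_luf_flush(m);
	/* ... the mapping would be installed or modified here ... */
}

/* Point 2: page data is updated (e.g. a write through vfs). */
static void model_update_data(struct luf_model *m)
{
	model_luf_flush(m);
	/* ... the page contents would be modified here ... */
}
```

In this model any number of deferred unmaps cost a single flush, paid
at the first mapping change or data update that follows them.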

> I honestly don't know what all the rules are around these, but they
> could certainly be troublesome.
> 
> There appear to be some interactions for NFS between file locking and
> page cache flushing.
> 
> But, stepping back ...
> 
> I'd honestly be a lot more comfortable if there was even a debugging LUF

I'd better provide a method for better debugging.  Lemme know whatever
it is we need.

> mode that enforced a rule that said:

Why "debugging mode"?  The following rules should be enforced always.

>   1. A LUF'd PTE can't be rewritten until after a luf_flush() occurs

"luf_flush() should be followed when.." is more correct because
"luf_flush() -> another luf -> the pte gets rewritten" can happen.  So
it should be "the pte gets rewritten -> another luf by any chance ->
luf_flush()", which is still safe.

>   2. A LUF'd page's position in the page cache can't be replaced until
>      after a luf_flush()

"luf_flush() should be followed when.." is more correct too.

These two rules are exactly the same as what I described but more specific.
I like your way to describe the rules.
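
As an illustration only, a checker for rule 1 in its strict form (no
rewrite until a flush has covered the latest luf) could be sketched in
user space as below.  Every name here (struct luf_dbg, struct
luf_dbg_ent, luf_dbg_*) is hypothetical and not part of the patch.  A
generation counter is used so that an older flush does not count, which
catches the "luf_flush() -> another luf -> the pte gets rewritten"
sequence.  It does not model the relaxed ordering where the flush
follows the rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debug-only state; none of these names come from the patch. */
struct luf_dbg {
	uint64_t luf_gen;     /* bumped on every deferred (luf'ed) unmap */
	uint64_t flushed_gen; /* highest generation covered by a flush */
};

/* Per-object state: a PTE for rule 1, or a page cache slot for rule 2. */
struct luf_dbg_ent {
	uint64_t pending_gen; /* generation of the last luf, 0 if none */
};

static void luf_dbg_init(struct luf_dbg *st)
{
	assert(st);
	st->luf_gen = 0;
	st->flushed_gen = 0;
}

/* Record that this entry was unmapped with its TLB flush deferred. */
static void luf_dbg_defer(struct luf_dbg *st, struct luf_dbg_ent *ent)
{
	assert(st && ent);
	ent->pending_gen = ++st->luf_gen;
}

/* Model of luf_flush(): everything deferred so far is now flushed. */
static void luf_dbg_flush(struct luf_dbg *st)
{
	assert(st);
	st->flushed_gen = st->luf_gen;
}

/* True if the entry may be rewritten or replaced without breaking the rule. */
static bool luf_dbg_may_rewrite(const struct luf_dbg *st,
				const struct luf_dbg_ent *ent)
{
	assert(st && ent);
	return ent->pending_gen <= st->flushed_gen;
}
```

The same check could be reused for rule 2 by keeping one struct
luf_dbg_ent per page cache slot; a debug build would warn whenever
luf_dbg_may_rewrite() returns false at a rewrite or replace site.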

	Byungchul

> or *some* other independent set of rules that can tell us when something
> goes wrong.  That uprobes code, for instance, seems like it will work.
> But I can also imagine writing it ten other ways where it would break
> when combined with LUF.