Message-ID: <26dc4594-430b-483c-a26c-7e68bade74b0@redhat.com>
Date: Sat, 1 Jun 2024 09:22:17 +0200
From: David Hildenbrand <david@...hat.com>
To: Dave Hansen <dave.hansen@...el.com>,
 Byungchul Park <lkml.byungchul.park@...il.com>
Cc: Byungchul Park <byungchul@...com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, kernel_team@...ynix.com, akpm@...ux-foundation.org,
 ying.huang@...el.com, vernhao@...cent.com, mgorman@...hsingularity.net,
 hughd@...gle.com, willy@...radead.org, peterz@...radead.org,
 luto@...nel.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, rjgolo@...il.com
Subject: Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering
 tlb flush when folios get unmapped

On 31.05.24 23:46, Dave Hansen wrote:
> On 5/31/24 11:04, Byungchul Park wrote:
> ...
>> I don't believe you do not agree with the concept itself.  Thing is
>> the current version is not good enough.  I will do my best by doing
>> what I can do.
> 
> More performance is good.  I agree with that.
> 
> But it has to be weighed against the risk and the complexity.  The more
> I look at this approach, the more I think this is not a good trade off.
> There's a lot of risk and a lot of complexity and we haven't seen the
> full complexity picture.  The gaps are being fixed by adding complexity
> in new subsystems (the VFS in this case).
> 
> There are going to be winners and losers, and this version for example
> makes file writes lose performance.
> 
> Just to be crystal clear: I disagree with the concept of leaving stale
> TLB entries in place in an attempt to gain performance.

There is the inherent problem that a CPU reading from such (unmapped but 
not flushed yet) memory will not get a page fault, which I think is the 
most controversial part here (besides interaction with other deferred 
TLB flushing, and how this glues into the buddy).

What we have done so far is limit the time frame in which that could
happen, under well-controlled circumstances. On the common unmap/zap
path, we perform the batched TLB flush before any page faults / VMA
changes would have been possible and before munmap() would have
returned with "success". Now that time frame could be significantly
longer.

So in current code, at the point in time where we would process a page 
fault, mmap()/munmap()/... the TLB would have been flushed already.

To "mimic" the old behavior, we'd essentially have to force any page 
faults/mmap/whatsoever to perform the deferred flush such that the CPU 
will see the "reality" again. Not sure how that could be done in a 
*consistent* way (check whenever we take the mmap/vma lock etc ...) and 
if there would still be a performance win.
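
To make the window described above concrete, here is a toy user-space
model in Python. It is only a sketch and not kernel code: ToyMMU,
PageFault, sync_point() and the defer_flush parameter are hypothetical
names made up for illustration. The "TLB" is just a dictionary caching
translations, and sync_point() stands in for a check that would have to
run whenever a page fault or mmap()/munmap() is processed.

```python
class PageFault(Exception):
    """Raised when neither the TLB nor the page table has a translation."""


class ToyMMU:
    """Hypothetical single-CPU model: one page table, one TLB."""

    def __init__(self):
        self.page_table = {}        # virtual page -> physical frame
        self.tlb = {}               # cached translations
        self.flush_pending = False  # True while a flush is being deferred

    def map(self, vpage, frame):
        self.page_table[vpage] = frame

    def read(self, vpage):
        # The TLB is consulted first, so a stale entry hides the unmap.
        if vpage in self.tlb:
            return self.tlb[vpage]
        if vpage in self.page_table:
            self.tlb[vpage] = self.page_table[vpage]
            return self.tlb[vpage]
        raise PageFault(vpage)

    def flush(self):
        self.tlb.clear()
        self.flush_pending = False

    def unmap(self, vpage, defer_flush):
        self.page_table.pop(vpage, None)
        if defer_flush:
            # Deferred case: the stale TLB entry survives the unmap.
            self.flush_pending = True
        else:
            # Eager case: flush before unmap() returns.
            self.flush()

    def sync_point(self):
        # Assumed hook for page fault / mmap / munmap entry.
        if self.flush_pending:
            self.flush()


def faults(mmu, vpage):
    """Return True if reading vpage raises PageFault."""
    try:
        mmu.read(vpage)
    except PageFault:
        return True
    return False
```

With defer_flush=True, a read after unmap() still returns the old frame
and does not fault until sync_point() runs. With defer_flush=False, the
read faults as soon as unmap() has returned. The model says nothing
about the cost of calling sync_point() everywhere, which is the open
question.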

-- 
Cheers,

David / dhildenb

