linux-kernel - Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <640A6374-A06B-4E20-BF5D-9A21CC85CB12@gmail.com>
Date:   Tue, 26 Oct 2021 13:07:37 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Linux-MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Xu <peterx@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Nick Piggin <npiggin@...il.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()



> On Oct 26, 2021, at 12:40 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
> On 10/26/21 12:06 PM, Nadav Amit wrote:
>> 
>> To make it very clear - consider the following scenario, in which
>> a volatile pointer p is mapped using a certain PTE, which is RW
>> (i.e., *p is writable):
>> 
>>  CPU0				CPU1
>>  ----				----
>>  x = *p
>>  [ PTE cached in TLB; 
>>    PTE is not dirty ]
>> 				clear_pte(PTE)
>>  *p = x
>>  [ needs to set dirty ]
>> 
>> Note that there is no TLB flush in this scenario. The question
>> is whether the write access to *p would succeed, setting the
>> dirty bit on the clear, non-present entry.
>> 
>> I was under the impression that the hardware AD-assist would
>> recheck the PTE atomically as it sets the dirty bit. But, as I
>> said, I am not sure anymore whether this is defined architecturally
>> (or at least would work in practice on all CPUs modulo the 
>> Knights Landing thingy).
> 
> Practically, at "x=*p", he thing that gets cached in the TLB will
> Dirty=0.  At the "*p=x", the CPU will decide it needs to do a write,
> find the Dirty=0 entry and will entirely discard it.  In other words, it
> *acts* roughly like this:
> 
> 	x = *p				
> 	INVLPG(p)
> 	*p = x;
> 
> Where the INVLPG() and the "*p=x" are atomic.  So, there's no
> _practical_ problem with your scenario.  This specific behavior isn't
> architectural as far as I know, though.
> 
> Although it's pretty much just academic, as for the architecture, are
> you getting hung up on the difference between the description of "Accessed":
> 
> 	Whenever the processor uses a paging-structure entry as part of
> 	linear-address translation, it sets the accessed flag in that
> 	entry
> 
> and "Dirty:"
> 
> 	Whenever there is a write to a linear address, the processor
> 	sets the dirty flag (if it is not already set) in the paging-
> 	structure entry...
> 
> Accessed says "as part of linear-address translation", which means that
> the address must have a translation.  But, the "Dirty" section doesn't
> say that.  It talks about "a write to a linear address" but not whether
> there is a linear address *translation* involved.
> 
> If that's it, we could probably add a bit like:
> 
> 	In addition to setting the accessed flag, whenever there is a
> 	write...
> 
> before the dirty rules in the SDM.
> 
> Or am I being dense and continuing to miss your point? :)

I think this time you got my question right.

I was thrown off by the SDM comment on RW permissions vs dirty that I
mentioned before:

"If software on one logical processor writes to a page while software on
 another logical processor concurrently clears the R/W flag in the
 paging-structure entry that maps the page, execution on some processors may
 result in the entry’s dirty flag being set (due to the write on the first
 logical processor) and the entry’s R/W flag being clear (due to the update
 to the entry on the second logical processor).”

I did not pay enough attention to these small differences that you mentioned
between access and dirty this time (although I did notice them before).

I do not think that the change that you offered to the SDM really clarifies
the situation. Setting the access flag is done as part of caching the PTE in
the TLB. The SDM change you propose does not clarify the atomicity of the
permission/PTE-validity check and dirty-bit setting or the fact the PTE is
invalidated if the dirty-bit needs to be set and is cached as clear [I do not
presume you would want the latter in the SDM, since it is an implementation
detail.]

I just wonder how come the R/W-clearing and the P-clearing cause concurrent
dirty bit setting to behave differently. I am not a hardware guy, but I would
imagine they would be the same...