lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 26 Mar 2024 18:04:45 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Mark Rutland <mark.rutland@....com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

>>>>
>>>> Likely, we just want to read "the real deal" on both sides of the pte_same()
>>>> handling.
>>>
>>> Sorry I'm not sure I understand? You mean read the full pte including
>>> access/dirty? That's the same as dropping the patch, right? Of course if we do
>>> that, we still have to keep pte_get_lockless() around for this case. In an ideal
>>> world we would convert everything over to ptep_get_lockless_norecency() and
>>> delete ptep_get_lockless() to remove the ugliness from arm64.
>>
>> Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect any
>> architecture.
>>
>> I do wonder if pte_same_norecency() should be defined per architecture and the
>> default would be pte_same(). So we could avoid the mkold etc on all other
>> architectures.
> 
> Wouldn't that break it's semantics? The "norecency" of
> ptep_get_lockless_norecency() means "recency information in the returned pte may
> be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
> access and dirty bits when you do the comparison".

My idea was that ptep_get_lockless_norecency() would return the actual 
result on these architectures. So e.g., on x86, there would be no actual 
change in generated code.

But yes, the documentation of these functions would have to be improved.

Now I wonder if ptep_get_lockless_norecency() should actively clear 
dirty/accessed bits to more easily find any actual issues where the bits 
still matter ...

> 
> I think you could only do the optimization you describe if you required that
> pte_same_norecency() would only be given values returned by
> ptep_get_lockless_norecency() (or ptep_get_norecency()). Even then, its not
> quite the same; if a page is accessed between gets one will return true and the
> other false.

Right.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ