Date: Tue, 26 Mar 2024 17:34:43 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Mark Rutland <mark.rutland@....com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

On 26.03.24 17:31, Ryan Roberts wrote:
> On 26/03/2024 16:17, David Hildenbrand wrote:
>> On 15.02.24 13:17, Ryan Roberts wrote:
>>> This is an RFC for a series that aims to reduce the cost and complexity of
>>> ptep_get_lockless() for arm64 when supporting transparent contpte mappings [1].
>>> The approach came from discussion with Mark and David [2].
>>>
>>> It introduces a new helper, ptep_get_lockless_norecency(), which allows the
>>> access and dirty bits in the returned pte to be incorrect. This relaxation
>>> permits arm64's implementation to just read the single target pte, and avoids
>>> having to iterate over the full contpte block to gather the access and dirty
>>> bits, for the contpte case.
>>>
>>> It turns out that none of the call sites using ptep_get_lockless() require
>>> accurate access and dirty bit information, so we can also convert those sites.
>>> That said, a couple of places need care (see patches 2 and 3).
>>>
>>> Arguably patch 3 is a bit fragile, given the wide accessibility of
>>> vmf->orig_pte. So it might make sense to drop this patch and stick to using
>>> ptep_get_lockless() in the page fault path. I'm keen to hear opinions.
>>
>> Yes. Especially as we have these pte_same() checks that might just fail now
>> because of wrong accessed/dirty bits?
> 
> Which pte_same() checks are you referring to? I've changed them all to
> pte_same_norecency() which ignores the access/dirty bits when doing the comparison.
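
The trade-off can be made concrete with a minimal userspace sketch in C. This is a toy model under stated assumptions, not the arm64 code. The pte_t layout, the PTE_VALID/PTE_AF/PTE_DIRTY bit positions, CONT_PTES = 16 and all toy_* names are hypothetical, and the concurrency handling (retries, block-consistency checks) that a real lockless read needs is ignored. The strict read ORs the access/dirty bits in from every entry of the contpte block. The relaxed read returns only the target entry, so a comparison between the two only succeeds once the access/dirty bits are masked out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: layout and bit positions are assumptions. */
typedef struct { uint64_t val; } pte_t;

#define PTE_VALID (1ULL << 0)
#define PTE_AF    (1ULL << 10)  /* access ("young") flag */
#define PTE_DIRTY (1ULL << 55)  /* dirty flag */
#define CONT_PTES 16            /* entries per contpte block (assumed) */

/* Strict read: access/dirty state of a contpte mapping may be set on any
 * entry of the block, so gather it from all of them. No concurrency
 * handling here, unlike a real lockless read. */
static pte_t toy_get_lockless(const pte_t *ptep, size_t idx)
{
	size_t start = idx & ~(size_t)(CONT_PTES - 1);
	pte_t pte = ptep[idx];

	assert(pte.val & PTE_VALID);  /* toy: only read present entries */
	for (size_t i = start; i < start + CONT_PTES; i++)
		pte.val |= ptep[i].val & (PTE_AF | PTE_DIRTY);
	return pte;
}

/* Relaxed read: just the target entry; access/dirty may be inaccurate. */
static pte_t toy_get_lockless_norecency(const pte_t *ptep, size_t idx)
{
	return ptep[idx];
}

/* Exact comparison, like pte_same(). */
static bool toy_pte_same(pte_t a, pte_t b)
{
	return a.val == b.val;
}

/* Comparison that ignores access/dirty, like pte_same_norecency(). */
static bool toy_pte_same_norecency(pte_t a, pte_t b)
{
	uint64_t mask = ~(PTE_AF | PTE_DIRTY);

	return (a.val & mask) == (b.val & mask);
}
```

With the access flag set on a sibling entry of the block, toy_get_lockless() reports the target pte as young while the norecency read does not, so toy_pte_same() fails and toy_pte_same_norecency() succeeds.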

I'm only now reading the patches, and I stumbled over that right after
writing my comment. That part was missing from the description here.

> 
>>
>> Likely, we just want to read "the real deal" on both sides of the pte_same()
>> handling.
> 
> Sorry I'm not sure I understand? You mean read the full pte including
> access/dirty? That's the same as dropping the patch, right? Of course if we do
> that, we still have to keep ptep_get_lockless() around for this case. In an ideal
> world we would convert everything over to ptep_get_lockless_norecency() and
> delete ptep_get_lockless() to remove the ugliness from arm64.

Yes, agreed. Patch #3 does not look too crazy and it wouldn't really 
affect any architecture.

I do wonder if pte_same_norecency() should be defined per architecture,
with pte_same() as the default. That way we could avoid the pte_mkold()
etc. on all other architectures.
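
A rough C sketch of that suggestion, using the common kernel idiom of an arch override guarded by #ifndef. Everything here is a toy stand-in: pte_t, the bit positions, and pte_same()/pte_mkold()/pte_mkclean() only mimic the kernel helpers of the same names. ARCH_STALE_RECENCY is a hypothetical switch representing an architecture (such as arm64 with contpte) whose norecency read may return stale access/dirty bits. The sketch also assumes the masking version clears those bits with pte_mkold()/pte_mkclean() before comparing, as the mention of mkold suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the kernel types and helpers of the same names. */
typedef struct { uint64_t val; } pte_t;
static_assert(sizeof(pte_t) == sizeof(uint64_t), "toy pte is one word");

#define PTE_AF    (1ULL << 10)  /* assumed bit positions, toy model only */
#define PTE_DIRTY (1ULL << 55)

static inline bool pte_same(pte_t a, pte_t b) { return a.val == b.val; }
static inline pte_t pte_mkold(pte_t p)   { p.val &= ~PTE_AF;    return p; }
static inline pte_t pte_mkclean(pte_t p) { p.val &= ~PTE_DIRTY; return p; }

/* "Arch header": only an architecture whose norecency read can return
 * stale access/dirty bits overrides the helper (hypothetical switch). */
#ifdef ARCH_STALE_RECENCY
static inline bool pte_same_norecency(pte_t a, pte_t b)
{
	return pte_same(pte_mkold(pte_mkclean(a)), pte_mkold(pte_mkclean(b)));
}
#define pte_same_norecency pte_same_norecency
#endif

/* "Generic header": everyone else gets plain pte_same(), so no
 * mkold/mkclean work is done on those architectures. */
#ifndef pte_same_norecency
#define pte_same_norecency pte_same
#endif
```

Built without ARCH_STALE_RECENCY, pte_same_norecency() expands to pte_same() and no bits are cleared; building with -DARCH_STALE_RECENCY selects the masking version instead.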

-- 
Cheers,

David / dhildenb

