[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c1218cdb-905b-4896-8e17-109700577cec@redhat.com>
Date: Tue, 26 Mar 2024 18:04:45 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64
>>>>
>>>> Likely, we just want to read "the real deal" on both sides of the pte_same()
>>>> handling.
>>>
>>> Sorry I'm not sure I understand? You mean read the full pte including
>>> access/dirty? That's the same as dropping the patch, right? Of course if we do
>>> that, we still have to keep pte_get_lockless() around for this case. In an ideal
>>> world we would convert everything over to ptep_get_lockless_norecency() and
>>> delete ptep_get_lockless() to remove the ugliness from arm64.
>>
>> Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect any
>> architecture.
>>
>> I do wonder if pte_same_norecency() should be defined per architecture and the
>> default would be pte_same(). So we could avoid the mkold etc on all other
>> architectures.
>
> Wouldn't that break it's semantics? The "norecency" of
> ptep_get_lockless_norecency() means "recency information in the returned pte may
> be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
> access and dirty bits when you do the comparison".
My idea was that ptep_get_lockless_norecency() would return the actual
result on these architectures. So e.g., on x86, there would be no actual
change in generated code.
But yes, the documentation of these functions would have to be improved.
Now I wonder if ptep_get_lockless_norecency() should actively clear
dirty/accessed bits to more easily find any actual issues where the bits
still matter ...
>
> I think you could only do the optimization you describe if you required that
> pte_same_norecency() would only be given values returned by
> ptep_get_lockless_norecency() (or ptep_get_norecency()). Even then, its not
> quite the same; if a page is accessed between gets one will return true and the
> other false.
Right.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists