lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0ff27dbb-401b-4793-be70-1ff9c8010e4a@redhat.com>
Date: Mon, 3 Mar 2025 12:42:11 +0100
From: David Hildenbrand <david@...hat.com>
To: Mikołaj Lenczewski <miko.lenczewski@....com>
Cc: ryan.roberts@....com, suzuki.poulose@....com,
 yang@...amperecomputing.com, catalin.marinas@....com, will@...nel.org,
 joro@...tes.org, jean-philippe@...aro.org, mark.rutland@....com,
 joey.gouly@....com, oliver.upton@...ux.dev, james.morse@....com,
 broonie@...nel.org, maz@...nel.org, akpm@...ux-foundation.org, jgg@...pe.ca,
 nicolinc@...dia.com, mshavit@...gle.com, jsnitsel@...hat.com,
 smostafa@...gle.com, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, iommu@...ts.linux.dev
Subject: Re: [PATCH v2 3/4] arm64/mm: Elide tlbi in contpte_convert() under
 BBML2

On 03.03.25 11:55, Mikołaj Lenczewski wrote:
> On Mon, Mar 03, 2025 at 10:57:21AM +0100, David Hildenbrand wrote:
>> On 03.03.25 10:49, Mikołaj Lenczewski wrote:
>>> Hi David,
>>>
>>> Thanks for taking the time to review.
>>>
>>> On Mon, Mar 03, 2025 at 10:17:12AM +0100, David Hildenbrand wrote:
>>>> On 28.02.25 19:24, Mikołaj Lenczewski wrote:
>>>>> If we support bbml2 without conflict aborts, we can avoid the final
>>>>> flush and have hardware manage the tlb entries for us. Avoiding flushes
>>>>> is a win.
>>>>>
>>>>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@....com>
>>>>> Reviewed-by: Ryan Roberts <ryan.roberts@....com>
>>>>> ---
>>>>>     arch/arm64/mm/contpte.c | 3 ---
>>>>>     1 file changed, 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
>>>>> index 145530f706a9..77ed03b30b72 100644
>>>>> --- a/arch/arm64/mm/contpte.c
>>>>> +++ b/arch/arm64/mm/contpte.c
>>>>> @@ -72,9 +72,6 @@ static void contpte_convert(struct mm_struct *mm, unsigned long addr,
>>>>>     		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
>>>>>     	__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
>>>>> -
>>>>> -	if (system_supports_bbml2_noabort())
>>>>> -		__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
>>>>>     }
>>>>>     void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
>>>>
>>>> What's the point of not squashing this into #2? :)
>>>>
>>>> If this split was requested during earlier review, at least seeing patch #2
>>>> on its own confused me.
>>>
>>> This split is a holdover from an earlier patchset, where it was still
>>> unknown whether the removal of the second flush was permitted with
>>> BBML2. Partly this was due to us being worried about conflict aborts
>>> after the removal, and partly this was because the "delay" is a separate
>>> optimisation that we could apply even if it turned out the final patch
>>> was not architecturally sound.
>>>
>>> Now that we do not handle conflict aborts (preferring only systems that
>>> handle BBML2 without ever raising aborts), the first issue is not a
>>> problem. The reasoning behind the second patch is also a little bit
>>> outdated, but I can see the logical split between a tlbi reorder, and
>>> the removal of the tlbi. If this is truly redundant though, I would be
>>> happy to squash the two into a single patch.
>>
>> Thanks for the information.
>>
>> Does patch #2 (reordering the tlbi) have any benefit on its own? I read
>> "other threads will not see an invalid pagetable entry", but I am not sure
>> that is correct. A concurrent HW page table walker would still find the
>> invalid PTE? It's just a matter of TLB state.
> 
> I think I understand what you mean. I agree that it is possible for a
> concurrent walk to see an invalid TLBI state, if it is on the same TLB
> that the repaint is happening on. For other TLBs, the flush has not yet
> propagated our invalidated PTEs (from `__ptep_get_and_clear()`) though?

What I am saying is: if there is no TLB entry yet, HW will walk the page 
table to find no present PTE and trigger a fault.

> That invalidation will only be seen by other TLBs after the
> `__flush_tlb_range()`, so we should save a few faults because only
> "local" threads will ever see the invalid entry, as opposed to all
> threads that try to read our modified range?

So what you say is, that deferring the flush means that if there is 
already a TLB entry, flushing deferred reduces the likelihood that a 
page table walk is triggered that could find no present PTE: 
consequently, reducing the likelihood that a page fault is triggered.

(I use the word likelihood, because I assume other action could result 
in a TLB entry getting flushed in the meantime, such as TLB entry reuse)

Correct?

Or it is the case that I
> have misunderstood something basic here, or that I have misinterpreted
> what you have written?

No, that makes it clearer, thanks.

-- 
Cheers,

David / dhildenb



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ