Message-ID: <858ecac5-9ba7-48da-8f34-ffda28d17609@arm.com>
Date: Fri, 7 Feb 2025 11:05:36 +0530
From: Anshuman Khandual <anshuman.khandual@....com>
To: Ryan Roberts <ryan.roberts@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Uladzislau Rezki <urezki@...il.com>, Christoph Hellwig <hch@...radead.org>,
Mark Rutland <mark.rutland@....com>, Ard Biesheuvel <ardb@...nel.org>,
Dev Jain <dev.jain@....com>, Alexandre Ghiti <alexghiti@...osinc.com>,
Steve Capper <steve.capper@...aro.org>, Kevin Brodsky <kevin.brodsky@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop
On 2/5/25 20:39, Ryan Roberts wrote:
> ___set_ptes() previously called __set_pte() for each PTE in the range,
> which would conditionally issue a DSB and an ISB to make the new PTE
> value immediately visible to the table walker, if the new PTE was valid
> and mapped kernel space.
>
> We can do better than this; let's hoist those barriers out of the loop
> so that they are issued only once at the end of the loop. This reduces
> the barrier cost by a factor of the number of PTEs in the range.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
> ---
> arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 3b55d9a15f05..1d428e9c0e5a 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
> WRITE_ONCE(*ptep, pte);
> }
>
> -static inline void __set_pte(pte_t *ptep, pte_t pte)
> +static inline void __set_pte_complete(pte_t pte)
> {
> - __set_pte_nosync(ptep, pte);
> -
> /*
> * Only if the new pte is valid and kernel, otherwise TLB maintenance
> * or update_mmu_cache() have the necessary barriers.
> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
> }
> }
>
> +static inline void __set_pte(pte_t *ptep, pte_t pte)
> +{
> + __set_pte_nosync(ptep, pte);
> + __set_pte_complete(pte);
> +}
> +
> static inline pte_t __ptep_get(pte_t *ptep)
> {
> return READ_ONCE(*ptep);
> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>
> for (;;) {
> __check_safe_pte_update(mm, ptep, pte);
> - __set_pte(ptep, pte);
> + __set_pte_nosync(ptep, pte);
> if (--nr == 0)
> break;
> ptep++;
> pte = pte_advance_pfn(pte, stride);
> }
> +
> + __set_pte_complete(pte);
Given that the loop now iterates over a number of page table entries
without a corresponding dsb/isb sync after each write, could there be a
situation where something else gets scheduled on the CPU before
__set_pte_complete() is called, leaving the entire block of page table
entries without the desired mapping effect? IOW, how is
__set_pte_complete() guaranteed to execute once the loop above
completes? Otherwise this change LGTM.
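
To make the concern concrete, here is an illustrative sketch of the
window in question (simplified from the patch, not the exact kernel
code; I am assuming dsb(ishst)/isb() are the barriers that
__set_pte_complete() issues for valid kernel PTEs, as in the existing
__set_pte()):

	for (;;) {
		__check_safe_pte_update(mm, ptep, pte);
		__set_pte_nosync(ptep, pte);	/* plain WRITE_ONCE(), no barrier */
		if (--nr == 0)
			break;
		ptep++;
		pte = pte_advance_pfn(pte, stride);
	}
	/*
	 * If the task were preempted at this point, all nr entries
	 * would have been written but none yet guaranteed visible to
	 * the table walker.
	 */
	__set_pte_complete(pte);	/* dsb(ishst); isb(); for valid kernel PTEs */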
> }
>
> static inline void __set_ptes(struct mm_struct *mm,