linux-kernel - Re: [v2 PATCH] mm: avoid access flag update TLB flush for retried page fault


Message-ID: <20200717100820.GB8673@willie-the-truck>
Date:   Fri, 17 Jul 2020 11:08:21 +0100
From:   Will Deacon <will@...nel.org>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
Cc:     hannes@...xchg.org, catalin.marinas@....com, will.deacon@....com,
        akpm@...ux-foundation.org, xuyu@...ux.alibaba.com,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-mm@...ck.org
Subject: Re: [v2 PATCH] mm: avoid access flag update TLB flush for retried
 page fault

On Thu, Jul 16, 2020 at 05:36:30AM +0800, Yang Shi wrote:
> Recently we found a regression when running the will_it_scale/page_fault3
> test on ARM64: over 70% down for the multi-process cases and over 20% down
> for the multi-thread cases.  It turns out the regression is caused by commit
> 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
> calling balance_dirty_pages() in write fault").
> 
> The test mmaps a memory-sized file and then writes to the mapping.  This
> makes all memory dirty and triggers dirty page throttling, at which point
> that upstream commit releases mmap_sem and retries the page fault.  The
> retried page fault sees the correct PTEs installed by the first try, then
> updates the dirty bit, clears the read-only bit and flushes TLBs on ARM.
> The regression is caused by the excessive TLB flushing.  It is fine on x86
> since x86 doesn't clear the read-only bit, so there is no need to flush
> the TLB in this case.
> 
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for a page to be swapped in
> 3. Waiting for dirty page throttling
> 
> The first two cases don't have PTEs set up at all, so the retried page
> fault installs the PTEs and never reaches this path.  But case #3
> usually has PTEs installed, so the retried page fault reaches the
> dirty bit and read-only bit update.  It does not seem necessary to
> modify those bits again for #3, since they should already have been set
> by the first page fault try.
> 
> Of course a parallel page fault may set up PTEs, but we only need to care
> about write faults.  If the parallel page fault set up a writable and dirty
> PTE, then the retried fault doesn't need to do anything extra.  If the
> parallel page fault set up a clean read-only PTE, the retried fault should
> just call do_wp_page() and return, as the code snippet below shows:
> 
> if (vmf->flags & FAULT_FLAG_WRITE) {
>         if (!pte_write(entry))
>             return do_wp_page(vmf);
> }
> 
> With this fix the test results get back to normal.
> 
> Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Will Deacon <will.deacon@....com>
> Cc: Johannes Weiner <hannes@...xchg.org>
> Reported-by: Xu Yu <xuyu@...ux.alibaba.com>
> Debugged-by: Xu Yu <xuyu@...ux.alibaba.com>
> Tested-by: Xu Yu <xuyu@...ux.alibaba.com>
> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
> ---
> v2: * Incorporated the comment from Will Deacon.
>     * Updated the commit log per the discussion.
> 
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..e93e1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		entry = pte_mkdirty(entry);
> +
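
For anyone following the ordering of the checks in the hunk above, here is a
rough user-space model of it. This is only an illustrative sketch, not kernel
code: the struct, the enum, the function name and the flag values are all
made up for the example (the MODEL_ prefix marks them as hypothetical), and
whatever follows the hunk in handle_pte_fault() is not shown in the quoted
diff, so it is reduced here to a single "fall through" result.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for FAULT_FLAG_WRITE / FAULT_FLAG_TRIED.
 * The bit values are placeholders, not the kernel's definitions. */
#define MODEL_FAULT_FLAG_WRITE (1u << 0)
#define MODEL_FAULT_FLAG_TRIED (1u << 1)

/* Hypothetical, simplified view of a present PTE. */
struct model_pte {
	bool write;	/* models pte_write(entry) */
	bool dirty;	/* set where the patch calls pte_mkdirty(entry) */
};

/* What the modeled hunk decides to do next. */
enum model_action {
	MODEL_DO_WP_PAGE,	/* write fault on a read-only PTE */
	MODEL_SKIP_UNLOCK,	/* retried fault: goto unlock, no update */
	MODEL_FALL_THROUGH,	/* continue with the rest of the function */
};

static enum model_action
model_handle_present_pte(unsigned int flags, struct model_pte *entry)
{
	/* A write fault on a read-only PTE is handed to do_wp_page(),
	 * whether or not this is a retry. */
	if ((flags & MODEL_FAULT_FLAG_WRITE) && !entry->write)
		return MODEL_DO_WP_PAGE;

	/* A retried fault skips the dirty-bit update and what follows. */
	if (flags & MODEL_FAULT_FLAG_TRIED)
		return MODEL_SKIP_UNLOCK;

	/* First-try write fault on a writable PTE: mark it dirty. */
	if (flags & MODEL_FAULT_FLAG_WRITE)
		entry->dirty = true;

	return MODEL_FALL_THROUGH;
}
```

In this model a retried write fault on a PTE that is already writable takes
the MODEL_SKIP_UNLOCK branch, which corresponds to case #3 in the commit log,
while a retried write fault on a read-only PTE still goes to the do_wp_page()
branch first.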


Thanks, this looks better to me.

Andrew -- please can you update the version in your tree?

Cheers,

Will