Message-ID: <93009dbd-b31c-7364-86d2-21f0fac36676@jp.fujitsu.com>
Date: Fri, 1 Nov 2019 09:56:05 +0000
From: "qi.fuli@...itsu.com" <qi.fuli@...itsu.com>
To: Jonathan Corbet <corbet@....net>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Itaru Kitayama <itaru.kitayama@...il.com>,
"peterz@...radead.org" <peterz@...radead.org>,
Jon Masters <jcm@...masters.org>
CC: "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"qi.fuli@...itsu.com" <qi.fuli@...itsu.com>,
"indou.takao@...itsu.com" <indou.takao@...itsu.com>,
"maeda.naoaki@...itsu.com" <maeda.naoaki@...itsu.com>,
"misono.tomohiro@...itsu.com" <misono.tomohiro@...itsu.com>,
"tokamoto@...fujitsu.com" <tokamoto@...fujitsu.com>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush
instruction within the same inner shareable domain
Hi,
First of all, thanks for the comments on the patch.
I am still working on a solution to this problem.
After investigating it further, I think the kernel's TLB flush mechanism
needs to be improved in order to fix the problem completely.
So I'd like to restart the discussion. I will first summarize the problem
as a reminder, and then discuss how to fix it.
Summary of the problem:
A few months ago I proposed patches to solve a performance problem caused
by TLB flushes.[1]
The problem is that a TLB flush on one core affects all other cores, even
when those cores do not need to flush anything, and this degrades
performance.
In that thread, I explained that:
* I found a performance problem caused by the TLBI-is instruction.
* The problem occurs as follows:
1) On one core, the OS flushes the TLB using the TLBI-is instruction
2) The TLBI-is instruction is broadcast to all other cores, and each
core receives a hard-wired signal
3) Each core checks whether it has TLB entries matching the specified
ASID/VA
4) This check causes performance degradation
* We ran FWQ[2] and detected OS jitter due to this problem. This noise
is serious for HPC usage.
Here, "noise" means the difference between the maximum and minimum time
taken by the same work.
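To make the noise metric above concrete, here is a minimal sketch in C.
The function name fwq_noise and the idea of passing per-iteration timings
as an array are assumptions for illustration only; this is not how the FWQ
benchmark itself reports results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of FWQ): given the time each iteration of
 * the same fixed work took, return the noise as defined above, i.e. the
 * maximum time minus the minimum time.
 */
uint64_t fwq_noise(const uint64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    uint64_t min = samples[0];
    uint64_t max = samples[0];

    for (size_t i = 1; i < n; i++) {
        if (samples[i] < min)
            min = samples[i];
        if (samples[i] > max)
            max = samples[i];
    }
    return max - min;
}
```

For example, if the same work takes 39, 40, 49 and 39 time units in four
iterations, the noise by this definition is 10 time units.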
How to fix:
I think the cause is the TLB flush by TLBI-is, because the instruction
affects cores that are unrelated to the flush.
So the previous patch I posted does the following:
* Use mm_cpumask in mm_struct to find the CPUs that need the TLB flush
* Execute TLBI instead of TLBI-is, only on the CPUs specified by
mm_cpumask
(This is the same behavior as arm32 and x86)
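The difference between the two strategies can be illustrated with a small
userspace model in C. This is only a sketch and not kernel code: the
struct, the function names and the 8-core limit are all assumptions made
for illustration, and the 64-bit mask merely stands in for mm_cpumask. It
counts how many times each core would have to check its TLB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8u /* assumption: small core count for illustration */

/* Hypothetical model: per-core count of TLB checks caused by flushes. */
struct tlb_model {
    unsigned checks[MODEL_NR_CPUS];
};

/* Models a TLBI-is style flush: every core has to check its TLB. */
void model_flush_broadcast(struct tlb_model *m)
{
    assert(m != NULL);
    for (unsigned cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        m->checks[cpu]++;
}

/*
 * Models a local TLBI executed only on the cores whose bit is set in the
 * mask (standing in for mm_cpumask); all other cores are left alone.
 */
void model_flush_targeted(struct tlb_model *m, uint64_t mm_cpumask)
{
    assert(m != NULL);
    for (unsigned cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (mm_cpumask & (UINT64_C(1) << cpu))
            m->checks[cpu]++;
}
```

With a mask containing only cores 0 and 3, the targeted flush touches
those two cores, while the broadcast flush touches all eight regardless of
the mask. The model ignores the cost of notifying the target cores, which
is one reason the second review comment below needs benchmarking.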
After the discussion about this patch, I received the following comments:
1) The patch switches between the original behavior (flush by TLBI-is)
and the new behavior (flush by TLBI) with a boot parameter; this
implementation is not acceptable because it is hard to maintain.
2) Even if the patch fixes this problem, it may cause another
performance problem.
I'd like to redo the implementation with these points in mind.
For the second comment above, I will run benchmarks to analyze the
impact on performance.
Please let me know if there are other points I should take into
consideration.
[1] https://lkml.org/lkml/2019/6/17/703
[2] https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf
Thanks,
QI Fuli
On 6/17/19 11:32 PM, Takao Indoh wrote:
> From: Takao Indoh <indou.takao@...itsu.com>
>
> I found a performance issue related to the implementation of Linux's TLB
> flush for arm64.
>
> When I run a single-threaded test program on a moderate environment, it
> usually takes 39ms to finish its work. However, when I put a small
> application, which just calls mprotect() continuously, on one of the sibling
> cores and run it simultaneously, the test program slows down significantly.
> It takes 49ms (125%) on ThunderX2. I also detected the same problem on
> ThunderX1 and Fujitsu A64FX.
>
> I suppose the root cause of this issue is the implementation of Linux's TLB
> flush for arm64, especially the use of the TLBI-is instruction, which is
> broadcast to all processor cores on the system. In the above situation,
> TLBI-is is called by mprotect().
>
> This is not a problem for small environment, but this causes a significant
> performance noise for large-scale HPC environment, which has more than
> thousand nodes with low latency interconnect.
>
> To fix this problem, this patch adds new boot parameter
> 'disable_tlbflush_is'. In the case of flush_tlb_mm() *without* this
> parameter, TLB entry is invalidated by __tlbi(aside1is, asid). By this
> instruction, all CPUs within the same inner shareable domain check if there
> are TLB entries which have this ASID, this causes performance noise. OTOH,
> when this new parameter is specified, TLB entry is invalidated by
> __tlbi(aside1, asid) only on the CPUs specified by mm_cpumask(mm).
> Therefore the TLB flush is done on a minimal set of CPUs and the performance
> problem does not occur. I have confirmed that the performance problem is
> fixed by this patch.
>
> Takao Indoh (2):
> arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm:
> kill mm_cpumask usage"))
> arm64: tlb: Add boot parameter to disable TLB flush within the same
> inner shareable domain
>
> .../admin-guide/kernel-parameters.txt | 4 +
> arch/arm64/include/asm/mmu_context.h | 7 +-
> arch/arm64/include/asm/tlbflush.h | 61 ++-----
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/smp.c | 6 +
> arch/arm64/kernel/tlbflush.c | 155 ++++++++++++++++++
> arch/arm64/mm/context.c | 2 +
> 7 files changed, 186 insertions(+), 51 deletions(-)
> create mode 100644 arch/arm64/kernel/tlbflush.c
>
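For reference, the mprotect() workload described in the quoted cover
letter could look roughly like the sketch below. This is an assumption
about its shape, not the actual test program: the function name
mprotect_loop, the single anonymous page and the alternation between
read-only and read-write are all chosen for illustration. It only uses
the standard mmap(), mprotect(), munmap() and sysconf() calls.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: repeatedly change the protection of one
 * anonymous page. Returns the number of successful mprotect() calls,
 * or -1 if the page could not be mapped.
 */
long mprotect_loop(long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch the page so that it is actually populated. */
    *(volatile char *)p = 1;

    long ok = 0;
    for (long i = 0; i < iterations; i++) {
        /* Alternate between read-only and read-write. */
        int prot = (i & 1) ? (PROT_READ | PROT_WRITE) : PROT_READ;
        if (mprotect(p, (size_t)page, prot) == 0)
            ok++;
    }

    munmap(p, (size_t)page);
    return ok;
}
```

Calling mprotect_loop() with a large iteration count on one core, pinned
with a tool such as taskset, while the test program runs on another core
should approximate the setup in the cover letter. Whether each call leads
to a TLB invalidation depends on the kernel's handling of the protection
change.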