Message-ID: <20190627102724.vif6zh6zfqktpmjx@willie-the-truck>
Date: Thu, 27 Jun 2019 11:27:25 +0100
From: Will Deacon <will@...nel.org>
To: "qi.fuli@...itsu.com" <qi.fuli@...itsu.com>
Cc: Will Deacon <will.deacon@....com>,
"indou.takao@...itsu.com" <indou.takao@...itsu.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
Catalin Marinas <catalin.marinas@....com>,
Jonathan Corbet <corbet@....net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush
instruction within the same inner shareable domain
On Mon, Jun 24, 2019 at 10:34:02AM +0000, qi.fuli@...itsu.com wrote:
> On 6/18/19 2:03 AM, Will Deacon wrote:
> > On Mon, Jun 17, 2019 at 11:32:53PM +0900, Takao Indoh wrote:
> >> From: Takao Indoh <indou.takao@...itsu.com>
> >>
> >> I found a performance issue related to the implementation of Linux's
> >> TLB flush for arm64.
> >>
> >> When I run a single-threaded test program in a moderate environment,
> >> it usually takes 39ms to finish its work. However, when I put a small
> >> application, which just calls mprotect() continuously, on one of the
> >> sibling cores and run it simultaneously, the test program slows down
> >> significantly. It becomes 49ms (125%) on ThunderX2. I also detected
> >> the same problem on ThunderX1 and Fujitsu A64FX.
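For anyone wanting to reproduce this locally, the noise generator
described above boils down to something like the sketch below. The
mapping size and the alternating protections are my own choice rather
than anything taken from the Fujitsu test, but each permission change
forces the kernel to flush the stale translations for the range, and on
arm64 that flush is currently issued with inner-shareable (broadcast)
TLBI instructions:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN	(64 * 4096UL)	/* a few pages; the exact size is unimportant */

int main(void)
{
	char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0, LEN);	/* fault the pages in so PTEs (and TLB
				 * entries) exist before the loop starts */

	for (;;) {
		/*
		 * Each call changes the protection bits, so the kernel
		 * must invalidate the now-stale TLB entries for the range.
		 */
		mprotect(buf, LEN, PROT_READ);
		mprotect(buf, LEN, PROT_READ | PROT_WRITE);
	}
}

Pin it to one core (e.g. with taskset) and run the victim workload on a
sibling core to observe the slowdown.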
> > This is a problem for any applications that share hardware resources with
> > each other, so I don't think it's something we should be too concerned about
> > addressing unless there is a practical DoS scenario, which there doesn't
> > appear to be in this case. It may be that the real answer is "don't call
> > mprotect() in a loop".
> I think there has been a misunderstanding; please let me explain.
> This application is just an example used for reproducing the
> performance issue we found.
> The original purpose of this series is to reduce OS jitter.
> OS jitter on massively parallel processing systems has been known
> and studied for many years.
> As little as 2.5% OS jitter can result in a slowdown of more than a
> factor of 20 for the same application [1].
I think it's worth pointing out that the system in question was neither
ARM-based nor running Linux, so I'd be cautious in applying the conclusions
of that paper directly to our TLB invalidation code. Furthermore, the noise
being generated in their experiments uses a timer interrupt, which has a
/vastly/ different profile to a DVM message in terms of both system impact
and frequency.
> Though it may be an extreme example, reducing OS jitter has long been
> an issue in HPC environments.
>
> [1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell.
> "Characterizing application sensitivity to OS interference using
> kernel-level noise injection." Proceedings of the 2008 ACM/IEEE
> conference on Supercomputing. IEEE Press, 2008.
>
> >> I suppose the root cause of this issue is the implementation of
> >> Linux's TLB flush for arm64, especially the use of the TLBI-is
> >> instruction, which is broadcast to all processor cores in the system.
> >> In the situation above, TLBI-is is issued by mprotect().
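For readers who haven't come across the terminology: the "-is" suffix
selects the Inner Shareable variant of the instruction, which is
delivered as a DVM message to every core in the inner-shareable domain,
whereas the plain form affects only the core that executes it. As a
rough illustration (these are not the kernel's real helpers, which live
in arch/arm64/include/asm/tlbflush.h; the operand packs VA[55:12] into
bits [43:0] and the ASID into bits [63:48]):

static inline void tlbi_va_local(unsigned long vaddr, unsigned long asid)
{
	unsigned long arg = ((vaddr >> 12) & 0xfffffffffffUL) | (asid << 48);

	/* TLBI VAE1: drop the entry from this core's TLB only. */
	asm volatile("dsb nshst\n\ttlbi vae1, %0\n\tdsb nsh"
		     : : "r" (arg) : "memory");
}

static inline void tlbi_va_broadcast(unsigned long vaddr, unsigned long asid)
{
	unsigned long arg = ((vaddr >> 12) & 0xfffffffffffUL) | (asid << 48);

	/*
	 * TLBI VAE1IS: the same invalidation, but broadcast as a DVM
	 * message across the Inner Shareable domain; without hardware
	 * filtering, every core pays for it regardless of whether it
	 * holds entries for this ASID.
	 */
	asm volatile("dsb ishst\n\ttlbi vae1is, %0\n\tdsb ish"
		     : : "r" (arg) : "memory");
}

As I understand the series, it effectively replaces the second form with
the first plus an IPI to the other CPUs so that each of them performs
the invalidation locally.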
> > On the flip side, Linux is providing the hardware with enough information
> > not to broadcast to cores for which the remote TLBs don't have entries
> > allocated for the ASID being invalidated. I would say that the root cause
> > of the issue is that this filtering is not taking place.
>
> Do you mean that the filter should be implemented in hardware?
Yes. If you're building a large system and you care about "jitter", then
you either need to partition it in such a way that sources of noise are
contained, or you need to introduce filters to limit their scope. Rewriting
the low-level memory-management parts of the operating system is a red
herring and imposes a needless burden on everybody else without solving
the real problem, which is that contended use of shared resources doesn't
scale.
Will