[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXi7YEk2KVSyXVuhUF_AYTqENx+XzYJqEhAYs=8oiSouA@mail.gmail.com>
Date: Wed, 2 Dec 2020 21:05:30 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Nicholas Piggin <npiggin@...il.com>
Cc: Christian Borntraeger <borntraeger@...ibm.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...el.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Andy Lutomirski <luto@...nel.org>,
Will Deacon <will@...nel.org>,
Anton Blanchard <anton@...abs.org>,
Arnd Bergmann <arnd@...db.de>,
linux-arch <linux-arch@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
> On Dec 1, 2020, at 7:47 PM, Nicholas Piggin <npiggin@...il.com> wrote:
>
> Excerpts from Andy Lutomirski's message of December 1, 2020 4:31 am:
>> other arch folk: there's some background here:
>>
>> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>>
>>> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@...nel.org> wrote:
>>>
>>> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@...nel.org> wrote:
>>>>
>>>> On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@...il.com> wrote:
>>>>>
>>>>> On big systems, the mm refcount can become highly contented when doing
>>>>> a lot of context switching with threaded applications (particularly
>>>>> switching between the idle thread and an application thread).
>>>>>
>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>> user->idle->user cases, so so instead implement a non-refcounted scheme
>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>> any remaining lazy ones.
>>>>>
>>>>> Shootdown IPIs are some concern, but they have not been observed to be
>>>>> a big problem with this scheme (the powerpc implementation generated
>>>>> 314 additional interrupts on a 144 CPU system during a kernel compile).
>>>>> There are a number of strategies that could be employed to reduce IPIs
>>>>> if they turn out to be a problem for some workload.
>>>>
>>>> I'm still wondering whether we can do even better.
>>>>
>>>
>>> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
>>> the TLB. On x86, this will shoot down all lazies as long as even a
>>> single pagetable was freed. (Or at least it will if we don't have a
>>> serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
>>> sets tlb->freed_tables, which will trigger the IPI.) So, on
>>> architectures like x86, the shootdown approach should be free. The
>>> only way it ought to have any excess IPIs is if we have CPUs in
>>> mm_cpumask() that don't need IPI to free pagetables, which could
>>> happen on paravirt.
>>
>> Indeed, on x86, we do this:
>>
>> [ 11.558844] flush_tlb_mm_range.cold+0x18/0x1d
>> [ 11.559905] tlb_finish_mmu+0x10e/0x1a0
>> [ 11.561068] exit_mmap+0xc8/0x1a0
>> [ 11.561932] mmput+0x29/0xd0
>> [ 11.562688] do_exit+0x316/0xa90
>> [ 11.563588] do_group_exit+0x34/0xb0
>> [ 11.564476] __x64_sys_exit_group+0xf/0x10
>> [ 11.565512] do_syscall_64+0x34/0x50
>>
>> and we have info->freed_tables set.
>>
>> What are the architectures that have large systems like?
>>
>> x86: we already zap lazies, so it should cost basically nothing to do
>
> This is not zapping lazies, this is freeing the user page tables.
>
> "lazy mm" is where a switch to a kernel thread takes on the
> previous mm for its kernel mapping rather than switch to init_mm.
The intent of the code is to flush the TLB after freeing user pages
tables, but, on bare metal, lazies get zapped as a side effect.
Anyway, I'm going to send out a mockup of an alternative approach shortly.
Powered by blists - more mailing lists