Date:   Thu, 1 Sep 2022 17:44:03 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Alexander Potapenko <glider@...gle.com>,
        Marco Elver <elver@...gle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
        Christoph Hellwig <hch@....de>,
        Christoph Lameter <cl@...ux.com>,
        David Rientjes <rientjes@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Ilya Leoshkevich <iii@...ux.ibm.com>,
        Jens Axboe <axboe@...nel.dk>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Kees Cook <keescook@...omium.org>,
        Mark Rutland <mark.rutland@....com>,
        Matthew Wilcox <willy@...radead.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Pekka Enberg <penberg@...nel.org>,
        Petr Mladek <pmladek@...e.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Vegard Nossum <vegard.nossum@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        kasan-dev <kasan-dev@...glegroups.com>,
        Linux Memory Management List <linux-mm@...ck.org>,
        Linux-Arch <linux-arch@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 04/44] x86: asm: instrument usercopy in get_user() and put_user()

On Tue, Aug 30, 2022 at 4:05 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
...
> Yu, that inclusion is regrettable.  I don't think mm_types.h is an
> appropriate site for implementing lru_gen_use_mm() anyway.  Adding a
> new header is always the right fix for these things.  I'd suggest
> adding a new mglru.h (or whatever) and putting most/all of the mglru
> material in there.
>
> Also, the addition to kernel/sched/core.c wasn't clearly changelogged,
> is uncommented and I doubt if the sched developers know about it, let
> alone reviewed it.  Please give them a heads-up.

Adding Ingo, Peter, Juri and Vincent.

I added lru_gen_use_mm() (one store operation) to context_switch() in
kernel/sched/core.c, and I would appreciate it if you could take a
look and let me know if you have any concerns:
https://lore.kernel.org/r/20220815071332.627393-9-yuzhao@google.com/

I'll resend the series in a week or so, and cc you when that happens.

> The addition looks fairly benign, but core context_switch() is the
> sort of thing which people get rather defensive about and putting
> mm-specific stuff in there might be challenged.  Some quantitative
> justification of this optimization would be appropriate.

The commit message (from the above link) touches on the theory only:

    This patch uses the following optimizations when walking page tables:
    1. It tracks the usage of mm_struct's between context switches so that
       page table walkers can skip processes that have been sleeping since
       the last iteration.

Let me expand on this.

TLDR: lru_gen_use_mm() introduces an extra store operation whenever
switching to a new mm_struct, which sets a flag for page reclaim to
clear.
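
To make the set/clear protocol concrete, here is a minimal userspace
sketch. It is a model only, not the code from the patch: struct
mm_model, model_use_mm(), model_should_walk() and model_walk() are
hypothetical names, and C11 atomics stand in for whatever the kernel
side actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct; only the "used" flag is modeled. */
struct mm_model {
	atomic_bool used_since_last_walk;
};

/* Models the one store added on the context switch path. */
static inline void model_use_mm(struct mm_model *mm)
{
	atomic_store_explicit(&mm->used_since_last_walk, true,
			      memory_order_relaxed);
}

/*
 * Models the page reclaim side: test and clear the flag.
 * Returns true if the mm has run since the previous walk.
 */
static inline bool model_should_walk(struct mm_model *mm)
{
	return atomic_exchange_explicit(&mm->used_since_last_walk, false,
					memory_order_relaxed);
}

/* Counts how many mm's a walker would visit; the rest are skipped. */
static size_t model_walk(struct mm_model *mms, size_t n)
{
	size_t visited = 0;

	assert(mms != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (model_should_walk(&mms[i]))
			visited++;
	return visited;
}
```

In this model, an mm that has not been switched to since the previous
walk costs the walker one flag test instead of a page table walk.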

For systems that are NOT under memory pressure:
1. This is a new overhead.
2. I don't think it's measurable, so it can't be the last straw.
3. Assuming it can be measured, the belief is that underutilized
systems should be sacrificed (to some degree) for the greater good.

For systems that are under memory pressure:
1. When this flag is set on a mm_struct, page reclaim knows that this
mm_struct has been used since the last time it cleared this flag. So
it's worth checking out this mm_struct (to clear the accessed bit).
2. A similar idea has been used on Android and ChromeOS: when an app
or a tab goes to the background, these systems (conditionally) call
madvise() with MADV_COLD. The majority of GUI applications don't
implement this idea, so MGLRU opts to do it on their behalf. How it
benefits server applications is unknown (uninteresting).
3. This optimization benefits arm64 v8.2+ more than x86, since x86
supports the accessed bit in non-leaf entries and can therefore
already reduce the search space that way. On a 4GB ARM system with 40 Chrome
tabs opened and 5 tabs in active use, this optimization improves page
table walk performance by about 5%. The overall benefit is small but
measurable under heavy memory pressure.
4. The idea can be reused by other MM components, e.g., khugepaged.
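
For reference on item 2, this is roughly what an application-side
MADV_COLD hint looks like. It is a sketch under assumptions, not what
Android or ChromeOS actually run: hint_cold() and
demo_background_hint() are hypothetical helpers, and the fallback
define assumes the Linux value of MADV_COLD (20, available since 5.4)
for older libc headers.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* assumed Linux value, for older libc headers */
#endif

/*
 * Hypothetical helper: tell the kernel that [addr, addr + len) is not
 * expected to be accessed soon. Returns 0 on success or an errno value.
 */
static int hint_cold(void *addr, size_t len)
{
	return madvise(addr, len, MADV_COLD) == 0 ? 0 : errno;
}

/* Maps a buffer, touches it, then marks it cold as a backgrounded app might. */
static int demo_background_hint(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int rc;

	if (buf == MAP_FAILED)
		return errno;

	memset(buf, 0xab, len);		/* fault the pages in */
	rc = hint_cold(buf, len);	/* EINVAL on kernels without MADV_COLD */
	munmap(buf, len);
	return rc;
}
```

The point of comparison: this hint requires the application (or the
platform on its behalf) to act, whereas the flag set at context switch
gives page reclaim a similar signal without any userspace cooperation.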

Thanks.
