Message-ID: <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com>
Date:   Fri, 21 Jul 2023 18:15:57 +0000
From:   Nadav Amit <namit@...are.com>
To:     Valentin Schneider <vschneid@...hat.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "linux-trace-kernel@...r.kernel.org" 
        <linux-trace-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, bpf <bpf@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Triplett <josh@...htriplett.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Zqiang <qiang.zhang1211@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Uladzislau Rezki <urezki@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        Lorenzo Stoakes <lstoakes@...il.com>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Jason Baron <jbaron@...mai.com>,
        Kees Cook <keescook@...omium.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Juerg Haefliger <juerg.haefliger@...onical.com>,
        Nicolas Saenz Julienne <nsaenz@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Dan Carpenter <error27@...il.com>,
        Chuang Wang <nashuiliang@...il.com>,
        Yang Jihong <yangjihong1@...wei.com>,
        Petr Mladek <pmladek@...e.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>, Song Liu <song@...nel.org>,
        Julian Pidancet <julian.pidancet@...cle.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Dionna Glaze <dionnaglaze@...gle.com>,
        Thomas Weißschuh <linux@...ssschuh.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Yair Podemsky <ypodemsk@...hat.com>
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer
 flush_tlb_kernel_range() targeting NOHZ_FULL CPUs



> On Jul 20, 2023, at 9:30 AM, Valentin Schneider <vschneid@...hat.com> wrote:
> 
> vunmap()'s issued from housekeeping CPUs are a relatively common source of
> interference for isolated NOHZ_FULL CPUs, as they are hit by the
> flush_tlb_kernel_range() IPIs.
> 
> Given that CPUs executing in userspace do not access data in the vmalloc
> range, these IPIs could be deferred until their next kernel entry.

I think there are a few assumptions here that should be confirmed, with the
major one acknowledged in the commit log (assuming they hold).

There is an assumption that VMAP page-tables are not freed. I actually
never paid attention to that, but from skimming the code it does seem so. To
clarify the issue: if page-tables were freed and their pages were reused,
page-walk caches, for instance, could still point to the reused pages, and
“junk” entries from those pages would then be used. See [1].

I would also assume that memory hot-unplug of some sort is not an issue
(i.e., you cannot have a stale TLB entry pointing to memory that was
unplugged).

I also think that there might be speculative code execution using stale
TLB entries that point to memory that has been reused and is perhaps
controllable by the user. If the CPU/OS is somehow tricked into using the
stale executable TLB entries early enough on kernel entry, that might be
an issue. I guess it is probably a theoretical issue, but it would be helpful
to confirm.

In general, deferring TLB flushes can be done safely. This patch, I think,
takes it one step further and allows the memory to be reused before the TLB
flush is actually done. This is more dangerous.
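
To make the distinction concrete, here is a toy model in Python (not kernel
code, and not how the patch is implemented). It only illustrates the window
in which a CPU with a deferred flush still holds a translation to a frame
that has already been handed out again. All names (ToyCPU,
flush_kernel_range, stale_window) and the page/frame numbers are made up
for illustration.

```python
class ToyCPU:
    """Hypothetical CPU with a private TLB: virtual page -> physical frame."""

    def __init__(self, in_userspace=False):
        self.in_userspace = in_userspace
        self.tlb = {}
        self.flush_pending = False

    def translate(self, vpage, page_table):
        # A TLB hit wins even if the page table has changed since the fill.
        if vpage not in self.tlb:
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]

    def kernel_entry(self):
        # The deferred flush only runs here, on the next kernel entry.
        self.in_userspace = False
        if self.flush_pending:
            self.tlb.clear()
            self.flush_pending = False


def flush_kernel_range(cpus, defer):
    """Flush every CPU now, or (if defer) postpone it for userspace CPUs."""
    for cpu in cpus:
        if defer and cpu.in_userspace:
            cpu.flush_pending = True
        else:
            cpu.tlb.clear()


def stale_window(defer):
    """Return (stale before kernel entry, stale after kernel entry)."""
    page_table = {0x10: 7}                # vmalloc page 0x10 -> frame 7
    cpu = ToyCPU()
    cpu.translate(0x10, page_table)       # TLB filled while in the kernel
    cpu.in_userspace = True               # isolated CPU returns to userspace
    del page_table[0x10]                  # vunmap() on a housekeeping CPU
    flush_kernel_range([cpu], defer)
    reused_frame = 7                      # frame 7 is freed and reallocated
    before = cpu.tlb.get(0x10) == reused_frame
    cpu.kernel_entry()
    after = cpu.tlb.get(0x10) == reused_frame
    return before, after
```

In this model stale_window(False) gives (False, False), while
stale_window(True) gives (True, False): the stale entry survives until the
flush runs on kernel entry. Anything that consumes it before that point
(speculatively or otherwise) reaches the reused frame.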

[1] https://lore.kernel.org/lkml/tip-b956575bed91ecfb136a8300742ecbbf451471ab@git.kernel.org/
