Message-ID: <2026887875.77814.1588260015439.JavaMail.zimbra@efficios.com>
Date:   Thu, 30 Apr 2020 11:20:15 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Joerg Roedel <jroedel@...e.de>, rostedt <rostedt@...dmis.org>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shile Zhang <shile.zhang@...ux.alibaba.com>,
        Andy Lutomirski <luto@...capital.net>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Tzvetomir Stoyanov <tz.stoyanov@...il.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before
 text_poke()

----- On Apr 30, 2020, at 10:50 AM, Joerg Roedel jroedel@...e.de wrote:

> On Thu, Apr 30, 2020 at 04:11:20PM +0200, Joerg Roedel wrote:
>> The page-fault handler calls a tracing function which again ends up in
>> trace_event_ignore_this_pid(), where it faults again. From here on the CPU is in
>> a page-fault loop, which continues until the stack overflows (with
>> CONFIG_VMAP_STACK).
> 
> Did some more testing to find out what this issue has to do with
> 
>	763802b53a42 x86/mm: split vmalloc_sync_all()
> 
> Above commit removes a call to vmalloc_sync_all() from the vmalloc
> unmapping path, because that call caused severe performance regressions
> on some workloads and was not needed on x86-64 anyway.
> 
> But that call caused vmalloc_sync_all() to be called regularly on
> x86-64 machines, so that all page-tables were more likely to be in sync.
> 
> The call was introduced by commit
> 
>	3f8fd02b1bf1 mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
> 
> to fix a correctness issue on x86-32 PAE systems, which also need
> unmappings of large pages in the vmalloc area to be synchronized.
> 
> This additional call to vmalloc_sync_all() did hide the problem. I
> verified it by reverting both of the above commits on v5.7-rc3 and
> testing on that kernel. The problem is reproducible there too, the box
> hangs hard.
> 
> So the underlying problem is that a vmalloc()'ed tracing buffer is used
> to trace the page-fault handler, so that it has no chance of faulting in
> the buffer address to poking_mm and maybe other PGDs.
> 
> The right fix is to call vmalloc_sync_mappings() right after allocating
> tracing or perf buffers via v[zm]alloc().

Either right after allocation, or right before making the vmalloc'd data
structure visible to the instrumentation. In the case of the pid filter,
that would be the rcu_assign_pointer() which publishes the new pid filter
table.

As long as vmalloc_sync_mappings() is performed somewhere *between* allocation
and publishing the pointer for instrumentation, it's fine.
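
To make the ordering concrete, here is a minimal userspace sketch. It is not
kernel code: struct pid_filter, install_pid_filter() and the *_stub helpers are
hypothetical names that stand in for the vmalloc'ed table, vzalloc(),
vmalloc_sync_mappings() and rcu_assign_pointer(). It only models the constraint
that the sync must have happened before the pointer becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical filter table; stands in for a vmalloc'ed tracing structure. */
struct pid_filter {
	size_t nr_bits;
	unsigned char *bits;
};

/* Pointer read by the (simulated) instrumentation; NULL means "no filter". */
static _Atomic(struct pid_filter *) published_filter;

/* True once the simulated sync has run after the latest allocation. */
static bool mappings_synced;

/* Stand-in for vzalloc(): zeroed memory that is not yet "synced". */
static void *vzalloc_stub(size_t size)
{
	mappings_synced = false;
	return calloc(1, size);
}

/* Stand-in for vmalloc_sync_mappings(): only records that it was called. */
static void vmalloc_sync_mappings_stub(void)
{
	mappings_synced = true;
}

/* Stand-in for rcu_assign_pointer(): a release store that publishes the table. */
static void publish_filter(struct pid_filter *f)
{
	/* Publishing before the sync is the bug this thread is about. */
	assert(mappings_synced);
	atomic_store_explicit(&published_filter, f, memory_order_release);
}

/* Allocate, sync, then publish; the sync may sit anywhere before the publish. */
static struct pid_filter *install_pid_filter(size_t nr_bits)
{
	struct pid_filter *f = vzalloc_stub(sizeof(*f));

	if (!f)
		return NULL;

	f->bits = vzalloc_stub((nr_bits + 7) / 8);
	if (!f->bits) {
		free(f);
		return NULL;
	}
	f->nr_bits = nr_bits;

	vmalloc_sync_mappings_stub();	/* after the last allocation ... */
	publish_filter(f);		/* ... and before the pointer is visible */
	return f;
}
```

In the real code path the stubs would presumably be vzalloc(),
vmalloc_sync_mappings() and rcu_assign_pointer(); whether the sync goes directly
after the allocation or directly before the publish is the choice left open
here.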

I'll let Steven decide on which approach works best for him.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com