[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201001181119.GB89689@C02TD0UTHF1T.local>
Date: Thu, 1 Oct 2020 19:11:19 +0100
From: Mark Rutland <mark.rutland@....com>
To: Alexander Potapenko <glider@...gle.com>
Cc: Marco Elver <elver@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Andrey Konovalov <andreyknvl@...gle.com>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Catalin Marinas <catalin.marinas@....com>,
Christoph Lameter <cl@...ux.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Dmitriy Vyukov <dvyukov@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Hillf Danton <hdanton@...a.com>,
Ingo Molnar <mingo@...hat.com>, Jann Horn <jannh@...gle.com>,
Jonathan.Cameron@...wei.com, Jonathan Corbet <corbet@....net>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Kees Cook <keescook@...omium.org>,
Pekka Enberg <penberg@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, sjpark@...zon.com,
Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
the arch/x86 maintainers <x86@...nel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
kasan-dev <kasan-dev@...glegroups.com>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure
On Tue, Sep 29, 2020 at 05:51:58PM +0200, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 4:24 PM Mark Rutland <mark.rutland@....com> wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> > > From: Alexander Potapenko <glider@...gle.com>
> > >
> > > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > > low-overhead sampling-based memory safety error detector of heap
> > > use-after-free, invalid-free, and out-of-bounds access errors.
> > >
> > > KFENCE is designed to be enabled in production kernels, and has near
> > > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > > for precision. The main motivation behind KFENCE's design, is that with
> > > enough total uptime KFENCE will detect bugs in code paths not typically
> > > exercised by non-production test workloads. One way to quickly achieve a
> > > large enough total uptime is when the tool is deployed across a large
> > > fleet of machines.
> > >
> > > KFENCE objects each reside on a dedicated page, at either the left or
> > > right page boundaries. The pages to the left and right of the object
> > > page are "guard pages", whose attributes are changed to a protected
> > > state, and cause page faults on any attempted access to them. Such page
> > > faults are then intercepted by KFENCE, which handles the fault
> > > gracefully by reporting a memory access error. To detect out-of-bounds
> > > writes to memory within the object's page itself, KFENCE also uses
> > > pattern-based redzones. The following figure illustrates the page
> > > layout:
> > >
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > > | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx |
> > > | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx |
> > > | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x |
> > > | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx |
> > > | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx |
> > > | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx |
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > >
> > > Guarded allocations are set up based on a sample interval (can be set
> > > via kfence.sample_interval). After expiration of the sample interval, a
> > > guarded allocation from the KFENCE object pool is returned to the main
> > > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > > next allocation is set up after the expiration of the interval.
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map.
> For x86 these addresses belong to .bss, i.e. "kernel text mapping"
> section, isn't that the linear map?
No; the "linear map" is the "direct mapping" on x86, and the "image" or
"kernel text mapping" is a distinct VA region. The image mapping aliases
(i.e. uses the same physical pages as) a portion of the linear map, and
every page in the linear map has a struct page.
Fon the x86_64 ivirtual memory layout, see:
https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html
Originally, the kernel image lived in the linear map, but it was split
out into a distinct VA range (among other things) to permit KASLR. When
that split was made, the x86 virt_to_*() helpers were updated to detect
when they were passed a kernel image address, and automatically fix that
up as-if they'd been handed the linear map alias of that address.
For going one-way from virt->{phys,page} that works ok, but it doesn't
survive the round-trip, and introduces redundant work into each
virt_to_*() call.
As it was largely arch code that was using image addresses, we didn't
bother with the fixup on arm64, as we preferred the stronger warning. At
the time I was also under the impression that on x86 they wanted to get
rid of the automatic fixup, but that doesn't seem to have happened.
Thanks,
Mark.
Powered by blists - more mailing lists