Message-Id: <20180827170511.6bafa15cbc102ae135366e86@kernel.org>
Date: Mon, 27 Aug 2018 17:05:11 +0900
From: Masami Hiramatsu <mhiramat@...nel.org>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Jiri Kosina <jkosina@...e.cz>,
Will Deacon <will.deacon@....com>,
Benjamin Herrenschmidt <benh@....ibm.com>,
Nick Piggin <npiggin@...il.com>,
the arch/x86 maintainers <x86@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Rik van Riel <riel@...riel.com>,
Jann Horn <jannh@...gle.com>,
Adin Scannell <ascannell@...gle.com>,
Dave Hansen <dave.hansen@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
David Miller <davem@...emloft.net>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Michael Ellerman <mpe@...erman.id.au>
Subject: Re: TLB flushes on fixmap changes
On Sun, 26 Aug 2018 20:26:09 -0700
Nadav Amit <nadav.amit@...il.com> wrote:
> at 8:03 PM, Masami Hiramatsu <mhiramat@...nel.org> wrote:
>
> > On Sun, 26 Aug 2018 11:09:58 +0200
> > Peter Zijlstra <peterz@...radead.org> wrote:
> >
> >> On Sat, Aug 25, 2018 at 09:21:22PM -0700, Andy Lutomirski wrote:
> >>> I just re-read text_poke(). It's, um, horrible. Not only is the
> >>> implementation overcomplicated and probably buggy, but it's SLOOOOOW.
> >>> It's totally the wrong API -- poking one instruction at a time
> >>> basically can't be efficient on x86. The API should either poke lots
> >>> of instructions at once or should be text_poke_begin(); ...;
> >>> text_poke_end();.
> >>
> >> I don't think anybody ever cared about performance here. Only
> >> correctness. That whole text_poke_bp() thing is entirely tricky.
> >
> > Agreed. Self modification is a special event.
> >
> >> FWIW, before text_poke_bp(), text_poke() would only be used from
> >> stop_machine, so all the other CPUs would be stuck busy-waiting with
> >> IRQs disabled. These days, yeah, that's lots more dodgy, but yes
> >> text_mutex should be serializing all that.
> >
> > I'm still not sure that speculative page-table walk can be done
> > over the mutex. Also, if the fixmap area is for aliasing
> > pages (which are always mapped to memory), what kind of
> > security issue can happen?
>
> The PTE is accessible from other cores, so just as we assume for L1TF that
> any addressable memory might be cached in L1, we should assume that any
> PTE might be cached in the TLB when it is present.

OK, so other cores can accidentally cache the PTE in their TLBs (and there
is no way to shoot it down explicitly?)

> Although the mapping is for an alias, there are a couple of issues here.
> First, this alias mapping is writable, so it might allow an attacker to change the
> kernel code (following another initial attack).

Combined with some buffer overflow, correct? If the attacker can already
write kernel data directly, he is already in kernel mode.

> Second, the alias mapping is
> never explicitly flushed. We may assume that once the original mapping is
> removed/changed, a full TLB flush would take place, but there is no
> guarantee it actually takes place.

Hmm, does this mean a full TLB flush will not flush the alias mapping?
(Or that the full TLB flush just doesn't work?)

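To make the alias question concrete, here is a minimal user-space sketch. It is an analogy only, not kernel code and not how text_poke() is implemented: one page is mapped twice, read-only (standing in for kernel text) and writable (standing in for the fixmap alias). The function name alias_write_visible() and the use of tmpfile() as backing store are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, user-space analogy only.
 * Returns 1 if a store through the writable alias is visible through the
 * read-only mapping of the same page, 0 if not, -1 on setup failure.
 */
int alias_write_visible(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* One page of backing store shared by both mappings. */
    FILE *backing = tmpfile();
    if (backing == NULL)
        return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, page) != 0) {
        fclose(backing);
        return -1;
    }

    /* "text": the read-only view. "alias": the writable view. */
    unsigned char *text = mmap(NULL, (size_t)page, PROT_READ,
                               MAP_SHARED, fd, 0);
    unsigned char *alias = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    fclose(backing); /* the mappings keep the page alive */

    int visible = -1;
    if (text != MAP_FAILED && alias != MAP_FAILED) {
        alias[0] = 0xcc;              /* x86 int3 opcode, as a marker byte */
        visible = (text[0] == 0xcc);  /* read it back via the other view */
    }

    /* Tear the alias down again; here munmap() removes the mapping. */
    if (alias != MAP_FAILED)
        munmap(alias, (size_t)page);
    if (text != MAP_FAILED)
        munmap(text, (size_t)page);
    return visible;
}
```

The read-only view never becomes writable, yet its contents change. That is why a writable alias that stays cached in another core's TLB is interesting to an attacker. In user space, munmap() removes the mapping for the whole process. The concern in this thread is that the kernel fixmap alias gets no equivalent explicit flush.
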
> > Anyway, from the viewpoint of kprobes, either per-cpu fixmap or
> > changing CR3 sounds good to me. I think we don't even need per-cpu,
> > it can call a thread/function on a dedicated core (like the first
> > boot processor) and wait :) This may prevent leakage of pte change
> > to other cores.
>
> I implemented per-cpu fixmap, but I think that it makes more sense to take
> peterz's approach and set an entry in the PGD level. Per-CPU fixmap either
> requires pre-populating various levels in the page-table hierarchy, or
> conditionally synchronizing whenever module memory is allocated, since they
> can share the same PGD, PUD & PMD. While usually the synchronization is not
> needed, the possibility that synchronization is needed complicates locking.
>

Could you point me to which of PeterZ's approaches you mean? I guess it is
to make a clone of the PGD and use it for a local page mapping (as a new mm).
If so, yes, it sounds perfectly fine to me.

> Anyhow, having fixed addresses for the fixmap can be used to circumvent
> KASLR.

I think text_poke doesn't mind using a random address :)

> I don’t think a dedicated core is needed. Anyhow there is a lock
> (text_mutex), so use_mm() can be used after acquiring the mutex.

Hmm, the comment on use_mm() says:

/*
 * use_mm
 *	Makes the calling kernel thread take on the specified
 *	mm context.
 *	(Note: this routine is intended to be called only
 *	from a kernel thread context)
 */

So maybe we need a dedicated kernel thread for safety?

Thank you,
--
Masami Hiramatsu <mhiramat@...nel.org>