lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXtwksWniEjiWKgZWZAyYLDipuq+sQ449OvDKehJ3D-fg@mail.gmail.com>
Date:   Tue, 14 May 2019 08:23:48 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Alexandre Chartre <alexandre.chartre@...cle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Radim Krcmar <rkrcmar@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        kvm list <kvm@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Linux-MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        jan.setjeeilers@...cle.com, Liran Alon <liran.alon@...cle.com>,
        Jonathan Adams <jwadams@...gle.com>
Subject: Re: [RFC KVM 18/27] kvm/isolation: function to copy page table
 entries for percpu buffer

On Tue, May 14, 2019 at 2:42 AM Alexandre Chartre
<alexandre.chartre@...cle.com> wrote:
>
>
> On 5/14/19 10:34 AM, Andy Lutomirski wrote:
> >
> >
> >> On May 14, 2019, at 1:25 AM, Alexandre Chartre <alexandre.chartre@...cle.com> wrote:
> >>
> >>
> >>> On 5/14/19 9:09 AM, Peter Zijlstra wrote:
> >>>> On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
> >>>> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
> >>>> <alexandre.chartre@...cle.com> wrote:
> >>>>>
> >>>>> pcpu_base_addr is already mapped to the KVM address space, but this
> >>>>> represents the first percpu chunk. To access a per-cpu buffer not
> >>>>> allocated in the first chunk, add a function which maps all cpu
> >>>>> buffers corresponding to that per-cpu buffer.
> >>>>>
> >>>>> Also add function to clear page table entries for a percpu buffer.
> >>>>>
> >>>>
> >>>> This needs some kind of clarification so that readers can tell whether
> >>>> you're trying to map all percpu memory or just map a specific
> >>>> variable.  In either case, you're making a dubious assumption that
> >>>> percpu memory contains no secrets.
> >>> I'm thinking the per-cpu random pool is a secrit. IOW, it demonstrably
> >>> does contain secrits, invalidating that premise.
> >>
> >> The current code unconditionally maps the entire first percpu chunk
> >> (pcpu_base_addr). So it assumes it doesn't contain any secret. That is
> >> mainly a simplification for the POC because a lot of core information
> >> that we need, for example just to switch mm, are stored there (like
> >> cpu_tlbstate, current_task...).
> >
> > I don’t think you should need any of this.
> >
>
> At the moment, the current code does need it. Otherwise it can't switch from
> kvm mm to kernel mm: switch_mm_irqs_off() will fault accessing "cpu_tlbstate",
> and then the page fault handler will fail accessing "current" before calling
> the kvm page fault handler. So it will double fault or loop on page faults.
> There are many different places where percpu variables are used, and I have
> experienced many double fault/page fault loop because of that.

Now you're experiencing what working on the early PTI code was like :)

This is why I think you shouldn't touch current in any of this.

>
> >>
> >> If the entire first percpu chunk effectively has secret then we will
> >> need to individually map only buffers we need. The kvm_copy_percpu_mapping()
> >> function is added to copy mapping for a specified percpu buffer, so
> >> this used to map percpu buffers which are not in the first percpu chunk.
> >>
> >> Also note that mapping is constrained by PTE (4K), so mapped buffers
> >> (percpu or not) which do not fill a whole set of pages can leak adjacent
> >> data store on the same pages.
> >>
> >>
> >
> > I would take a different approach: figure out what you need and put it in its
> > own dedicated area, kind of like cpu_entry_area.
>
> That's certainly something we can do, like Julian proposed with "Process-local
> memory allocations": https://lkml.org/lkml/2018/11/22/1240
>
> That's fine for buffers allocated from KVM, however, we will still need some
> core kernel mappings so the thread can run and interrupts can be handled.
>
> > One nasty issue you’ll have is vmalloc: the kernel stack is in the
> > vmap range, and, if you allow access to vmap memory at all, you’ll
> > need some way to ensure that *unmap* gets propagated. I suspect the
> > right choice is to see if you can avoid using the kernel stack at all
> > in isolated mode.  Maybe you could run on the IRQ stack instead.
>
> I am currently just copying the task stack mapping into the KVM page table
> (patch 23) when a vcpu is created:
>
>         err = kvm_copy_ptes(tsk->stack, THREAD_SIZE);
>
> And this seems to work. I am clearing the mapping when the VM vcpu is freed,
> so I am making the assumption that the same task is used to create and free
> a vcpu.
>

vCPUs are bound to an mm but not a specific task, right?  So I think
this is wrong in both directions.

Suppose a vCPU is created, then the task exits, the stack mapping gets
freed (the core code tries to avoid this, but it does happen), and a
new stack gets allocated at the same VA with different physical pages.
Now you're toast :)  On the flip side, wouldn't you crash if a vCPU is
created and then run on a different thread?

How important is the ability to enable IRQs while running with the KVM
page tables?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ