Message-ID: <ab7dc77ee8804f1e958c18cc4dac4a65@huawei.com>
Date: Wed, 19 Jan 2022 12:30:09 +0000
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To: Catalin Marinas <catalin.marinas@....com>
CC: "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"kvmarm@...ts.cs.columbia.edu" <kvmarm@...ts.cs.columbia.edu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"maz@...nel.org" <maz@...nel.org>,
"will@...nel.org" <will@...nel.org>,
"james.morse@....com" <james.morse@....com>,
"julien.thierry.kdev@...il.com" <julien.thierry.kdev@...il.com>,
"suzuki.poulose@....com" <suzuki.poulose@....com>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"Alexandru.Elisei@....com" <Alexandru.Elisei@....com>,
"qperret@...gle.com" <qperret@...gle.com>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@....com]
> Sent: 19 January 2022 11:50
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
> Cc: linux-arm-kernel@...ts.infradead.org; kvmarm@...ts.cs.columbia.edu;
> linux-kernel@...r.kernel.org; maz@...nel.org; will@...nel.org;
> james.morse@....com; julien.thierry.kdev@...il.com;
> suzuki.poulose@....com; jean-philippe@...aro.org;
> Alexandru.Elisei@....com; qperret@...gle.com; Jonathan Cameron
> <jonathan.cameron@...wei.com>; Linuxarm <linuxarm@...wei.com>
> Subject: Re: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> On Wed, Jan 19, 2022 at 09:23:31AM +0000, Shameerali Kolothum Thodi wrote:
> > > On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> > > > -TLA+ model. Modified the asidalloc model to incorporate the new
> > > > VMID algo. The main differences are,
> > > > -flush_tlb_all() instead of local_tlb_flush_all() on rollover.
> > > > -Introduced INVALID_VMID and vCPU Sched Out logic.
> > > > -No CnP (Removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
> > > > -Removed UniqueVMIDPerCPU invariant for now, as it looks like,
> > > > because of the speculative fetching with flush_tlb_all(), there
> > > > is a small window where this gets triggered. If I change the
> > > > logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
> > > > be fine. With my limited knowledge of the TLA+ model, it is not
> > > > clear to me whether this is a problem with the above logic
> > > > or with the VMID model implementation. I would really appreciate
> > > > any help with the model.
> > > > The initial VMID TLA+ model is here,
> > > > https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
> > >
> > > I only had a brief look at the TLA+ model and I don't understand why you
> > > have a separate 'sched_out' process. It would run in parallel with the
> > > 'sched' but AFAICT you can't really schedule a guest out while you are
> > > in the middle of scheduling it in. I'd rather use the same 'sched'
> > > process and either schedule in an inactive task or schedule out an
> > > active one for a given CPU.
> > >
> > > Also active_vmids[] for example is defined on the CPUS domain but you
> > > call vcpu_sched_out() from a process that's not in the CPUS domain but
> > > the SCHED_OUT one.
> >
> > Many thanks for taking a look. My bad! The 'sched_out' would indeed run in
> > parallel and defeat the purpose. I must say I was really confused by the
> > TLA+ syntax and am still not very confident about it.
>
> Yeah, it can be confusing. If you have time, you could give CBMC a try
> and the 'spec' would be pretty close to your C version. Each CPU would
> be modelled as a thread with an extra thread that simulates the
> speculative TLB look-ups for all CPUS together with the asserts for the
> invariants. The spinlocks would be pthread_mutexes.
Ok, will take a look.
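Just to check I have understood the shape of it: a rough skeleton of such a
harness might look like the below (all the names here are mine, invented for
the sketch; the real vmid_update()/rollover logic and the invariant asserts
would replace the placeholders).

#include <pthread.h>
#include <stdint.h>

#define NR_CPUS 2

static pthread_mutex_t cpu_vmid_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t active_vmids[NR_CPUS];   /* VMID currently programmed per CPU */

static void *cpu_thread(void *arg)
{
        int cpu = (int)(intptr_t)arg;

        /* guest-entry path: refresh this CPU's VMID under the lock */
        pthread_mutex_lock(&cpu_vmid_lock);
        active_vmids[cpu] = 1;           /* placeholder for new_vmid()/rollover */
        pthread_mutex_unlock(&cpu_vmid_lock);
        return NULL;
}

static void *speculative_tlb_thread(void *arg)
{
        /*
         * Reads active_vmids[] at arbitrary points (CBMC explores the
         * interleavings) and fills a modelled TLB; the UniqueVMIDPerCPU
         * style asserts would live here.
         */
        (void)arg;
        return NULL;
}

int main(void)
{
        pthread_t t[NR_CPUS], spec;

        for (long i = 0; i < NR_CPUS; i++)
                pthread_create(&t[i], NULL, cpu_thread, (void *)i);
        pthread_create(&spec, NULL, speculative_tlb_thread, NULL);

        for (int i = 0; i < NR_CPUS; i++)
                pthread_join(t[i], NULL);
        pthread_join(spec, NULL);
        return 0;
}
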
> > Based on the above suggestion, I have modified it as below,
> >
> > \* vCPU is scheduled out by KVM
> > macro vcpu_sched_out() {
> >     active_kvm[self].task := 0;
> >     active_vmids[self] := INVALID_VMID;
> > }
>
> Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
> this directly below and avoid another macro. Well, whatever you find
> clearer.
Sure, will change that.
> What confuses me is that your INVALID_VMID looks like a valid one: vmid
> 0, generation 1. Do you ensure that you never allocate VMID 0?
This is the same as in asidalloc, where cur_idx starts at 1 for any new
allocation. I think that guarantees VMID 0 is never allocated.
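To illustrate what I mean (this is only a toy sketch with made-up names, not
the kernel code): because the search index starts at 1 and is never reset
below 1, slot 0 of the map is never handed out, so "generation | 0" stays
free for use as INVALID_VMID.

#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VMID_BITS       8
#define NUM_VMIDS       (1u << VMID_BITS)
#define GEN_INC         (1ull << VMID_BITS)

static unsigned char vmid_used[NUM_VMIDS];      /* stand-in for the bitmap */
static uint64_t vmid_generation = GEN_INC;      /* "generation 1" */
static uint32_t cur_idx = 1;                    /* starts at 1, never drops below 1 */

static uint32_t next_free_vmid(uint32_t start)
{
        for (uint32_t i = start; i < NUM_VMIDS; i++)
                if (!vmid_used[i])
                        return i;
        return NUM_VMIDS;                       /* exhausted: rollover needed */
}

static uint64_t new_vmid_sketch(void)
{
        uint32_t vmid = next_free_vmid(cur_idx);

        if (vmid == NUM_VMIDS) {
                /* rollover: new generation, clear the map, flush TLBs, ... */
                vmid_generation += GEN_INC;
                memset(vmid_used, 0, sizeof(vmid_used));
                vmid = next_free_vmid(1);       /* search again from 1, never 0 */
        }
        vmid_used[vmid] = 1;
        cur_idx = vmid;
        return vmid_generation | vmid;          /* low VMID_BITS never all zero */
}

int main(void)
{
        /* the first two allocations come out as VMIDs 1 and 2 of generation 1 */
        assert((new_vmid_sketch() & (NUM_VMIDS - 1)) != 0);
        assert((new_vmid_sketch() & (NUM_VMIDS - 1)) != 0);
        return 0;
}
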
>
> > \* About to run a Guest VM
> > process (sched \in CPUS)
> > {
> > sched_loop:
> >     while (TRUE) {
> >         with (t \in TASKS) {
> >             if (t # ActiveTask(self))
> >                 call kvm_arm_vmid_update(t);
> >             else
> >                 vcpu_sched_out();
> >         }
> >     }
> > }
>
> Yes, that's what I meant.
Ok.
>
> > > The corresponding
> > > UniqueASIDPerCPU meant that for any two TLB entries on a single CPU, if
> > > they correspond to different tasks (pgd), they should have different
> > > ASIDs. That's a strong requirement, otherwise we end up with the wrong
> > > translation.
> >
> > Yes, I understand that it is a strong requirement. Also, I thought this is
> > something that would trigger easily with the test setup I had on real
> > hardware. But the test ran for days without hitting it, so I was not sure
> > whether it is a problem with the TLA+ implementation or not.
>
> Well, you'd have to check the TLA+ state trace and see how it got
> there, whether the last state would be a valid one. It's either
> something missing in the spec that the hardware enforces or the
> algorithm is wrong and just hard to hit in practice. If this condition
> is violated in hardware for a very brief period, e.g. due to some TLBI,
> you'd not notice an issue under normal circumstances. But it's still
> incorrect.
Hmm.. I had a quick implementation with a separate thread for the speculative
load, as done in arm64kpti.tla, but I still get the UniqueVMIDPerCPU violation.
I will go through the state trace and see whether I can interpret something :)
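If I do end up redoing this as a C model for CBMC as you suggest, I guess the
UniqueVMIDPerCPU check itself would boil down to an assert over a modelled
per-CPU TLB, something like the sketch below (all names and types invented
here, not taken from the TLA+ spec or the kernel):

#include <assert.h>
#include <stdbool.h>

#define NR_CPUS         2
#define TLB_ENTRIES     4

struct tlb_entry {
        bool valid;
        unsigned int vmid;
        unsigned int task;      /* which stage-2 page table the entry caches */
};

static struct tlb_entry tlb[NR_CPUS][TLB_ENTRIES];

/* same VMID on one CPU must always mean the same stage-2 table */
static void check_unique_vmid_per_cpu(void)
{
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                for (int i = 0; i < TLB_ENTRIES; i++)
                        for (int j = i + 1; j < TLB_ENTRIES; j++)
                                if (tlb[cpu][i].valid && tlb[cpu][j].valid &&
                                    tlb[cpu][i].vmid == tlb[cpu][j].vmid)
                                        assert(tlb[cpu][i].task ==
                                               tlb[cpu][j].task);
}

int main(void)
{
        tlb[0][0] = (struct tlb_entry){ .valid = true, .vmid = 1, .task = 1 };
        tlb[0][1] = (struct tlb_entry){ .valid = true, .vmid = 2, .task = 2 };
        check_unique_vmid_per_cpu();
        return 0;
}
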
> > > Why did you remove the CnP? Do we have this disabled for KVM guests?
> >
> > I removed the CnP-related invariants to simplify things for the first
> > version. Also, I am not sure what specific changes we need to make for CnP
> > here. Do we still need the switch to the global pg_dir to prevent any
> > speculative reading? I didn't see that being done in KVM anywhere at the
> > moment. Maybe I am missing something.
>
> It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
> sets VTTBR_EL2.CnP to 0 (I haven't checked).
I think KVM sets the CnP bit here:
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/kvm_mmu.h#L264
But I haven't figured out what else we need to do in this new VMID allocator for
the CnP case.
Marc, Will?
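For reference, as far as I recall that helper just ORs the CnP bit into the
VTTBR value alongside the VMID field, roughly like the paraphrase below (not a
verbatim copy; the constants and stub are defined locally just so the sketch
compiles):

#include <stdint.h>

#define VTTBR_CNP_BIT           (1ULL << 0)     /* VTTBR_EL2.CnP is bit 0 */
#define VTTBR_VMID_SHIFT        48              /* VMID lives in bits [63:48] */

/* stand-in for the kernel's system_supports_cnp() */
static int system_supports_cnp_stub(void) { return 1; }

static uint64_t get_vttbr_sketch(uint64_t s2_pgd_phys, uint64_t vmid)
{
        uint64_t cnp = system_supports_cnp_stub() ? VTTBR_CNP_BIT : 0;

        return s2_pgd_phys | (vmid << VTTBR_VMID_SHIFT) | cnp;
}

int main(void)
{
        (void)get_vttbr_sketch(0x40000000ULL, 1);       /* just exercise it */
        return 0;
}
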
>
> > On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now fails
> > with "Error: Invariant TLBEmptyInvalPTE is violated." Please check.
>
> This was added later as part of try-to-unmap and I only checked this
> with CnP = FALSE. I'll need to look into it; it's possible that
> flush_tlb_all() doesn't take into account that the pte is unmapped (as
> cpu_switch_mm() does). I'll add a separate thread for speculative TLB
> loads, it's easier to have them in one place. Thanks.
Ok.
Thanks,
Shameer