linux-kernel - Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230117214414.00003229@gmail.com>
Date:   Tue, 17 Jan 2023 21:44:14 +0200
From:   Zhi Wang <zhi.wang.linux@...il.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     isaku.yamahata@...el.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, isaku.yamahata@...il.com,
        Paolo Bonzini <pbonzini@...hat.com>, erdemaktas@...gle.com,
        Sagi Shahar <sagis@...gle.com>,
        David Matlack <dmatlack@...gle.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Kai Huang <kai.huang@...el.com>
Subject: Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure

On Tue, 17 Jan 2023 15:55:53 +0000
Sean Christopherson <seanjc@...gle.com> wrote:

> On Sat, Jan 14, 2023, Zhi Wang wrote:
> > On Fri, 13 Jan 2023 15:16:08 +0000 > Sean Christopherson <seanjc@...gle.com> wrote:
> > 
> > > On Fri, Jan 13, 2023, Zhi Wang wrote:
> > > > Better add a FIXME: here as this has to be fixed later.
> > > 
> > > No, leaking the page is all KVM can reasonably do here.  An improved
> > > comment would be helpful, but no code change is required.
> > > tdx_reclaim_page() returns an error if and only if there's an
> > > unexpected, fatal error, e.g. a SEAMCALL with bad params, incorrect
> > > concurrency in KVM, a TDX Module bug, etc.  Retrying at a later point is
> > > highly unlikely to be successful.
> > 
> > Hi:
> > 
> > The word "leaking" sounds like a situation left unhandled temporarily.
> > 
> > I checked the source code of the TDX module[1] for the possible reason to
> > fail when reviewing this patch:
> > 
> > tdx-module-v1.0.01.01.zip\src\vmm_dispatcher\api_calls\tdh_phymem_page_reclaim.c
> > tdx-module-v1.0.01.01.zip\src\vmm_dispatcher\api_calls\tdh_phymem_page_wbinvd.c
> > 
> > a. Invalid parameters. For example, page is not aligned, PA HKID is not zero...
> > 
> > For invalid parameters, a WARN_ON_ONCE() + return value is good enough as
> > that is how kernel handles similar situations. The caller takes the
> > responsibility.
> >  
> > b. Locks has been taken in TDX module. TDR page has been locked due to another
> > SEAMCALL, another SEAMCALL is doing PAMT walk and holding PAMT lock... 
> > 
> > This needs to be improved later either by retry or taking tdx_lock to avoid
> > TDX module fails on this.
> 
> No, tdx_reclaim_page() already retries TDH.PHYMEM.PAGE.RECLAIM if the target page
> is contended (though I'd question the validity of even that), and TDH.PHYMEM.PAGE.WBINVD
> is performed only when reclaiming the TDR.  If there's contention when reclaiming
> the TDR, then KVM effectively has a use-after-free bug, i.e. leaking the page is
> the least of our worries.
> 

Hi:

Thanks for the reply. "Leaking" is the consquence of even failing in retry. I
agree with this. But I was questioning if "retry" is really a correct and only
solution when encountering lock contention in the TDX module as I saw that there
are quite some magic numbers are going to be introduced because of "retry" and
there were discussions about times of retry should be 3 or 1000 in TDX guest
on hyper-V patches. It doesn't sound right.

Compare to an typical *kernel lock* case, an execution path can wait on a
waitqueue and later will be woken up. We usually do contention-wait-and-retry
and we rarely just do contention and retry X times. In TDX case, I understand
that it is hard for the TDX module to provide similar solutions as an execution
path can't stay long in the TDX module.

1) We can always take tdx_lock (linux kernel lock) when calling a SEAMCALL
that touch the TDX internal locks. But the downside is we might lose some
concurrency.

2) As TDX module doesn't provide contention-and-wait, I guess the following
approach might have been discussed when designing this "retry".

KERNEL                          TDX MODULE

SEAMCALL A   ->                 PATH A: Taking locks

SEAMCALL B   ->                 PATH B: Contention on a lock

             <-                 Return "operand busy"

SEAMCALL B   -|
              |  <- Wait on a kernel waitqueue
SEAMCALL B  <-|

SEAMCALL A   <-                 PATH A: Return

SEAMCALL A   -|
              |  <- Wake up the waitqueue
SEMACALL A  <-| 

SEAMCALL B  ->                  PATH B: Taking the locks
...

Why not this scheme wasn't chosen?

> 
> On Thu, Jan 12, 2023 at 8:34 AM <isaku.yamahata@...el.com> wrote:
> > +static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 hkid)
> > +{
> > +       struct tdx_module_output out;
> > +       u64 err;
> > +
> > +       do {
> > +               err = tdh_phymem_page_reclaim(pa, &out);
> > +               /*
> > +                * TDH.PHYMEM.PAGE.RECLAIM is allowed only when TD is shutdown.
> > +                * state.  i.e. destructing TD.
> > +                * TDH.PHYMEM.PAGE.RECLAIM  requires TDR and target page.
> > +                * Because we're destructing TD, it's rare to contend with TDR.
> > +                */
> > +       } while (err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX));
> > +       if (WARN_ON_ONCE(err)) {
> > +               pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out);
> > +               return -EIO;
> > +       }
> > +
> > +       if (do_wb) {
> > +               /*
> > +                * Only TDR page gets into this path.  No contention is expected
> > +                * because of the last page of TD.
> > +                */
> > +               err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid));
> > +               if (WARN_ON_ONCE(err)) {
> > +                       pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL);
> > +                       return -EIO;
> > +               }
> > +       }
> > +
> > +       tdx_clear_page(pa);
> > +       return 0;
> > +}