lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b712bc81-4446-9be4-fd59-d08981d13475@amd.com>
Date:   Thu, 17 Nov 2022 14:56:47 -0600
From:   "Kalra, Ashish" <ashish.kalra@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
        jroedel@...e.de, thomas.lendacky@....com, hpa@...or.com,
        ardb@...nel.org, pbonzini@...hat.com, seanjc@...gle.com,
        vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org,
        dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com,
        peterz@...radead.org, srinivas.pandruvada@...ux.intel.com,
        rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com,
        michael.roth@....com, vbabka@...e.cz, kirill@...temov.name,
        ak@...ux.intel.com, tony.luck@...el.com, marcorr@...gle.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com,
        dgilbert@...hat.com, jarkko@...nel.org
Subject: Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR
 allocation when SNP is enabled

Hello Boris,

>>
>>> +        if (ret)
>>> +            goto cleanup;
>>> +
>>> +        ret = rmp_make_shared(pfn, PG_LEVEL_4K);
>>> +        if (ret)
>>> +            goto cleanup;
>>> +
>>> +        pfn++;
>>> +        n++;
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +cleanup:
>>> +    /*
>>> +     * If failed to reclaim the page then page is no longer safe to
>>> +     * be released, leak it.
>>> +     */
>>> +    snp_leak_pages(pfn, npages - n);
>>
>> So this looks real weird: we go and reclaim pages, we hit an error
>> during reclaiming a page X somewhere in-between and then we go and mark
>> the *remaining* pages as not to be used?!
>>
>> Why?
>>
>> Why not only that *one* page which failed and then we continue with the
>> rest?!
> 

I had a re-look at this while fixing the memory_failure() handling and 
realized that we can't do a *partial* recovery here if we fail to 
reclaim a single page, i.e., if we hit an error during reclaiming a 
single page we need to mark the remaining pages as not usable.

This is because this page could be a part of larger buffer which would 
have been transitioned to firmware state and need to be restored back in 
*full* to HV/shared state, any access to a partially transitioned buffer 
will still cause failures, basically the callers won't be able to do any 
kind of a recovery/access on a partially restored/reclaimed buffer and 
now potentially fragmented buffer, which anyway means failure due to 
data loss on non reclaimed page(s).

So we need to be able to reclaim all the pages or none.

Also this failure won't be happening regularly/frequently, it is a 
*rare* error case and if there is a reclamation failure on a single 
page, there is a high probability that there will be reclamation 
failures on subsequent pages.

Thanks,
Ashish

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ