linux-kernel - Re: [RFC PATCH] x86/sev: Disallow userspace access to BIOS region for SEV-SNP guests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e67bbcf2-02f4-4ce1-8ad0-b1f1381c65fb@suse.com>
Date: Thu, 10 Apr 2025 19:39:58 +0300
From: Nikolay Borisov <nik.borisov@...e.com>
To: Kees Cook <kees@...nel.org>
Cc: Dan Williams <dan.j.williams@...el.com>,
 Dave Hansen <dave.hansen@...el.com>, Tom Lendacky <thomas.lendacky@....com>,
 Naveen N Rao <naveen@...nel.org>, linux-kernel@...r.kernel.org,
 x86@...nel.org, linux-coco@...ts.linux.dev,
 Dave Hansen <dave.hansen@...ux.intel.com>, Borislav Petkov <bp@...en8.de>,
 Vishal Annapurve <vannapurve@...gle.com>,
 Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
 Kevin Loughlin <kevinloughlin@...gle.com>, Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] x86/sev: Disallow userspace access to BIOS region for
 SEV-SNP guests



On 10.04.25 г. 19:32 ч., Kees Cook wrote:
> On Thu, Apr 10, 2025 at 03:03:55PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 9.04.25 г. 21:39 ч., Dan Williams wrote:
>>> Kees Cook wrote:
>>>> On Tue, Apr 08, 2025 at 04:55:08PM -0700, Dan Williams wrote:
>>>>> Dave Hansen wrote:
>>>>>> On 4/8/25 06:43, Tom Lendacky wrote:
>>>>>>>> Tom/Boris, do you see a problem blocking access to /dev/mem for SEV
>>>>>>>> guests?
>>>>>>> Not sure why we would suddenly not allow that.
>>>>>>
>>>>>> Both TDX and SEV-SNP have issues with allowing access to /dev/mem.
>>>>>> Disallowing access to the individually troublesome regions can fix
>>>>>> _part_ of the problem. But suddenly blocking access is guaranteed to fix
>>>>>> *ALL* the problems forever.
>>>>>
>>>>> ...or at least solicits practical use cases for why the kernel needs to
>>>>> poke holes in the policy.
>>>>>
>>>>>> Or, maybe we just start returning 0's for all reads and throw away all
>>>>>> writes. That is probably less likely to break userspace that doesn't
>>>>>> know what it's doing in the first place.
>>>>>
>>>>> Yes, and a bulk of the regression risk has already been pipe-cleaned by
>>>>> KERNEL_LOCKDOWN that shuts down /dev/mem and PCI resource file mmap in
>>>>> many scenarios.
>>>>>
>>>>> Here is an updated patch that includes some consideration for mapping
>>>>> zeros for known legacy compatibility use cases.
>>> [..]
>>>>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>>>>> index bfa444a7dbb0..c8679ae1bc8b 100644
>>>>> --- a/arch/x86/mm/init.c
>>>>> +++ b/arch/x86/mm/init.c
>>>>> @@ -867,6 +867,8 @@ void __init poking_init(void)
>>>>>     */
>>>>>    int devmem_is_allowed(unsigned long pagenr)
>>>>>    {
>>>>> +	bool platform_allowed = x86_platform.devmem_is_allowed(pagenr);
>>>>> +
>>>>>    	if (region_intersects(PFN_PHYS(pagenr), PAGE_SIZE,
>>>>>    				IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
>>>>>    			!= REGION_DISJOINT) {
>>>>> @@ -885,14 +887,20 @@ int devmem_is_allowed(unsigned long pagenr)
>>>>>    	 * restricted resource under CONFIG_STRICT_DEVMEM.
>>>>>    	 */
>>>>>    	if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) {
>>>>> -		/* Low 1MB bypasses iomem restrictions. */
>>>>> -		if (pagenr < 256)
>>>>> +		/*
>>>>> +		 * Low 1MB bypasses iomem restrictions unless the
>>>>> +		 * platform says "no", in which case map zeroes
>>>>> +		 */
>>>>> +		if (pagenr < 256) {
>>>>> +			if (!platform_allowed)
>>>>> +				return 2;
>>>>>    			return 1;
>>>>> +		}
>>>>>    		return 0;
>>>>>    	}
>>>>> -	return 1;
>>>>> +	return platform_allowed;
>>>>>    }
>>>>>    void free_init_pages(const char *what, unsigned long begin, unsigned long end)
>>>>
>>>> I am reminded of this discussion:
>>>> https://lore.kernel.org/all/CAPcyv4iVt=peUAk1qx_EfKn7aGJM=XwRUpJftBhkUgQEti2bJA@mail.gmail.com/
>>>>
>>>> As in, mmap will bypass this restriction, so if you really want the low
>>>> 1MiB to be unreadable, a solution for mmap is still needed...
>>>
>>> Glad you remembered that!
>>>
>>> This needs a self-test to verify the assumptions here. I can circle back
>>> next week or so take a look at turning this into a bigger series. If
>>> someone has cycles to take this on before that I would not say no to
>>> some help.
>>
>> Can't we simply treat return value of 2 for range_is_allowed the same way as
>> if 0 was returned in mmap_mem and simply fail the call with -EPERM?
> 
> The historical concern was that EPERM would break old tools. I don't
> have any current evidence either way, though.

Right, but we are only about to return 2 in a TVM context, so chances of 
running old tools are slim to none. Also it's perfectly valid to have 
mmap fail for a number of reasons, so old tools should be equipped with 
handling it returning -EPERM, no ?

>