linux-kernel - Re: [PATCH -V3] x86/tdx, memory hotplug: Check whole hot-adding memory range for TDX

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <87r0884jyj.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 22 Oct 2024 16:31:48 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: Thomas Gleixner <tglx@...utronix.de>,  Ingo Molnar <mingo@...hat.com>,
  Borislav Petkov <bp@...en8.de>,  Dave Hansen
 <dave.hansen@...ux.intel.com>,  "Kirill A . Shutemov"
 <kirill.shutemov@...ux.intel.com>,  x86@...nel.org,
  linux-coco@...ts.linux.dev,  linux-mm@...ck.org,
  linux-kernel@...r.kernel.org,  Dan Williams <dan.j.williams@...el.com>,
  Kai Huang <kai.huang@...el.com>,  David Hildenbrand <david@...hat.com>,
  "H. Peter Anvin" <hpa@...or.com>,  Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH -V3] x86/tdx, memory hotplug: Check whole hot-adding
 memory range for TDX

Hi, Oscar,

Oscar Salvador <osalvador@...e.de> writes:

> On Tue, Oct 22, 2024 at 11:16:17AM +0800, Huang Ying wrote:
>> On systems with TDX (Trust Domain eXtensions) enabled, current kernel
>> checks the TDX compatibility of the hot-added memory ranges through a
>> memory hotplug notifier for each memory_block.  If a memory range
>> which isn't TDX compatible is hot-added, for example, some CXL memory,
>> the command line as follows,
>> 
>>   $ echo 1 > /sys/devices/system/node/nodeX/memoryY/online
>> 
>> will report something like,
>> 
>>   bash: echo: write error: Operation not permitted
>> 
>> If pr_debug() is enabled, current kernel will show the error message
>> like below in the kernel log,
>> 
>>   online_pages [mem 0xXXXXXXXXXX-0xXXXXXXXXXX] failed
>> 
>> Both are too general to root cause the problem.  This may confuse
>> users.  One solution is to print some error messages in the TDX memory
>> hotplug notifier.  However, kernel calls memory hotplug notifiers for
>> each memory block, so this may lead to a large volume of messages in
>> the kernel log if a large number of memory blocks are onlined with a
>> script or automatically.  For example, the typical size of memory
>> block is 128MB on x86_64, when online 64GB CXL memory, 512 messages
>> will be logged.
>> 
>> Therefore, this patch checks the TDX compatibility of the whole
>> hot-adding memory range through a newly added architecture specific
>> function (arch_check_hotplug_memory_range()).  If this patch rejects
>> the memory hot-adding for TDX compatibility, it will output a kernel
>> log message like below,
>> 
>>   virt/tdx: Reject hot-adding memory range: 0xXXXXXXXX-0xXXXXXXXX for TDX compatibility.
>> 
>> The target use case is to support CXL memory on TDX enabled systems.
>> If the CXL memory isn't compatible with TDX, the kernel will reject
>> the whole CXL memory range.  While the CXL memory can still be used
>> via devdax interface.
>> 
>> This also makes the original TDX memory hotplug notifier useless, so
>> this patch deletes it.
>> 
>> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
>
> Acked-by: Oscar Salvador <osalvador@...e.de>

Thanks!

> One question below:
>
> ...
>
>> +int tdx_check_hotplug_memory_range(u64 start, u64 size)
>>  {
>> -	struct memory_notify *mn = v;
>> -
>> -	if (action != MEM_GOING_ONLINE)
>> -		return NOTIFY_OK;
>> +	u64 start_pfn = PHYS_PFN(start);
>> +	u64 end_pfn = PHYS_PFN(start + size);
>>  
>>  	/*
>>  	 * Empty list means TDX isn't enabled.  Allow any memory
>> -	 * to go online.
>> +	 * to be hot-added.
>>  	 */
>>  	if (list_empty(&tdx_memlist))
>> -		return NOTIFY_OK;
>> +		return 0;
>>  
>>  	/*
>>  	 * The TDX memory configuration is static and can not be
>> -	 * changed.  Reject onlining any memory which is outside of
>> +	 * changed.  Reject hot-adding any memory which is outside of
>>  	 * the static configuration whether it supports TDX or not.
>>  	 */
>> -	if (is_tdx_memory(mn->start_pfn, mn->start_pfn + mn->nr_pages))
>> -		return NOTIFY_OK;
>> +	if (is_tdx_memory(start_pfn, end_pfn))
>> +		return 0;
>>  
>> -	return NOTIFY_BAD;
>> +	pr_info("Reject hot-adding memory range: %#llx-%#llx for TDX compatibility.\n",
>> +		start, start + size);
>
> Why not using pr_err() here?
>
> I was checking which kind of information level we use when failing at
> hot-adding memory, and we seem to be using pr_err(), and pr_debug() when
> onlining/offlining.
>
> Not a big deal, and not saying it is wrong, but was just wondering the reasoning
> behind.

TBH, I have no strong opinion about which log level is more appropriate.
IMHO, it shouldn't be pr_debug() to make it easy for users to root cause
the hot-adding failure.  And, it appears too harsh to use pr_err(),
because there's no program error, etc.  So, I think that something
in-between is more appropriate.  That is, pr_warn(), pr_notice, or
pr_info().  In them, I prefer pr_info() a little.

--
Best Regards,
Huang, Ying