Open Source and information security mailing list archives
 
Message-ID: <aEEFRXF+HrZVh5He@yzhao56-desk.sh.intel.com>
Date: Thu, 5 Jun 2025 10:47:33 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Ackerley Tng <ackerleytng@...gle.com>
CC: <vannapurve@...gle.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
	<kirill.shutemov@...el.com>, <tabba@...gle.com>, <quic_eberman@...cinc.com>,
	<michael.roth@....com>, <david@...hat.com>, <vbabka@...e.cz>,
	<jroedel@...e.de>, <thomas.lendacky@....com>, <pgonda@...gle.com>,
	<zhiquan1.li@...el.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
	<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
	<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Wed, Jun 04, 2025 at 01:02:54PM -0700, Ackerley Tng wrote:
> Hi Yan,
> 
> While working on the 1G (aka HugeTLB) page support for guest_memfd
> series [1], we took into account conversion failures too. The steps are
> in kvm_gmem_convert_range(). (It might be easier to pull the entire
> series from GitHub [2] because the steps for conversion changed in two
> separate patches.)
...
> [2] https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2

Hi Ackerley,
Thanks for providing this branch.

I'm now trying to get TD huge pages working on this branch, and I'd like to
report the errors I run into along the way early, rather than all at once at the end.

1. The symbol arch_get_align_mask() is not available when KVM is built as a
   module (it does not appear to be exported). I currently work around it as follows:

--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -102,8 +102,13 @@ static unsigned long kvm_gmem_get_align_mask(struct file *file,
        void *priv;

        inode = file_inode(file);
-       if (!kvm_gmem_has_custom_allocator(inode))
-             return arch_get_align_mask(file, flags);
+       if (!kvm_gmem_has_custom_allocator(inode)) {
+               page_size = 1 << PAGE_SHIFT;
+               return PAGE_MASK & (page_size - 1);
+       }
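
Note that the replacement expression always evaluates to 0. The kernel defines
PAGE_MASK as ~(PAGE_SIZE - 1), so ANDing it with PAGE_SIZE - 1 clears every bit.
In effect, the workaround requests no alignment beyond PAGE_SIZE. The user-space
sketch below only checks that arithmetic. It assumes 4 KiB pages (PAGE_SHIFT of
12, as on x86-64) and redefines the macros locally. workaround_align_mask() is a
hypothetical name, not a function in the branch.

```c
#include <assert.h>

/* Local stand-ins for the kernel macros, assuming 4 KiB pages (x86-64). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* PAGE_MASK covers only the bits above the page offset. */
static_assert((PAGE_MASK & (PAGE_SIZE - 1)) == 0,
	      "PAGE_MASK and the page-offset bits are disjoint");

/*
 * Hypothetical helper mirroring the workaround's return expression:
 * (page_size - 1) keeps only the page-offset bits, which PAGE_MASK
 * clears, so the result is always 0 (no extra alignment requested).
 */
static unsigned long workaround_align_mask(void)
{
	unsigned long page_size = 1UL << PAGE_SHIFT;

	return PAGE_MASK & (page_size - 1);
}
```
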


2. Bug: "sleeping function called from invalid context"

[  193.523469] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:325
[  193.539885] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3332, name: guest_memfd_con
[  193.556235] preempt_count: 1, expected: 0
[  193.564518] RCU nest depth: 0, expected: 0
[  193.572866] 3 locks held by guest_memfd_con/3332:
[  193.581800]  #0: ff16f8ec217e4438 (sb_writers#14){.+.+}-{0:0}, at: __x64_sys_fallocate+0x46/0x80
[  193.598252]  #1: ff16f8fbd85c8310 (mapping.invalidate_lock#4){++++}-{4:4}, at: kvm_gmem_fallocate+0x9e/0x310 [kvm]
[  193.616706]  #2: ff3189b5e4f65018 (&(kvm)->mmu_lock){++++}-{3:3}, at: kvm_gmem_invalidate_begin_and_zap+0x17f/0x260 [kvm]
[  193.635790] Preemption disabled at:
[  193.635793] [<ffffffffc0850c6f>] kvm_gmem_invalidate_begin_and_zap+0x17f/0x260 [kvm]

This is because add_to_invalidated_kvms() calls kzalloc() while kvm->mmu_lock
is held. That lock is a spinning lock (an rwlock_t on x86), so an allocation
that may sleep is not allowed there.
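
Here is a toy user-space model of why the splat fires. A GFP_KERNEL kzalloc()
is allowed to sleep, and the slab allocator checks this through might_alloc().
As far as I can tell, the include/linux/sched/mm.h location in the splat points
at might_alloc(). Meanwhile, holding a spinning lock raises preempt_count, so
the check reports the bug. Everything in the model is a hypothetical stand-in,
not a kernel API: preempt_count as a plain global, model_write_lock(),
model_kzalloc() and the MODEL_GFP_* flags. The model also shows that a
non-blocking, GFP_ATOMIC-style allocation would not trip the check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: GFP_KERNEL may sleep, GFP_ATOMIC may not. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

static int preempt_count;        /* models the preemption counter */
static int invalid_context_bugs; /* counts "sleeping function called from invalid context" */

/* Taking a spinning lock (e.g. write_lock on an rwlock_t) disables preemption. */
static void model_write_lock(void)
{
	preempt_count++;
}

static void model_write_unlock(void)
{
	assert(preempt_count > 0); /* unlock must pair with a lock */
	preempt_count--;
}

/* Models might_sleep(): report if we might sleep in atomic context. */
static void model_might_sleep(void)
{
	if (preempt_count != 0)
		invalid_context_bugs++;
}

/* Models kzalloc(): only blocking allocations are checked. */
static void *model_kzalloc(size_t size, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL)
		model_might_sleep();
	return calloc(1, size);
}

/* The reported pattern: an allocation made while the lock is held. */
static void *alloc_under_lock(enum model_gfp gfp)
{
	void *p;

	model_write_lock();
	p = model_kzalloc(64, gfp);
	model_write_unlock();
	return p;
}
```
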

I worked around it as follows:

 static int kvm_gmem_invalidate_begin_and_zap(struct kvm_gmem *gmem,
                                             pgoff_t start, pgoff_t end,
@@ -1261,13 +1268,13 @@ static int kvm_gmem_invalidate_begin_and_zap(struct kvm_gmem *gmem,
                        KVM_MMU_LOCK(kvm);
                        kvm_mmu_invalidate_begin(kvm);

-                       if (invalidated_kvms) {
-                               ret = add_to_invalidated_kvms(invalidated_kvms, kvm);
-                               if (ret) {
-                                       kvm_mmu_invalidate_end(kvm);
-                                       goto out;
-                               }
-                       }
                }


@@ -1523,12 +1530,14 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
        }

 out:
-       list_for_each_entry_safe(entry, tmp, &invalidated_kvms, list) {
-               kvm_gmem_do_invalidate_end(entry->kvm);
-               list_del(&entry->list);
-               kfree(entry);
-       }
+       list_for_each_entry(gmem, gmem_list, entry)
+               kvm_gmem_do_invalidate_end(gmem->kvm);

        filemap_invalidate_unlock(inode->i_mapping);
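
With this workaround, nothing is allocated under mmu_lock. Instead, the out:
path walks the gmem list again and calls kvm_gmem_do_invalidate_end() on each
gmem->kvm. Below is a simplified user-space model of that begin/end pairing.
It assumes invalidation was begun for every gmem on the list. That assumption
is what makes re-walking the list equivalent to the removed invalidated_kvms
list. struct model_kvm, struct model_gmem and model_punch_hole() are
hypothetical. The assert in model_invalidate_end() flags an end that has no
matching begin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct kvm and struct kvm_gmem. */
struct model_kvm {
	int invalidate_in_progress; /* models kvm->mmu_invalidate_in_progress */
};

struct model_gmem {
	struct model_kvm *kvm;
	struct model_gmem *next; /* models the entry on the inode's gmem_list */
};

static void model_invalidate_begin(struct model_kvm *kvm)
{
	kvm->invalidate_in_progress++;
}

static void model_invalidate_end(struct model_kvm *kvm)
{
	assert(kvm->invalidate_in_progress > 0); /* end without a begin */
	kvm->invalidate_in_progress--;
}

/*
 * Models the workaround. Invalidation is begun for every binding (under
 * mmu_lock in the real code) and the range is zapped. The out: path then
 * walks the same list again and ends each one, so no per-call bookkeeping
 * list has to be allocated.
 * Assumption: invalidation was begun for every gmem on the list.
 */
static void model_punch_hole(struct model_gmem *gmem_list)
{
	struct model_gmem *gmem;

	for (gmem = gmem_list; gmem; gmem = gmem->next)
		model_invalidate_begin(gmem->kvm);

	/* ... zap and truncate the range here ... */

	for (gmem = gmem_list; gmem; gmem = gmem->next)
		model_invalidate_end(gmem->kvm);
}
```
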


I will let you know about further findings later.

Thanks
Yan
