linux-kernel - arm64: BUG report: Hugepages leads cpu failed to come online.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <D46D3E8D-631E-466D-B242-A8B447CD8D06@tencent.com>
Date:   Wed, 16 Sep 2020 12:22:07 +0000
From:   chengdongli(李成栋) 
        <chengdongli@...cent.com>
To:     "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "will@...nel.org" <will@...nel.org>,
        "mark.rutland@....com" <mark.rutland@....com>
CC:     Matteo Carlini <Matteo.Carlini@....com>,
        Yuping Luo <Yuping.Luo@....com>,
        haibinzhang(张海斌) 
        <haibinzhang@...cent.com>
Subject: arm64: BUG report: Hugepages leads cpu failed to come online.

Hello folks,

We observed that Arm VM instance (Centos 8) runs into timeout at boot time when it is launched by Qemu/KVM on an Arm server host that enabled hugepages (2MB/1GB).
We also noticed that Will has a workaround that applies to the guest kernel (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e1645557d19fc6d88d3c40431f63a3c3a4c417b)
But, it simply increased the timeout time from 1s to 5s and didn't accepted by the upstream so far for some reasons. Moreover, even it is accepted by the upstream, it should take a long time for the distributions like CentOS to backport it.

We are thinking that if we have a chance to fix this problem at host side. Because most cloud providers need less time than distributions to upgrade their software, then eventually, no need to upgrade guest OS.
We have investigated the root cause for a few weeks, and barely understand this problem may related to page locking contention when guest kernel allocates per-cpu memory for each CPU at boot time.

However,  we are not so sure about the root cause and hopping you could share your expertise on this.  Really appreciate if you could share your insights!


Reproduce step:
=============
- Enable 1GB hugepage on the host kernel side. 
- Start vm using following command (qemu must support passing cpu topology through acpi&dt):
# qemu-system-aarch64 -name guest=12345,debug-threads=on \
-machine virt,accel=kvm,usb=off,dump-guest-core=off,gic-version=3 \
-cpu host \
-m 122880 \
-object iothread,id=iothread1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=64424509440,host-nodes=0,policy=preferred \
-numa node,nodeid=0,cpus=0-29,memdev=ram-node0 \
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=64424509440,host-nodes=1,policy=preferred \
-numa node,nodeid=1,cpus=30-59,memdev=ram-node1 \
-smp 60,sockets=2,cores=30,threads=1 \
-bios /usr/share/AAVMF/AAVMF_CODE.fd \
-drive file=centos8.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-net none -serial telnet::9001,server,nowait -monitor stdio

Guest kernel boot timeout log sample:
===============================

[ 0.017636] smp: Bringing up secondary CPUs ...
[ 1.039073] CPU1: failed to come online
[ 1.039486] CPU1: failed in unknown state : 0x0
[ 1.052547] Detected VIPT I-cache on CPU2
[ 1.052563] GICv3: CPU2: found redistributor 2 region 0:0x00000000080e0000
[ 1.052708] GICv3: CPU2: using allocated LPI pending table @0x0000000f006a0000
[ 1.052729] CPU2: Booted secondary processor 0x0000000001 [0x481fd010]
[ 2.079846] CPU3: failed to come online
[ 2.082698] CPU3: failed in unknown state : 0x0
[ 2.086925] Detected VIPT I-cache on CPU4
[ 2.086947] GICv3: CPU4: found redistributor 4 region 0:0x0000000008120000
[ 2.087232] GICv3: CPU4: using allocated LPI pending table @0x0000000f006c0000
[ 2.087259] CPU4: Booted secondary processor 0x0000000002 [0x481fd010]
[ 3.120634] CPU5: failed to come online
[ 3.121691] Detected VIPT I-cache on CPU5
[ 3.121754] CPU5: failed in unknown state : 0x0
[ 3.121774] GICv3: CPU5: found redistributor 5 region 0:0x0000000008140000


 Linux perf output (host side) while guest kernel booting:
==============================================

-   19.75%  worker           [kernel.kallsyms]        [k] queued_spin_lock_slowpath                                                                                                         
   - queued_spin_lock_slowpath                                                                                                                                                              
      - 12.98% kvm_mmu_notifier_invalidate_range_end                                                                                                                                        
           __mmu_notifier_invalidate_range_end                                                                                                                                              
           zap_page_range                                                                                                                                                                   
           __arm64_sys_madvise                                                                                                                                                              
           el0_svc_common.constprop.3                                                                                                                                                       
           el0_svc_handler                                                                                                                                                                  
           el0_svc                                                                                                                                                                          
           __madvise                                                                                                                                                                        
           tcmalloc::PageHeap::DecommitSpan                                                                                                                                                 
           tcmalloc::PageHeap::MergeIntoFreeList                                                                                                                                            
           tcmalloc::PageHeap::Delete                                                                                                                                                       
           tcmalloc::CentralFreeList::ReleaseToSpans                                                                                                                                        
           tcmalloc::CentralFreeList::ReleaseListToSpans                                                                                                                                    
           tcmalloc::CentralFreeList::InsertRange                                                                                                                                           
           tcmalloc::ThreadCache::ReleaseToCentralCache                                                                                                                                     
           tcmalloc::ThreadCache::Cleanup                                                                                                                                                   
           tcmalloc::ThreadCache::DeleteCache                                                                                                                                               
           __nptl_deallocate_tsd                                                                                                                                                            
           start_thread                                                                                                                                                                     
           thread_start                                                                                                                                                                     
      - 6.77% kvm_mmu_notifier_invalidate_range_start                                                                                                                                       
           __mmu_notifier_invalidate_range_start                                                                                                                                            
           zap_page_range                                                                                                                                                                   
           __arm64_sys_madvise                                                                                                                                                              
           el0_svc_common.constprop.3                                                                                                                                                       
           el0_svc_handler                                                                                                                                                                  
           el0_svc                                                                                                                                                                          
           __madvise                                                                                                                                                                        
           tcmalloc::PageHeap::DecommitSpan                                                                                                                                                 
           tcmalloc::PageHeap::MergeIntoFreeList                                                                                                                                            
           tcmalloc::PageHeap::Delete                                                                                                                                                       
           tcmalloc::CentralFreeList::ReleaseToSpans                                                                                                                                        
           tcmalloc::CentralFreeList::ReleaseListToSpans                                                                                                                                    
           tcmalloc::CentralFreeList::InsertRange                                                                                                                                           
           tcmalloc::ThreadCache::ReleaseToCentralCache                                                                                                                                     
           tcmalloc::ThreadCache::Cleanup                                                                                                                                                   
           tcmalloc::ThreadCache::DeleteCache                                                                                                                                               
           __nptl_deallocate_tsd                                                                                                                                                            
           start_thread                                                                                                                                                                     
           thread_start
 

Thanks,
Chengdong