lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87cymtdc0t.fsf@draig.linaro.org>
Date: Wed, 31 Jul 2024 12:41:22 +0100
From: Alex Bennée <alex.bennee@...aro.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: David Stevens <stevensd@...omium.org>,  Paolo Bonzini
 <pbonzini@...hat.com>,  Yu Zhang <yu.c.zhang@...ux.intel.com>,  Isaku
 Yamahata <isaku.yamahata@...il.com>,  Zhi Wang <zhi.wang.linux@...il.com>,
  Maxim Levitsky <mlevitsk@...hat.com>,  kvmarm@...ts.linux.dev,
  linux-kernel@...r.kernel.org,  kvm@...r.kernel.org
Subject: Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages

Sean Christopherson <seanjc@...gle.com> writes:

> On Thu, Feb 29, 2024, David Stevens wrote:
>> From: David Stevens <stevensd@...omium.org>
>> 
>> This patch series adds support for mapping VM_IO and VM_PFNMAP memory
>> that is backed by struct pages that aren't currently being refcounted
>> (e.g. tail pages of non-compound higher order allocations) into the
>> guest.
>> 
>> Our use case is virtio-gpu blob resources [1], which directly map host
>> graphics buffers into the guest as "vram" for the virtio-gpu device.
>> This feature currently does not work on systems using the amdgpu driver,
>> as that driver allocates non-compound higher order pages via
>> ttm_pool_alloc_page().
>> 
>> First, this series replaces the gfn_to_pfn_memslot() API with a more
>> extensible kvm_follow_pfn() API. The updated API rearranges
>> gfn_to_pfn_memslot()'s args into a struct and where possible packs the
>> bool arguments into a FOLL_ flags argument. The refactoring changes do
>> not change any behavior.
>> 
>> From there, this series extends the kvm_follow_pfn() API so that
>> non-refconuted pages can be safely handled. This invloves adding an
>> input parameter to indicate whether the caller can safely use
>> non-refcounted pfns and an output parameter to tell the caller whether
>> or not the returned page is refcounted. This change includes a breaking
>> change, by disallowing non-refcounted pfn mappings by default, as such
>> mappings are unsafe. To allow such systems to continue to function, an
>> opt-in module parameter is added to allow the unsafe behavior.
>> 
>> This series only adds support for non-refcounted pages to x86. Other
>> MMUs can likely be updated without too much difficulty, but it is not
>> needed at this point. Updating other parts of KVM (e.g. pfncache) is not
>> straightforward [2].
>
> FYI, on the off chance that someone else is eyeballing this, I am working on
> revamping this series.  It's still a ways out, but I'm optimistic that we'll be
> able to address the concerns raised by Christoph and Christian, and maybe even
> get KVM out of the weeds straightaway (PPC looks thorny :-/).

I've applied this series to the latest 6.9.x while attempting to
diagnose some of the virtio-gpu problems it may or may not address.
However launching KVM guests keeps triggering a bunch of BUGs that
eventually leave a hung guest:

  12:16:54 [root@...ig:~] # dmesg -c                                                                                                                                           
  [252080.141629] RAX: ffffffffffffffda RBX: 0000560a64915500 RCX: 00007faa23e81c5b                                                                                            
  [252080.141629] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000017                                                                                            
  [252080.141630] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000                                                                                            
  [252080.141630] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000                                                                                            
  [252080.141631] R13: 0000000000000001 R14: 00000000000000b2 R15: 0000000000000002                                                                                            
  [252080.141632]  </TASK>                                                                                                                                                     
  [252080.141632] BUG: Bad page state in process CPU 0/KVM  pfn:fb1665                                                                                                         
  [252080.141633] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x7fa8117c3 pfn:0xfb1665                                                                          
  [252080.141633] flags: 0x17ffffc00a000c(referenced|uptodate|mappedtodisk|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)                                                       
  [252080.141634] page_type: 0x0()                                                                                                                                             
  [252080.141635] raw: 0017ffffc00a000c dead000000000100 dead000000000122 0000000000000000                                                                                     
  [252080.141635] raw: 00000007fa8117c3 0000000000000000 0000000000000000 0000000000000000                                                                                     
  [252080.141635] page dumped because: nonzero mapcount                                                                                                                        
  [252080.141636] Modules linked in: vhost_net vhost vhost_iotlb tap tun uas usb_storage veth cfg80211 nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter nft_ma
  sq wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel rfcomm snd_seq_dummy snd_hrtimer s
  nd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetl
  ink bridge stp llc qrtr overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc squashfs snd_hda_codec_hdmi intel_uncore_frequency snd_ctl_led intel_uncore_frequency_
  common ledtrig_audio x86_pkg_temp_thermal intel_powerclamp coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common kvm_intel soundwire_intel soundwire_generic_allocation btu
  sb snd_sof_intel_hda_mlink sd_mod soundwire_cadence btrtl snd_hda_codec_realtek kvm sg snd_sof_intel_hda btintel snd_sof_pci btbcm snd_hda_codec_generic btmtk               
  [252080.141656]  snd_sof_xtensa_dsp crc32_pclmul bluetooth snd_hda_scodec_component ghash_clmulni_intel snd_sof sha256_ssse3 sha1_ssse3 snd_sof_utils snd_soc_hdac_hda snd_hd
  a_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress soundwire_bus sha3_generic jitterentropy_rng aesni_intel snd_hda_intel snd_intel_dspcfg crypto_sim
  d sha512_ssse3 snd_intel_sdw_acpi cryptd sha512_generic uvcvideo snd_hda_codec snd_usb_audio videobuf2_vmalloc uvc ctr videobuf2_memops snd_hda_core snd_usbmidi_lib videobuf
  2_v4l2 snd_rawmidi drbg snd_hwdep dell_wmi snd_seq_device nls_ascii ahci ansi_cprng iTCO_wdt processor_thermal_device_pci videodev nls_cp437 snd_pcm intel_pmc_bxt dell_smbio
  s libahci processor_thermal_device rapl rtsx_pci_sdmmc iTCO_vendor_support ecdh_generic mmc_core mei_hdcp watchdog libata intel_rapl_msr videobuf2_common rfkill vfat process
  or_thermal_wt_hint pl2303 snd_timer dcdbas dell_wmi_ddv dell_wmi_sysman processor_thermal_rfim ucsi_acpi fat intel_cstate usbserial intel_uncore cdc_acm mc battery ecc      
  [252080.141670]  firmware_attributes_class dell_wmi_descriptor wmi_bmof dell_smm_hwmon processor_thermal_rapl pcspkr scsi_mod mei_me intel_lpss_pci snd typec_ucsi igc e1000e
   i2c_i801 rtsx_pci intel_rapl_common intel_lpss roles mei soundcore processor_thermal_wt_req i2c_smbus idma64 scsi_common processor_thermal_power_floor typec processor_therm
  al_mbox button intel_pmc_core int3403_thermal int340x_thermal_zone intel_vsec pmt_telemetry intel_hid int3400_thermal pmt_class sparse_keymap acpi_tad acpi_pad acpi_thermal_
  rel msr parport_pc ppdev lp parport fuse loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 hid_microsoft joydev ff_memless hid_generic usb
  hid hid btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq evdev dm_mod i915 i2c_algo_bit drm_buddy ttm drm_display_helper xhci_pci xhci_hcd drm_kms_helper nvme nvm
  e_core drm t10_pi usbcore video crc64_rocksoft crc64 crc_t10dif cec crct10dif_generic crct10dif_pclmul crc32c_intel rc_core usb_common crct10dif_common wmi                  
  [252080.141686]  pinctrl_alderlake                                                                                                                                           
  [252080.141686] CPU: 8 PID: 1819169 Comm: CPU 0/KVM Tainted: G    B   W          6.9.12-ajb-00008-gfcd4b7efbad0 #17                                                          
  [252080.141687] Hardware name: Dell Inc. Precision 3660/0PRR48, BIOS 2.8.1 08/14/2023                                                                                        
  [252080.141688] Call Trace:                                                                                                                                                  
  [252080.141688]  <TASK>                                                                                                                                                      
  [252080.141688]  dump_stack_lvl+0x60/0x80                                                                                                                                    
  [252080.141689]  bad_page+0x70/0x100                                                                                                                                         
  [252080.141690]  free_unref_page_prepare+0x22a/0x370                                                                                                                         
  [252080.141692]  free_unref_folios+0xe5/0x340                                                                                                                                
  [252080.141693]  ? __mem_cgroup_uncharge_folios+0x7a/0xa0                                                                                                                    
  [252080.141694]  folios_put_refs+0x147/0x1e0                                                                                                                                 
  [252080.141696]  ? __pfx_lru_add_fn+0x10/0x10                                                                                                                                
  [252080.141697]  folio_batch_move_lru+0xc8/0x140                                                                                                                             
  [252080.141699]  folio_add_lru+0x51/0xa0                                                                                                                                     
  [252080.141700]  do_wp_page+0x4dd/0xb60                                                                                                                                      
  [252080.141701]  __handle_mm_fault+0xb2a/0xe30          
  [252080.141703]  handle_mm_fault+0x18c/0x320                                                                                                                                 
  [252080.141704]  __get_user_pages+0x164/0x6f0                                                                                                                                
  [252080.141705]  get_user_pages_unlocked+0xe2/0x370                                                                                                                          
  [252080.141706]  hva_to_pfn+0xa0/0x740 [kvm]                                                                                                                                 
  [252080.141724]  kvm_faultin_pfn+0xf3/0x5f0 [kvm]                                                                                                                            
  [252080.141750]  kvm_tdp_page_fault+0x100/0x150 [kvm]                                                                                                                        
  [252080.141774]  kvm_mmu_page_fault+0x27e/0x7f0 [kvm]                                                                                                                        
  [252080.141798]  ? em_rsm+0xad/0x170 [kvm]                                                                                                                                   
  [252080.141823]  ? writeback_registers+0x44/0x80 [kvm]                                                                                                                       
  [252080.141848]  ? vmx_set_cr0+0xc7/0x1320 [kvm_intel]                                                                                                                       
  [252080.141853]  ? x86_emulate_insn+0x484/0xe60 [kvm]                                                                                                                        
  [252080.141877]  ? vmx_vmexit+0x6e/0xd0 [kvm_intel]                                                                                                                          
  [252080.141882]  ? vmx_vmexit+0x99/0xd0 [kvm_intel]                                                                                                                          
  [252080.141887]  vmx_handle_exit+0x129/0x930 [kvm_intel]                                                                                                                     
  [252080.141892]  kvm_arch_vcpu_ioctl_run+0x682/0x15b0 [kvm]                                                                                                                  
  [252080.141918]  kvm_vcpu_ioctl+0x23d/0x6f0 [kvm]                                                                                                                            
  [252080.141936]  ? __seccomp_filter+0x32f/0x500                                                                                                                              
  [252080.141937]  ? kvm_io_bus_read+0x42/0xd0 [kvm]                                                                                                                           
  [252080.141956]  __x64_sys_ioctl+0x90/0xd0                                                                                                                                   
  [252080.141957]  do_syscall_64+0x80/0x190                                                                                                                                    
  [252080.141958]  ? kvm_arch_vcpu_put+0x126/0x160 [kvm]                                                                                                                       
  [252080.141982]  ? vcpu_put+0x1e/0x50 [kvm]                                                                                                                                  
  [252080.141999]  ? kvm_arch_vcpu_ioctl_run+0x757/0x15b0 [kvm]                                                                                                                
  [252080.142023]  ? kvm_vcpu_ioctl+0x29e/0x6f0 [kvm]                                                                                                                          
  [252080.142040]  ? __seccomp_filter+0x32f/0x500                                                                                                                              
  [252080.142042]  ? kvm_on_user_return+0x60/0x90 [kvm]                                                                                                                        
  [252080.142065]  ? fire_user_return_notifiers+0x30/0x60                                                                                                                      
  [252080.142066]  ? syscall_exit_to_user_mode+0x73/0x200                                                                                                                      
  [252080.142067]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142068]  ? kvm_on_user_return+0x60/0x90 [kvm]                                                                                                                        
  [252080.142090]  ? fire_user_return_notifiers+0x30/0x60                                                                                                                      
  [252080.142091]  ? syscall_exit_to_user_mode+0x73/0x200                                                                                                                      
  [252080.142092]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142093]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142094]  ? do_syscall_64+0x8c/0x190                                                                                                                                  
  [252080.142095]  ? exc_page_fault+0x72/0x170                                                                                                                                 
  [252080.142096]  entry_SYSCALL_64_after_hwframe+0x76/0x7e                                                                                                                    

This backtrace repeats for a large chunk of pfns

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ