lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd21a3bf-2033-288d-d376-77d7bc054ca9@collabora.com>
Date:   Sat, 30 Sep 2023 16:34:37 +0300
From:   Dmitry Osipenko <dmitry.osipenko@...labora.com>
To:     David Stevens <stevensd@...omium.org>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Yu Zhang <yu.c.zhang@...ux.intel.com>,
        Isaku Yamahata <isaku.yamahata@...il.com>,
        Zhi Wang <zhi.wang.linux@...il.com>, kvmarm@...ts.linux.dev,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v9 6/6] KVM: x86/mmu: Handle non-refcounted pages

On 9/21/23 23:06, Dmitry Osipenko wrote:
> On 9/19/23 05:59, David Stevens wrote:
>> On Mon, Sep 18, 2023 at 8:19 PM Dmitry Osipenko
>> <dmitry.osipenko@...labora.com> wrote:
>>>
>>> On 9/18/23 12:58, Dmitry Osipenko wrote:
>>>> On 9/11/23 05:16, David Stevens wrote:
>>>>> From: David Stevens <stevensd@...omium.org>
>>>>>
>>>>> Handle non-refcounted pages in __kvm_faultin_pfn. This allows the host
>>>>> to map memory into the guest that is backed by non-refcounted struct
>>>>> pages - for example, the tail pages of higher order non-compound pages
>>>>> allocated by the amdgpu driver via ttm_pool_alloc_page.
>>>>>
>>>>> The bulk of this change is tracking the is_refcounted_page flag so that
>>>>> non-refcounted pages don't trigger page_count() == 0 warnings. This is
>>>>> done by storing the flag in an unused bit in the sptes. There are no
>>>>> bits available in PAE SPTEs, so non-refcounted pages can only be handled
>>>>> on TDP and x86-64.
>>>>>
>>>>> Signed-off-by: David Stevens <stevensd@...omium.org>
>>>>> ---
>>>>>  arch/x86/kvm/mmu/mmu.c          | 52 +++++++++++++++++++++++----------
>>>>>  arch/x86/kvm/mmu/mmu_internal.h |  1 +
>>>>>  arch/x86/kvm/mmu/paging_tmpl.h  |  8 +++--
>>>>>  arch/x86/kvm/mmu/spte.c         |  4 ++-
>>>>>  arch/x86/kvm/mmu/spte.h         | 12 +++++++-
>>>>>  arch/x86/kvm/mmu/tdp_mmu.c      | 22 ++++++++------
>>>>>  include/linux/kvm_host.h        |  3 ++
>>>>>  virt/kvm/kvm_main.c             |  6 ++--
>>>>>  8 files changed, 76 insertions(+), 32 deletions(-)
>>>>
>>>> Could you please tell which kernel tree you used for the base of this
>>>> series? This patch #6 doesn't apply cleanly to stable/mainline/next/kvm
>>>>
>>>> error: sha1 information is lacking or useless (arch/x86/kvm/mmu/mmu.c).
>>>> error: could not build fake ancestor
>>>
>>> I applied the patch manually to v6.5.2 and tested Venus using Intel TGL iGPU, the intel driver is crashing:
>>>
>>>    BUG: kernel NULL pointer dereference, address: 0000000000000058
>>>    #PF: supervisor read access in kernel mode
>>>    #PF: error_code(0x0000) - not-present page
>>>    PGD 0 P4D 0
>>>    Oops: 0000 [#1] PREEMPT SMP
>>>    CPU: 1 PID: 5926 Comm: qemu-system-x86 Not tainted 6.5.2+ #114
>>>    Hardware name: LENOVO 20VE/LNVNB161216, BIOS F8CN43WW(V2.06) 08/12/2021
>>>    RIP: 0010:gen8_ppgtt_insert+0x50b/0x8f0
>>>    Code: 00 00 f7 c2 00 00 20 00 74 15 f7 c3 ff ff 1f 00 75 0d 41 81 fc ff ff 1f 00 0f 87 0e 02 00 00 48 8b 74 24 08 44 89 c0 45 85 ed <48> 8b 4e 58 48 8b 04 c1 0f 85 0b 02 00 00 81 e2 00 00 01 00 0f 84
>>>    RSP: 0018:ffffafc085afb820 EFLAGS: 00010246
>>>    RAX: 0000000000000000 RBX: 00000000e9604000 RCX: 000000000000001b
>>>    RDX: 0000000000211000 RSI: 0000000000000000 RDI: ffff9513d44c1000
>>>    RBP: ffff951106f8dfc0 R08: 0000000000000000 R09: 0000000000000003
>>>    R10: 0000000000000fff R11: 00000000e9800000 R12: 00000000001fc000
>>>    R13: 0000000000000000 R14: 0000000000001000 R15: 0000ffff00000000
>>>    FS:  00007f2a5bcced80(0000) GS:ffff951a87a40000(0000) knlGS:0000000000000000
>>>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>    CR2: 0000000000000058 CR3: 0000000116f16006 CR4: 0000000000772ee0
>>>    PKRU: 55555554
>>>    Call Trace:
>>>     <TASK>
>>>     ? __die+0x1f/0x60
>>>     ? page_fault_oops+0x14d/0x420
>>>     ? exc_page_fault+0x3d7/0x880
>>>     ? lock_acquire+0xc9/0x290
>>>     ? asm_exc_page_fault+0x22/0x30
>>>     ? gen8_ppgtt_insert+0x50b/0x8f0
>>>     ppgtt_bind_vma+0x4f/0x60
>>>     fence_work+0x1b/0x70
>>>     fence_notify+0x8f/0x130
>>>     __i915_sw_fence_complete+0x58/0x230
>>>     i915_vma_pin_ww+0x513/0xa80
>>>     eb_validate_vmas+0x17e/0x9e0
>>>     ? eb_pin_engine+0x2bb/0x340
>>>     i915_gem_do_execbuffer+0xc85/0x2bf0
>>>     ? __lock_acquire+0x3b6/0x21c0
>>>     i915_gem_execbuffer2_ioctl+0xee/0x240
>>>     ? i915_gem_do_execbuffer+0x2bf0/0x2bf0
>>>     drm_ioctl_kernel+0x9d/0x140
>>>     drm_ioctl+0x1dd/0x410
>>>     ? i915_gem_do_execbuffer+0x2bf0/0x2bf0
>>>     ? __fget_files+0xc5/0x170
>>>     __x64_sys_ioctl+0x8c/0xc0
>>>     do_syscall_64+0x34/0x80
>>>     entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>>    RIP: 0033:0x7f2a60b0c9df
>>>
>>>
>>> $ ./scripts/faddr2line ./vmlinux gen8_ppgtt_insert+0x50b/0x8f0
>>> gen8_ppgtt_insert+0x50b/0x8f0:
>>> i915_pt_entry at drivers/gpu/drm/i915/gt/intel_gtt.h:557
>>> (inlined by) gen8_ppgtt_insert_huge at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:641
>>> (inlined by) gen8_ppgtt_insert at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:743
>>>
>>> It's likely should be the i915 driver issue that is crashes with the NULL deref, but the origin of the bug should be the kvm page fault handling.
>>>
>>> David, could you please tell what tests you've run and post a link to yours kernel tree? Maybe I made obscure mistake while applied the patch manually.
>>
>> For tests, I ran the kvm selftests and then various ChromeOS
>> virtualization tests. Two things to note about the ChromeOS
>> virtualization tests are that they use crosvm instead of qemu, and
>> they use virtio-gpu+virgl for graphics in the guest. I tested on an
>> AMD device (since the goal of this series is to fix a compatibility
>> issue with the amdgpu driver), and on a TGL device.
>>
>> I don't have an easy way to share my kernel tree, but it's based on
>> v6.5-r3. The series I sent out is rebased onto the kvm next branch,
>> but there were only minor conflicts.
> 
> It's good that you mentioned crosvm because I only tested qemu.
> Interestingly, the problem doesn't present using crosvm.
> 
> I made few more tests using these options:
> 
>   1. yours latest version of the patch based on v6.5.4
>   2. yours latest version of the patch based on the kvm tree
>   3. by forward-porting yours older version of the kvm patchset to v6.5
> that works well based on v6.4 kernel
> 
> Neither of the options work with qemu and v6.5 kernel, but crosvm works.
> 
> At first I felt confident that it must be i915 bug of the v6.5 kernel.
> But given that crosvm works, now I'm not sure.
> 
> Will try to bisect 6.5 kernel, at minimum the i915 driver changes.

The bisection said that it's not i915 bug.

Good news, the bug was fixed in a recent linux-next. Don't know what is
the exact fix, suppose it's something in mm/

-- 
Best regards,
Dmitry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ