lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f3fb57055f0bd3f19bb6ac397dc92113e1555764.camel@mengyan1223.wang>
Date:   Tue, 30 Mar 2021 21:02:08 +0800
From:   Xi Ruoyao <xry111@...gyan1223.wang>
To:     Christian König <christian.koenig@....com>,
        Christian König 
        <ckoenig.leichtzumerken@...il.com>,
        Alex Deucher <alexander.deucher@....com>
Cc:     David Airlie <airlied@...ux.ie>,
        Felix Kuehling <Felix.Kuehling@....com>,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        Dan Horák <dan@...ny.cz>,
        amd-gfx@...ts.freedesktop.org, Daniel Vetter <daniel@...ll.ch>,
        stable@...r.kernel.org
Subject: Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

On 2021-03-30 14:55 +0200, Christian König wrote:
> Am 30.03.21 um 14:04 schrieb Xi Ruoyao:
> > On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:
> > > On 2021-03-29 21:36 +0200, Christian König wrote:
> > > > Am 29.03.21 um 21:27 schrieb Xi Ruoyao:
> > > > > Hi Christian,
> > > > > 
> > > > > I don't think there is any constraint implemented to ensure `num_entries
> > > > > %
> > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in
> > > > > `amdgpu_vm_bo_map()`:
> > > > > 
> > > > >           /* validate the parameters */
> > > > >           if (saddr & AMDGPU_GPU_PAGE_MASK || offset &
> > > > > AMDGPU_GPU_PAGE_MASK
> > > > >               size == 0 || size & AMDGPU_GPU_PAGE_MASK)
> > > > >                   return -EINVAL;
> > > > > 
> > > > > /* snip */
> > > > > 
> > > > >           saddr /= AMDGPU_GPU_PAGE_SIZE;
> > > > >           eaddr /= AMDGPU_GPU_PAGE_SIZE;
> > > > > 
> > > > > /* snip */
> > > > > 
> > > > >           mapping->start = saddr;
> > > > >           mapping->last = eaddr;
> > > > > 
> > > > > 
> > > > > If we really want to ensure (mapping->last - mapping->start + 1) %
> > > > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace
> > > > > "AMDGPU_GPU_PAGE_MASK"
> > > > > in "validate the parameters" with "PAGE_MASK".
> > It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of
> > "AMDGPU_GPU_PAGE_MASK" :(.
> > 
> > > > Yeah, good point.
> > > > 
> > > > > I tried it and it broke userspace: Xorg startup fails with EINVAL with
> > > > > this
> > > > > change.
> > > > Well in theory it is possible that we always fill the GPUVM on a 4k
> > > > basis while the native page size of the CPU is larger. Let me double
> > > > check the code.
> > On my platform, there are two issues both causing the VM corruption.  One is
> > fixed by agd5f/linux@...01e7.
> 
> What is fe001e7? A commit id? If yes then that is to short and I can't 
> find it.
> 
> >    Another is in Mesa from userspace:  it uses
> > `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver
> > expects it to use `dev_info->virtual_address_alignment`.
> 
> Mhm, looking at the kernel code I would rather say Mesa is correct and 
> the kernel driver is broken.
> 
> The gart_page_size is limited by the CPU page size, but the 
> virtual_address_alignment isn't.
> 
> > If we can make the change to fill the GPUVM on a 4k basis, we can fix this
> > issue
> > and make virtual_address_alignment = 4K.  Otherwise, we should fortify the
> > parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK".  Then
> > the
> > userspace will just get an EINVAL, instead of a slient VM corruption.  And
> > someone should tell Mesa developers to fix the code in this case.
> 
> I rather see this as a kernel bug. Can you test if this code fragment 
> fixes your issue:
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 64beb3399604..e1260b517e1b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
> *data, struct drm_file *filp)
>                  }
>                  dev_info->virtual_address_alignment = 
> max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
>                  dev_info->pte_fragment_size = (1 << 
> adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
> -               dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE;
> +               dev_info->gart_page_size = 
> dev_info->virtual_address_alignment;
>                  dev_info->cu_active_number = adev->gfx.cu_info.number;
>                  dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
>                  dev_info->ce_ram_size = adev->gfx.ce_ram_size;

It works.  I've seen it at
https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae191
before (with a common sub-expression, though :).

-- 
Xi Ruoyao <xry111@...gyan1223.wang>
School of Aerospace Science and Technology, Xidian University

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ