[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71e3905a5b72c5b97df837041b19175540ebb023.camel@mengyan1223.wang>
Date: Tue, 30 Mar 2021 20:04:32 +0800
From: Xi Ruoyao <xry111@...gyan1223.wang>
To: Christian König
<ckoenig.leichtzumerken@...il.com>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>
Cc: David Airlie <airlied@...ux.ie>,
Felix Kuehling <Felix.Kuehling@....com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
Dan Horák <dan@...ny.cz>,
amd-gfx@...ts.freedesktop.org, Daniel Vetter <daniel@...ll.ch>,
stable@...r.kernel.org
Subject: Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems
On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:
> On 2021-03-29 21:36 +0200, Christian König wrote:
> > Am 29.03.21 um 21:27 schrieb Xi Ruoyao:
> > > Hi Christian,
> > >
> > > I don't think there is any constraint implemented to ensure `num_entries %
> > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`:
> > >
> > > /* validate the parameters */
> > > if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK
> > > > >
> > > size == 0 || size & AMDGPU_GPU_PAGE_MASK)
> > > return -EINVAL;
> > >
> > > /* snip */
> > >
> > > saddr /= AMDGPU_GPU_PAGE_SIZE;
> > > eaddr /= AMDGPU_GPU_PAGE_SIZE;
> > >
> > > /* snip */
> > >
> > > mapping->start = saddr;
> > > mapping->last = eaddr;
> > >
> > >
> > > If we really want to ensure (mapping->last - mapping->start + 1) %
> > > AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace
> > > "AMDGPU_GPU_PAGE_MASK"
> > > in "validate the parameters" with "PAGE_MASK".
It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of
"AMDGPU_GPU_PAGE_MASK" :(.
> > Yeah, good point.
> >
> > > I tried it and it broke userspace: Xorg startup fails with EINVAL with
> > > this
> > > change.
> >
> > Well in theory it is possible that we always fill the GPUVM on a 4k
> > basis while the native page size of the CPU is larger. Let me double
> > check the code.
On my platform, there are two issues both causing the VM corruption. One is
fixed by agd5f/linux@...01e7. Another is in Mesa from userspace: it uses
`dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver
expects it to use `dev_info->virtual_address_alignment`.
If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue
and make virtual_address_alignment = 4K. Otherwise, we should fortify the
parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the
userspace will just get an EINVAL, instead of a slient VM corruption. And
someone should tell Mesa developers to fix the code in this case.
--
Xi Ruoyao <xry111@...gyan1223.wang>
School of Aerospace Science and Technology, Xidian University
Powered by blists - more mailing lists