Message-ID: <c4f9dbe8-d224-478f-a91f-03a420333fde@amd.com>
Date: Tue, 22 Jul 2025 09:54:41 -0400
From: Leo Li <sunpeng.li@....com>
To: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>, Alex Deucher
<alexdeucher@...il.com>
CC: Brian Geffon <bgeffon@...gle.com>, "Wentland, Harry"
<Harry.Wentland@....com>, Alex Deucher <alexander.deucher@....com>,
<christian.koenig@....com>, David Airlie <airlied@...il.com>, Simona Vetter
<simona@...ll.ch>, Tvrtko Ursulin <tvrtko.ursulin@...lia.com>, Yunxiang Li
<Yunxiang.Li@....com>, Lijo Lazar <lijo.lazar@....com>, Prike Liang
<Prike.Liang@....com>, Pratap Nirujogi <pratap.nirujogi@....com>, "Luben
Tuikov" <luben.tuikov@....com>, <amd-gfx@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, <linux-kernel@...r.kernel.org>, "Garrick
Evans" <garrick@...gle.com>, <stable@...r.kernel.org>
Subject: Re: [PATCH] drm/amdgpu: Raven: don't allow mixing GTT and VRAM
On 2025-07-22 07:21, Thadeu Lima de Souza Cascardo wrote:
> On Fri, Jul 18, 2025 at 07:00:39PM -0400, Alex Deucher wrote:
>> On Fri, Jul 18, 2025 at 6:01 PM Leo Li <sunpeng.li@....com> wrote:
>>>
>>>
>>>
>>> On 2025-07-18 17:33, Alex Deucher wrote:
>>>> On Fri, Jul 18, 2025 at 5:02 PM Leo Li <sunpeng.li@....com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2025-07-18 16:07, Alex Deucher wrote:
>>>>>> On Fri, Jul 18, 2025 at 1:57 PM Brian Geffon <bgeffon@...gle.com> wrote:
>>>>>>>
>>>>>>> On Thu, Jul 17, 2025 at 10:59 AM Alex Deucher <alexdeucher@...il.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Jul 16, 2025 at 8:13 PM Brian Geffon <bgeffon@...gle.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jul 16, 2025 at 5:03 PM Alex Deucher <alexdeucher@...il.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 16, 2025 at 12:40 PM Brian Geffon <bgeffon@...gle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 16, 2025 at 12:33 PM Alex Deucher <alexdeucher@...il.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 16, 2025 at 12:18 PM Brian Geffon <bgeffon@...gle.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Commit 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
>>>>>>>>>>>>> allowed newer ASICs to mix GTT and VRAM; it also noted that some
>>>>>>>>>>>>> older boards, such as Stoney and Carrizo, do not support this.
>>>>>>>>>>>>> It appears that at least one additional ASIC, Raven, does not
>>>>>>>>>>>>> support it either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We observed this issue when migrating a device from a 5.4 to 6.6 kernel
>>>>>>>>>>>>> and have confirmed that Raven also needs to be excluded from mixing GTT
>>>>>>>>>>>>> and VRAM.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you elaborate a bit on what the problem is? For carrizo and
>>>>>>>>>>>> stoney this is a hardware limitation (all display buffers need to be
>>>>>>>>>>>> in GTT or VRAM, but not both). Raven and newer don't have this
>>>>>>>>>>>> limitation, and we tested raven pretty extensively at the time.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for taking the time to look. We have automated testing, and a
>>>>>>>>>>> few IGT GPU Tools tests failed; after debugging, we found that
>>>>>>>>>>> commit 81d0bcf99009 is what introduced the failures on this hardware
>>>>>>>>>>> on 6.1+ kernels. The specific tests that fail are kms_async_flips and
>>>>>>>>>>> kms_plane_alpha_blend; excluding Raven from this sharing of GTT and
>>>>>>>>>>> VRAM buffers resolves the issue.
>>>>>>>>>>
>>>>>>>>>> + Harry and Leo
>>>>>>>>>>
>>>>>>>>>> This sounds like the memory placement issue we discussed last week.
>>>>>>>>>> There, the issue is related to where the buffer ends up when we
>>>>>>>>>> try to do an async flip: we can't do an async flip
>>>>>>>>>> without a full modeset if the buffer locations are different from the
>>>>>>>>>> last modeset, because we need to update more than just the buffer base
>>>>>>>>>> addresses. This change works around that limitation by always forcing
>>>>>>>>>> display buffers into VRAM or GTT. Adding raven to this case may fix
>>>>>>>>>> those tests, but it will make the overall experience worse because we'll
>>>>>>>>>> effectively be unable to fully utilize both GTT and VRAM for display,
>>>>>>>>>> which would reintroduce all of the problems fixed by
>>>>>>>>>> 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)").
>>>>>>>>>
>>>>>>>>> Thanks Alex, the thing is, we only observe this on Raven boards; why
>>>>>>>>> would only Raven be impacted by this? It would seem that all devices
>>>>>>>>> would have this issue, no? Also, I'm not familiar with how
>>>>>>>>
>>>>>>>> It depends on memory pressure and available memory in each pool.
>>>>>>>> E.g., initially the display buffer is in VRAM when the initial mode
>>>>>>>> set happens. The watermarks, etc. are set for that scenario. One of
>>>>>>>> the next frames ends up in a pool different than the original. Now
>>>>>>>> the buffer is in GTT. The async flip interface does a fast validation
>>>>>>>> to try and flip as soon as possible, but that validation fails because
>>>>>>>> the watermarks need to be updated which requires a full modeset.
>>>>>
>>>>> Huh, I'm not sure if this actually is an issue for APUs. The fix that introduced
>>>>> a check for same memory placement on async flips was on a system with a DGPU,
>>>>> for which VRAM placement does matter:
>>>>> https://github.com/torvalds/linux/commit/a7c0cad0dc060bb77e9c9d235d68441b0fc69507
>>>>>
>>>>> Looking around in DM/DML, for APUs, I don't see any logic that changes DCN
>>>>> bandwidth validation depending on memory placement. There's a gpuvm_enable flag
>>>>> for SG, but it's statically set to 1 on APU DCN versions. It sounds like for
>>>>> APUs specifically, we *should* be able to ignore the mem placement check. I can
>>>>> spin up a patch to test this out.
>>>>
>>>> Is the gpu_vm_support flag ever set for dGPUs? The allowed domains
>>>> for display buffers are determined by
>>>> amdgpu_display_supported_domains() and we only allow GTT as a domain
>>>> if gpu_vm_support is set, which I think is just for APUs. In that
>>>> case, we probably only need the checks specifically for
>>>> CHIP_CARRIZO and CHIP_STONEY since IIRC, they don't support mixed VRAM
>>>> and GTT (only one or the other?). dGPUs and really old APUs will
>>>> always get VRAM, and newer APUs will get VRAM | GTT.
>>>
>>> It doesn't look like gpu_vm_support is set for dGPUs:
>>> https://elixir.bootlin.com/linux/v6.15.6/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L1866
>>>
>>> Though interestingly, further up at #L1858, Raven has gpu_vm_support = 0. Maybe it had stability issues?
>>> https://github.com/torvalds/linux/commit/098c13079c6fdd44f10586b69132c392ebf87450
>>
>> We need to be a little careful here: asic_type == CHIP_RAVEN covers
>> several variants:
>> apu_flags & AMD_APU_IS_RAVEN - raven1 (gpu_vm_support = false)
>> apu_flags & AMD_APU_IS_RAVEN2 - raven2 (gpu_vm_support = true)
>> apu_flags & AMD_APU_IS_PICASSO - picasso (gpu_vm_support = true)
>>
>> amdgpu_display_supported_domains() only sets AMDGPU_GEM_DOMAIN_GTT if
>> gpu_vm_support is true, so we'd never get into the check in
>> amdgpu_bo_get_preferred_domain() for raven1.
>>
>> Anyway, back to your suggestion, I think we can probably drop the
>> checks as you should always get a compatible memory buffer due to
>> amdgpu_bo_get_preferred_domain(). Pinning should fail if we can't pin
>> in the required domain. amdgpu_display_supported_domains() will
>> ensure you always get VRAM or GTT or VRAM | GTT depending on what the
>> chip supports. Then amdgpu_bo_get_preferred_domain() will either
>> leave that as is, or force VRAM or GTT for the STONEY/CARRIZO case.
>> On the off chance we do get incompatible memory, something like the
>> attached patch should do the trick.
Thanks for the patch, this makes sense to me.

Somewhat unrelated: I wonder if setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS is necessary before
bo_pin(). FWIU from chatting with our DCN experts, DCN doesn't really care if the fb is
contiguous or not.

Which raises the question -- what exactly does AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS mean? From git
history, it seems setting this flag doesn't necessarily move the bo to be contiguous, but
rather:

    When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
    - This means contiguous is not mandatory.
    - we will try to allocate the contiguous buffer. Say if the
      allocation fails, we fallback to allocate the individual pages.

https://github.com/torvalds/linux/commit/e362b7c8f8c7af00d06f0ab609629101aebae993

Does that mean -- if the buffer is already in the required domain -- that bo_pin() will also
attempt to make it contiguous? Or will it just pin it, preventing it from being moved, and
leave it at that?

I guess in any case, it sounds like VRAM_CONTIGUOUS is not necessary for DCN scanout.
I can give dropping it a spin and see if IGT complains.
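For my own understanding, here's a rough stand-alone model of the single-pool fallback being
proposed for amdgpu_bo_get_preferred_domain(). The enum, constants, and function below are
illustrative stand-ins for discussion, not the actual driver code:

```c
/* Hypothetical model of the placement fallback with Raven added.
 * Constants mirror AMDGPU_GEM_DOMAIN_GTT/VRAM and AMDGPU_SG_THRESHOLD
 * but are stand-ins, not taken from the driver headers. */
#include <assert.h>
#include <stdint.h>

#define DOMAIN_GTT   0x2u
#define DOMAIN_VRAM  0x4u
#define SG_THRESHOLD (256ULL * 1024 * 1024) /* stand-in for AMDGPU_SG_THRESHOLD */

enum chip { CHIP_CARRIZO, CHIP_STONEY, CHIP_RAVEN, CHIP_OTHER };

/* Chips that cannot scan out from mixed pools get forced to a single
 * domain: VRAM if there is enough of it, otherwise GTT. */
uint32_t preferred_domain(enum chip asic, uint32_t domain,
                          uint64_t real_vram_size)
{
    if (domain == (DOMAIN_VRAM | DOMAIN_GTT) &&
        (asic == CHIP_CARRIZO || asic == CHIP_STONEY ||
         asic == CHIP_RAVEN)) {
        domain = DOMAIN_VRAM;
        if (real_vram_size <= SG_THRESHOLD)
            domain = DOMAIN_GTT;
    }
    return domain;
}
```

If that matches the intended behavior, it also makes clear why adding Raven here would mean
losing the ability to spread display buffers across both pools on that chip.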
Thanks,
Leo
>>
>> Alex
>>
>
> Thanks for the patch, Alex.
>
> I have tested it, and though kms_async_flips and kms_plane_alpha_blend
> pass, kms_plane_cursor still fails.
>
> I am going to investigate a little more today and send more details on my
> findings.
>
> Thanks.
> Cascardo.
>
>>
>>>
>>> - Leo
>>>
>>>>
>>>> Alex
>>>>
>>>>>
>>>>> Thanks,
>>>>> Leo
>>>>>
>>>>>>>>
>>>>>>>> It's tricky to fix because you don't want to use the worst case
>>>>>>>> watermarks all the time because that will limit the number of available
>>>>>>>> display options and you don't want to force everything to a particular
>>>>>>>> memory pool because that will limit the amount of memory that can be
>>>>>>>> used for display (which is what the patch in question fixed). Ideally
>>>>>>>> the caller would do a test commit before the page flip to determine
>>>>>>>> whether or not it would succeed before issuing it and then we'd have
>>>>>>>> some feedback mechanism to tell the caller that the commit would fail
>>>>>>>> due to buffer placement so it would do a full modeset instead. We
>>>>>>>> discussed this feedback mechanism last week at the display hackfest.
>>>>>>>>
>>>>>>>>
>>>>>>>>> kms_plane_alpha_blend works, but could this also be the cause of that
>>>>>>>>> test failing?
>>>>>>>>
>>>>>>>> That may be related. I'm not too familiar with that test either, but
>>>>>>>> Leo or Harry can provide some guidance.
>>>>>>>>
>>>>>>>> Alex
>>>>>>>
>>>>>>> Thanks everyone for the input so far. I have a question for the
>>>>>>> maintainers: given that this seems functionally broken for
>>>>>>> iGPU ASICs, and there does not seem to be an easy fix, does
>>>>>>> it make sense to extend this proposed patch to all iGPUs until a more
>>>>>>> permanent fix can be identified? At the end of the day I'll take
>>>>>>> functional correctness over performance.
>>>>>>
>>>>>> It's not functional correctness, it's usability. All that is
>>>>>> potentially broken is async flips (which depend on memory pressure and
>>>>>> buffer placement), while if you effectively revert the patch, you end
>>>>>> up limiting all display buffers to either VRAM or GTT, which may make
>>>>>> it impossible to display anything because there is not
>>>>>> enough memory in that pool for the next modeset. We'll start getting
>>>>>> bug reports about blank screens and failure to set modes because of
>>>>>> memory pressure. I think if we want a short-term fix, it would be to
>>>>>> always set the worst case watermarks. The downside to that is that it
>>>>>> would possibly cause some working display setups to stop working if
>>>>>> they were on the margins to begin with.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks again,
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
>>>>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov@....com>
>>>>>>>>>>>>> Cc: Christian König <christian.koenig@....com>
>>>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@....com>
>>>>>>>>>>>>> Cc: stable@...r.kernel.org # 6.1+
>>>>>>>>>>>>> Tested-by: Thadeu Lima de Souza Cascardo <cascardo@...lia.com>
>>>>>>>>>>>>> Signed-off-by: Brian Geffon <bgeffon@...gle.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>>>>>>>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>>>>>>> index 73403744331a..5d7f13e25b7c 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>>>>>>> @@ -1545,7 +1545,8 @@ uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
>>>>>>>>>>>>> uint32_t domain)
>>>>>>>>>>>>> {
>>>>>>>>>>>>> if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
>>>>>>>>>>>>> - ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
>>>>>>>>>>>>> + ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY) ||
>>>>>>>>>>>>> + (adev->asic_type == CHIP_RAVEN))) {
>>>>>>>>>>>>> domain = AMDGPU_GEM_DOMAIN_VRAM;
>>>>>>>>>>>>> if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
>>>>>>>>>>>>> domain = AMDGPU_GEM_DOMAIN_GTT;
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.50.0.727.gbf7dc18ff4-goog
>>>>>>>>>>>>>
>>>>>
>>>
>
>> From cce1652c62c42c858de64c306ea0ddc7af3bd0b1 Mon Sep 17 00:00:00 2001
>> From: Alex Deucher <alexander.deucher@....com>
>> Date: Fri, 18 Jul 2025 18:40:26 -0400
>> Subject: [PATCH] drm/amd/display: refine framebuffer placement checks
>>
>> When we commit planes, we need to make sure the
>> framebuffer memory locations are compatible. Various
>> hardware has the following requirements for display buffers:
>> dGPUs, old APUs, raven1 - must be in VRAM
>> carrizo/stoney - must be in VRAM or GTT, but not both
>> newer APUs (raven2/picasso and newer) - can be in VRAM or GTT
>>
>> You should always get a compatible memory buffer due to
>> amdgpu_bo_get_preferred_domain(). amdgpu_display_supported_domains()
>> will ensure you always get VRAM or GTT or VRAM | GTT depending on
>> what the chip supports. Then amdgpu_bo_get_preferred_domain()
>> will either leave that as is when pinning, or force VRAM or GTT
>> for the STONEY/CARRIZO case.
>>
>> As such the checks could probably be removed, but on the off chance
>> we do end up getting a different memory pool for the old
>> and new framebuffers, refine the check to take into account the
>> hardware capabilities.
>>
>> Fixes: a7c0cad0dc06 ("drm/amd/display: ensure async flips are only accepted for fast updates")
>> Reported-by: Brian Geffon <bgeffon@...gle.com>
>> Cc: Leo Li <sunpeng.li@....com>
>> Signed-off-by: Alex Deucher <alexander.deucher@....com>
>> ---
>> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 20 ++++++++++++++++---
>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index 129476b6d5fa9..de2bd789ec15b 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -9288,6 +9288,18 @@ static void amdgpu_dm_enable_self_refresh(struct amdgpu_crtc *acrtc_attach,
>> }
>> }
>>
>> +static bool amdgpu_dm_mem_type_compatible(struct amdgpu_device *adev,
>> + struct drm_framebuffer *old_fb,
>> + struct drm_framebuffer *new_fb)
>> +{
>> + if (!adev->mode_info.gpu_vm_support ||
>> + (adev->asic_type == CHIP_CARRIZO) ||
>> + (adev->asic_type == CHIP_STONEY))
>> + return get_mem_type(old_fb) == get_mem_type(new_fb);
>> +
>> + return true;
>> +}
>> +
>> static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>> struct drm_device *dev,
>> struct amdgpu_display_manager *dm,
>> @@ -9465,7 +9477,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>> */
>> if (crtc->state->async_flip &&
>> (acrtc_state->update_type != UPDATE_TYPE_FAST ||
>> - get_mem_type(old_plane_state->fb) != get_mem_type(fb)))
>> + !amdgpu_dm_mem_type_compatible(dm->adev, old_plane_state->fb, fb)))
>> drm_warn_once(state->dev,
>> "[PLANE:%d:%s] async flip with non-fast update\n",
>> plane->base.id, plane->name);
>> @@ -9473,7 +9485,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>> bundle->flip_addrs[planes_count].flip_immediate =
>> crtc->state->async_flip &&
>> acrtc_state->update_type == UPDATE_TYPE_FAST &&
>> - get_mem_type(old_plane_state->fb) == get_mem_type(fb);
>> + amdgpu_dm_mem_type_compatible(dm->adev, old_plane_state->fb, fb);
>>
>> timestamp_ns = ktime_get_ns();
>> bundle->flip_addrs[planes_count].flip_timestamp_in_us = div_u64(timestamp_ns, 1000);
>> @@ -11760,6 +11772,7 @@ static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
>> struct drm_atomic_state *state,
>> struct drm_crtc_state *crtc_state)
>> {
>> + struct amdgpu_device *adev = drm_to_adev(dev);
>> struct drm_plane *plane;
>> struct drm_plane_state *new_plane_state, *old_plane_state;
>>
>> @@ -11773,7 +11786,8 @@ static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
>> }
>>
>> if (old_plane_state->fb && new_plane_state->fb &&
>> - get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
>> + !amdgpu_dm_mem_type_compatible(adev, old_plane_state->fb,
>> + new_plane_state->fb))
>> return true;
>> }
>>
>> --
>> 2.50.1
>>
>