[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6f7e5632-9080-f032-269f-06c6d96f4d88@amd.com>
Date: Mon, 27 Mar 2023 21:53:39 +0200
From: Christian König <christian.koenig@....com>
To: John Ogness <john.ogness@...utronix.de>,
Tanmay Bhushan <007047221b@...il.com>,
Ben Skeggs <bskeggs@...hat.com>,
Karol Herbst <kherbst@...hat.com>,
Lyude Paul <lyude@...hat.com>,
David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>,
dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau
Am 27.03.23 um 10:42 schrieb John Ogness:
> On 2023-01-19, Tanmay Bhushan <007047221b@...il.com> wrote:
>> dma_resv_wait_timeout returns greater than zero on success
>> as opposed to ttm_bo_wait_ctx. As a result of that relocs
>> will fail and give failure even when it was a success.
> Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing
> lots of new kernel messages:
>
> [ 642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
> [ 642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
> [ 646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
> [ 646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
>
> The graphics seemed to go slower or hang a bit when these messages would
> appear. I then found your patch! However, I have some comments about it.
>
> First, it should include a fixes tag:
>
> Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")
>
>> Signed-off-by: Tanmay Bhushan <007047221b@...il.com>
>> ---
>> drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> index f77e44958037..0e3690459144 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>> ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
>> DMA_RESV_USAGE_BOOKKEEP,
>> false, 15 * HZ);
>> - if (ret == 0)
>> + if (ret <= 0) {
>> ret = -EBUSY;
> This is incorrect for 2 reasons:
>
> * it treats restarts as timeouts
>
> * this function now returns >0 on success
>
>> - if (ret) {
>> NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
>> ret);
>> break;
> I rearranged things to basically correctly translate the return code of
> dma_resv_wait_timeout() to match the previous ttm_bo_wait():
>
> ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
> DMA_RESV_USAGE_BOOKKEEP,
> false, 15 * HZ);
> if (ret == 0)
> ret = -EBUSY;
> if (ret > 0)
> ret = 0;
> if (ret) {
> NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
> ret);
> break;
> }
>
> So the patch just becomes:
>
> @@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n
> false, 15 * HZ);
> if (ret == 0)
> ret = -EBUSY;
> + if (ret > 0)
> + ret = 0;
> if (ret) {
> NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
> ret);
>
> With this variant, everything runs correctly on my workstation again.
>
> It probably deserves a comment about why @ret is being translated. Or
> perhaps a new variable should be introduced to separate the return value
> of dma_resv_wait_timeout() from the return value of this function.
I'm going to take a look tomorrow, but your code already looks pretty
correct to me.
And sorry for the noise, missed the different in the conversion.
Thanks,
Christian.
>
> Either way, this is an important fix for 6.3-rc!
>
> John Ogness
Powered by blists - more mailing lists