lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r0taa8l3.fsf@jogness.linutronix.de>
Date:   Mon, 27 Mar 2023 10:48:48 +0206
From:   John Ogness <john.ogness@...utronix.de>
To:     Tanmay Bhushan <007047221b@...il.com>,
        Ben Skeggs <bskeggs@...hat.com>,
        Karol Herbst <kherbst@...hat.com>,
        Lyude Paul <lyude@...hat.com>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
        linux-kernel@...r.kernel.org,
        Christian König <christian.koenig@....com>
Cc:     Tanmay Bhushan <007047221b@...il.com>
Subject: Re: [PATCH] drm/nouveau: Fix bug in buffer relocs for Nouveau

On 2023-01-19, Tanmay Bhushan <007047221b@...il.com> wrote:
> dma_resv_wait_timeout returns greater than zero on success
> as opposed to ttm_bo_wait_ctx. As a result of that relocs
> will fail and give failure even when it was a success.

Today I switched my workstation from 6.2 to 6.3-rc3 and started seeing
lots of new kernel messages:

[  642.138313][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
[  642.138389][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500
[  646.123490][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc wait_idle failed: 1500
[  646.123573][ T1751] nouveau 0000:f0:10.0: X[1751]: reloc apply: 1500

The graphics seemed to go slower or hang a bit when these messages would
appear. I then found your patch! However, I have some comments about it.

First, it should include a fixes tag:

Fixes: 41d351f29528 ("drm/nouveau: stop using ttm_bo_wait")

> Signed-off-by: Tanmay Bhushan <007047221b@...il.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_gem.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index f77e44958037..0e3690459144 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -706,9 +706,8 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>  		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
>  					    DMA_RESV_USAGE_BOOKKEEP,
>  					    false, 15 * HZ);
> -		if (ret == 0)
> +		if (ret <= 0) {
>  			ret = -EBUSY;

This is incorrect for 2 reasons:

* it treats restarts as timeouts

* this function now returns >0 on success

> -		if (ret) {
>  			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
>  				  ret);
>  			break;

I rearranged things to basically correctly translate the return code of
dma_resv_wait_timeout() to match the previous ttm_bo_wait():

		ret = dma_resv_wait_timeout(nvbo->bo.base.resv,
					    DMA_RESV_USAGE_BOOKKEEP,
					    false, 15 * HZ);
		if (ret == 0)
			ret = -EBUSY;
		if (ret > 0)
			ret = 0;
		if (ret) {
			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
				  ret);
			break;
		}

So the patch just becomes:

@@ -708,6 +708,8 @@ nouveau_gem_pushbuf_reloc_apply(struct n
 					    false, 15 * HZ);
 		if (ret == 0)
 			ret = -EBUSY;
+		if (ret > 0)
+			ret = 0;
 		if (ret) {
 			NV_PRINTK(err, cli, "reloc wait_idle failed: %ld\n",
 				  ret);

With this variant, everything runs correctly on my workstation again.

It probably deserves a comment about why @ret is being translated. Or
perhaps a new variable should be introduced to separate the return value
of dma_resv_wait_timeout() from the return value of this function.

Either way, this is an important fix for 6.3-rc!

John Ogness

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ