linux-kernel - Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

Message-ID: <20240618163823.t3hqxyqeopkilxej@hu-akhilpo-hyd.qualcomm.com>
Date: Tue, 18 Jun 2024 22:08:23 +0530
From: Akhil P Oommen <quic_akhilpo@...cinc.com>
To: Konrad Dybcio <konrad.dybcio@...aro.org>
CC: Rob Clark <robdclark@...il.com>, Sean Paul <sean@...rly.run>,
        "Abhinav Kumar" <quic_abhinavk@...cinc.com>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
        Marijn Suijten <marijn.suijten@...ainline.org>,
        David Airlie <airlied@...il.com>, "Daniel Vetter" <daniel@...ll.ch>,
        Rob Clark <robdclark@...omium.org>, <linux-arm-msm@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <freedreno@...ts.freedesktop.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

On Tue, Jun 04, 2024 at 07:35:04PM +0200, Konrad Dybcio wrote:
> 
> 
> On 5/14/24 20:38, Akhil P Oommen wrote:
> > On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
> > > Memory barriers help ensure instruction ordering, NOT time and order
> > > of actual write arrival at other observers (e.g. memory-mapped IP).
> > > On architectures employing weak memory ordering, the latter can be a
> > > giant pain point, and it has been as part of this driver.
> > > 
> > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
> > > readl/writel, which include r/w (respectively) barriers.
> > > 
> > > Replace the barriers with a readback that ensures the previous writes
> > > have exited the write buffer (as the CPU must flush the write to the
> > > register it's trying to read back) and subsequently remove the hack
> > > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
> > > status in hw_init").
> > > 
> > > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
> > > Signed-off-by: Konrad Dybcio <konrad.dybcio@...aro.org>
> > > ---
> > >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++++----------
> > >   2 files changed, 6 insertions(+), 13 deletions(-)
> > 
> > I prefer this version compared to the v2. A helper routine is
> > unnecessary here because:
> > 1. there are very few scenarios where we have to read back the same
> > register.
> > 2. we may accidentally read back a write-only register.
> 
> Which would still trigger an address dependency on the CPU, no?

Yes, but it is not a good idea to read a write-only register. We can't be
sure about its effect on the endpoint.
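
For readers following the thread, the readback idea can be sketched with a
toy userspace model. This is not driver code: `toy_dev`, `toy_write()` and
`toy_read()` are hypothetical names, and the model simply assumes that a read
from the device cannot complete until earlier posted writes to it have landed.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS  4
#define QDEPTH 8

/* Hypothetical toy device: writes are posted into a queue and only
 * reach the register file once the queue is drained. */
struct toy_dev {
	uint32_t regs[NREGS];
	struct { unsigned idx; uint32_t val; } q[QDEPTH];
	unsigned pending;
};

/* Apply all queued writes to the register file, in order. */
static void toy_drain(struct toy_dev *d)
{
	for (unsigned i = 0; i < d->pending; i++)
		d->regs[d->q[i].idx] = d->q[i].val;
	d->pending = 0;
}

/* Posted write: returns before the device has seen the value. */
static void toy_write(struct toy_dev *d, unsigned idx, uint32_t val)
{
	assert(idx < NREGS && d->pending < QDEPTH);
	d->q[d->pending].idx = idx;
	d->q[d->pending].val = val;
	d->pending++;
}

/* Assumption of this model: a read forces earlier writes to land first. */
static uint32_t toy_read(struct toy_dev *d, unsigned idx)
{
	assert(idx < NREGS);
	toy_drain(d);
	return d->regs[idx];
}
```

In this model a `toy_write()` followed by a `toy_read()` of the same register
guarantees the write has reached the device. The model says nothing about
side effects of reading a write-only register, which is the concern raised
above.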

> 
> > 
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > index 0e3dfd4c2bc8..4135a53b55a7 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
> > >   	int ret;
> > >   	u32 val;
> > > -	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
> > > -	/* Wait for the register to finish posting */
> > > -	wmb();
> > > +	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
> > > +	gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> > 
> > This is unnecessary because we are polling on a register on the same port
> > below. But I think we can replace "wmb()" above with "mb()" to avoid
> > reordering between read and write IO instructions.
> 
> Ok on the dropping readback part
> 
> + AFAIU from Will's response, we can drop the barrier as well

Let's wait a bit for Will's response on compiler reordering.

> 
> > 
> > >   	ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
> > >   		val & (1 << 1), 100, 10000);
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 973872ad0474..0acbc38b8e70 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
> > >   	}
> > >   	/* Clear GBIF halt in case GX domain was not collapsed */
> > > +	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > 
> > We need a full barrier here to avoid reordering. Also, let's add a
> > comment about why we are doing this odd-looking sequence.
> > 
> > > +	gpu_read(gpu, REG_A6XX_GBIF_HALT);
> > >   	if (adreno_is_a619_holi(adreno_gpu)) {
> > > -		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > >   		gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
> > > -		/* Let's make extra sure that the GPU can access the memory.. */
> > > -		mb();
> > 
> > We need a full barrier here.
> > 
> > > +		gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
> > >   	} else if (a6xx_has_gbif(adreno_gpu)) {
> > > -		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> > >   		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
> > > -		/* Let's make extra sure that the GPU can access the memory.. */
> > > -		mb();
> > 
> > We need a full barrier here.
> 
> Not sure we do between REG_A6XX_GBIF_HALT & REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL),
> but I suppose keeping the one after REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL) makes
> sense to avoid the possibility of configuring the GPU before it can access DRAM..

Technically, I think we don't need a barrier or the readback below,
because the above write is ordered with the write (to the CP_CNTL reg) which
finally triggers CP INIT later. The GPU won't access memory before CP INIT.
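
The ordering argument above can be sketched with a toy userspace model. It is
not driver code: `toy_gpu`, `TOY_HALT`, `TOY_START` and the helpers are
hypothetical names, and the model bakes in the assumption the argument relies
on, namely that writes to the device arrive in program order. It illustrates
the reasoning only and proves nothing about real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices; TOY_START stands in for CP_CNTL. */
enum { TOY_HALT, TOY_START, TOY_NREGS };

struct toy_gpu {
	uint32_t regs[TOY_NREGS];
	bool started;
	bool started_while_halted;
};

/* Device side: writes are applied in the order they were issued
 * (the assumption this sketch depends on). */
static void toy_gpu_write(struct toy_gpu *g, unsigned idx, uint32_t val)
{
	assert(idx < TOY_NREGS);
	g->regs[idx] = val;
	if (idx == TOY_START && val) {
		g->started = true;
		/* Record whether the halt was still set at start time. */
		g->started_while_halted = g->regs[TOY_HALT] != 0;
	}
}

static void toy_hw_init(struct toy_gpu *g)
{
	toy_gpu_write(g, TOY_HALT, 0);   /* clear halt, no readback */
	/* ... other register setup would go here ... */
	toy_gpu_write(g, TOY_START, 1);  /* the write that starts the GPU */
}
```

Because the halt-clearing write is issued before the start write and the
model keeps them in order, `started_while_halted` stays false after
`toy_hw_init()` even without a barrier or readback in between.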

> 
> > 
> > > +		gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
> > >   	}
> > > -	/* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
> > > -	if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
> > > -		spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
> > > -
> > 
> > Why is this removed?
> 
> Because it was a hack in the first place and the enforcement of GBIF
> unhalt requests coming through before proceeding further removes the
> necessity to check this (unless there's some hw-mandated delay we should
> keep in mind, but kgsl doesn't have that and there doesn't seem to be
> any from testing on 8[456]50).

Oh! I just saw the history. There is no ack for 'unhalt' in hw.
Anyway this chunk is an unrelated change. Should be a separate change,
no?

-Akhil.

> 
> Konrad