lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1400491331.8467.8.camel@weser.hi.pengutronix.de>
Date:	Mon, 19 May 2014 11:22:11 +0200
From:	Lucas Stach <l.stach@...gutronix.de>
To:	Thierry Reding <thierry.reding@...il.com>
Cc:	Alexandre Courbot <acourbot@...dia.com>, gnurou@...il.com,
	nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
	dri-devel@...ts.freedesktop.org, Ben Skeggs <bskeggs@...hat.com>,
	linux-tegra@...r.kernel.org
Subject: Re: [PATCH 4/4] drm/nouveau: introduce CPU cache flushing macro

Am Montag, den 19.05.2014, 11:02 +0200 schrieb Thierry Reding:
> On Mon, May 19, 2014 at 04:10:58PM +0900, Alexandre Courbot wrote:
> > Some architectures (e.g. ARM) need the CPU buffers to be explicitely
> > flushed for a memory write to take effect. Not doing so results in
> > synchronization issues, especially after writing to BOs.
> 
> It seems to me that the above is generally true for all architectures,
> not just ARM.
> 
No, on PCI coherent arches, like x86 and some PowerPCs, the GPU will
snoop the CPU caches and therefore an explicit cache flush is not
required.

> Also: s/explicitely/explicitly/
> 
> > This patch introduces a macro that flushes the caches on ARM and
> > translates to a no-op on other architectures, and uses it when
> > writing to in-memory BOs. It will also be useful for implementations of
> > instmem that access shared memory directly instead of going through
> > PRAMIN.
> 
> Presumably instmem can access shared memory on all architectures, so
> this doesn't seem like a property of the architecture but rather of the
> memory pool backing the instmem.
> 
> In that case I wonder if this shouldn't be moved into an operation that
> is implemented by the backing memory pool and be a noop where the cache
> doesn't need explicit flushing.
> 
> > diff --git a/drivers/gpu/drm/nouveau/core/os.h b/drivers/gpu/drm/nouveau/core/os.h
> > index d0ced94ca54c..274b4460bb03 100644
> > --- a/drivers/gpu/drm/nouveau/core/os.h
> > +++ b/drivers/gpu/drm/nouveau/core/os.h
> > @@ -38,4 +38,21 @@
> >  #endif /* def __BIG_ENDIAN else */
> >  #endif /* !ioread32_native */
> >  
> > +#if defined(__arm__)
> > +
> > +#define nv_cpu_cache_flush_area(va, size)	\
> > +do {						\
> > +	phys_addr_t pa = virt_to_phys(va);	\
> > +	__cpuc_flush_dcache_area(va, size);	\
> > +	outer_flush_range(pa, pa + size);	\
> > +} while (0)
> 
> Couldn't this be a static inline function?
> 
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> [...]
> > index 0886f47e5244..b9c9729c5733 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -437,8 +437,10 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
> >  	mem = &mem[index];
> >  	if (is_iomem)
> >  		iowrite16_native(val, (void __force __iomem *)mem);
> > -	else
> > +	else {
> >  		*mem = val;
> > +		nv_cpu_cache_flush_area(mem, 2);
> > +	}
> >  }
> >  
> >  u32
> > @@ -461,8 +463,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
> >  	mem = &mem[index];
> >  	if (is_iomem)
> >  		iowrite32_native(val, (void __force __iomem *)mem);
> > -	else
> > +	else {
> >  		*mem = val;
> > +		nv_cpu_cache_flush_area(mem, 4);
> > +	}
> 
> This looks rather like a sledgehammer to me. Effectively this turns nvbo
> into an uncached buffer. With additional overhead of constantly flushing
> caches. Wouldn't it make more sense to locate the places where these are
> called and flush the cache after all the writes have completed?
> 
I don't think the explicit flushing for those things makes sense. I
think it is a lot more effective to just map the BOs write-combined on
PCI non-coherent arches. This way any writes will be buffered. Reads
will be slow, but I don't think nouveau is reading back a lot from those
buffers.
Using the write-combining buffer doesn't need any additional
synchronization as it will get flushed on pushbuf kickoff anyways.

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ