lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 11 Nov 2009 21:49:15 +0000
From:	Vincent Sanders <vince@...tec.co.uk>
To:	Krzysztof Helt <krzysztof.h1@...pl>
Cc:	akpm@...ux-foundation.org, lethal@...ux-sh.org,
	dilinger@...ian.org, linux-kernel@...r.kernel.org,
	Simtec Linux Team <linux@...tec.co.uk>
Subject: Re: SM501: Implement acceleration features

On Wed, Nov 11, 2009 at 12:58:25AM +0100, Krzysztof Helt wrote:
> On Tue, 10 Nov 2009 17:18:10 +0000
> Ben Dooks <ben@...tec.co.uk> wrote:
> 
> > From: Vincent Sanders <vince@...tec.co.uk>
> > 
> > This patch provides the acceleration entry points for the SM501
> > framebuffer driver.
> > 
> > This patch provides the sync, copyarea and fillrect entry points,
> > using the SM501's 2D acceleration engine to perform the operations
> > in-chip rather than across the bus.
> > 
> > Signed-off-by: Vincent Sanders <vince@...tec.co.uk>
> > Signed-off-by: Simtec Linux Team <linux@...tec.co.uk>
> > Signed-off-by: Ben Dooks <ben@...tec.co.uk>
> > 
> > ---
> >  drivers/video/sm501fb.c    |  238 ++++++++++++++++++++++++++++++++++++++++++---
> >  include/linux/sm501-regs.h |    2 
> >  2 files changed, 226 insertions(+), 14 deletions(-)
> > 
> > Index: b/drivers/video/sm501fb.c
> > ===================================================================
> > --- a/drivers/video/sm501fb.c	2009-11-03 11:19:44.000000000 +0000
> > +++ b/drivers/video/sm501fb.c	2009-11-03 11:19:46.000000000 +0000
> 

I have snipped all but small amount for context

> > +	/* wait for the 2d engine to be ready */
> > +	while ((count > 0) &&
> > +	       (readl(fbi->regs + SM501_SYSTEM_CONTROL) &
> > +		SM501_SYSCTRL_2D_ENGINE_STATUS) != 0)
> > +		count--;
> > +
> 
> You may add cpu_relax() inside this loop.
> 

ok, would need to test this thoroughly as the SM501 has some...odd
behaviours (see later on).

> > +
> > +	if (sm501fb_sync(info))
> > +		return;
> > +
> 
> Please check if you need to wait for the blit engine before writting
> to any register.  Usually, the values in the bit engine registers
> are shadowed after the engine is started (ie. the blitting operation
> is started) and the next set of values can be written into the regs.

indeed, if it were sane I would agree, it isnt in all circumstances, see later

> 
> > +	}
> > +
> > +	/* 2d compare mask */
> > +	writel(0xffffffff, fbi->regs2d + SM501_2D_COLOR_COMPARE_MASK);
> > +
> > +	/* 2d mask */
> > +	writel(0xffffffff, fbi->regs2d + SM501_2D_MASK);
> 
> All writes above do not use any copyarea specific value. They should
> be done only once in the sm501fb_set_par() function. It is enough to
> initialize the blitting engine when a mode is set (xres/yres and bpp
> values are known then). The only exception is when these values are
> reset by every blit operation.

not reset, but possibly corrupted depending on bus attachment.

> 
> > +
> > +	/* source and destination x y */
> > +	writel((sx << 16) | sy, fbi->regs2d + SM501_2D_SOURCE);
> > +	writel((dx << 16) | dy, fbi->regs2d + SM501_2D_DESTINATION);
> > +
> > +	/* w/h */
> > +	writel((width << 16) | height, fbi->regs2d + SM501_2D_DIMENSION);
> > +
> > +	/* do area move */
> > +	writel(0x800000cc | rtl, fbi->regs2d + SM501_2D_CONTROL);
> > +}
> > +
> 
> The same comments apply for the fillrect() method.
> 

I will explain at the end

> >  
> >  static struct fb_ops sm501fb_ops_crt = {
> >  	.owner		= THIS_MODULE,
> > @@ -1256,9 +1424,10 @@ static struct fb_ops sm501fb_ops_crt = {
> >  	.fb_setcolreg	= sm501fb_setcolreg,
> >  	.fb_pan_display	= sm501fb_pan_crt,
> >  	.fb_cursor	= sm501fb_cursor,
> > -	.fb_fillrect	= cfb_fillrect,
> > -	.fb_copyarea	= cfb_copyarea,
> > +	.fb_fillrect	= sm501fb_fillrect,
> > +	.fb_copyarea	= sm501fb_copyarea,
> >  	.fb_imageblit	= cfb_imageblit,
> > +	.fb_sync	= sm501fb_sync,
> >  };
> >  
> 
> You may want to try if setting the info->flags bit FBINFO_READS_FAST
> gives you a faster scrolling on the fb device. This flags says that
> you prefer to use the copyarea function over the imageblit function
> for scrolling. The latter is not accelerated in this driver.

thanks! will try that.

To clarify the idiocy and nasty issues with the SM501/2 range (this
isnt for the weak stomached) :-(

The Silicon Motion SM501 has three revisions (A, B and C. Revision C
is confusingly renamed as SM502 but keeps the 501 identifiers) it
also has the unusual ability to be connected to the host both by PCI
and local 32bit parallel bus in several flavours, Also in both bus mastering
and non bus mastering modes.

Revisions prior to C had a number of subtle bugs related to their
external memory interfaces and the interaction with interrupts which
means posted writes and reads of the irq status register may not
actually match up with reality. There are additional constraints on
clock domains and internal bus configurations which we already address
elsewhere in the driver.

The *only* reliable way to be sure if any of the asynchronous engines
are ready to use is to poll the 2D engine status bit in the system
controller register (not the one in the 2d engine itself!) and if its
in local bus mode be sure your bus timings are conservative enough not
to lock the device up solid with it driving i/o lines against the CPU
(not that the magic smoke escapes or anything :-(. Hopefully this
explains why we wait explicitly for the device as we do (Oh and
revision C fixes those problems but introduces others because of its
weaker bus drive)

Next I discovered that although the 2d engine nominally has a shadowed
register set, what actually happens if you attempt an operation while
the engine is busy is: play a guessing game as to whether the device
will lock up solid, corrupt both requests or work fine.

Finally the revision C introduces an amusing issue where certain
operations followed by certain other operations cause register
corruption. To avoid this issue I specifically program all registers
to do with an operation each time. There is a compromise between the
overhead of a couple of register writes against *much* less complexity
than either the conditional execution depending of current device
configuration or working out which operations may follow which and
fixing up as appropriate (I have coded both these approaches and it was
ugly in comparison).

Thankyou for your review, I hope I explained why I have dones some of
these seemingly odd things. If you think there is another approach I
should take based on this explanation please let me know.

-- 
Vincent Sanders
Simtec Electronics

Download attachment "signature.asc" of type "application/pgp-signature" (190 bytes)

Powered by blists - more mailing lists