lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 29 Feb 2024 14:41:37 +0200
From: Pekka Paalanen <pekka.paalanen@...labora.com>
To: Arthur Grillo <arthurgrillo@...eup.net>
Cc: Rodrigo Siqueira <rodrigosiqueiramelo@...il.com>, Melissa Wen
 <melissa.srw@...il.com>, Maíra Canal
 <mairacanal@...eup.net>, Haneen Mohammed <hamohammed.sa@...il.com>, Daniel
 Vetter <daniel@...ll.ch>, Maarten Lankhorst
 <maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
 Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
 Jonathan Corbet <corbet@....net>, dri-devel@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org, jeremie.dautheribes@...tlin.com,
 miquel.raynal@...tlin.com, thomas.petazzoni@...tlin.com,
 seanpaul@...gle.com, marcheu@...gle.com, nicolejadeyee@...gle.com
Subject: Re: [PATCH v3 3/9] drm/vkms: write/update the documentation for
 pixel conversion and pixel write functions

On Tue, 27 Feb 2024 15:47:08 -0300
Arthur Grillo <arthurgrillo@...eup.net> wrote:

> On 27/02/24 12:02, Louis Chauvet wrote:
> > Le 26/02/24 - 10:07, Arthur Grillo a écrit :  
> >>
> >>
> >> On 26/02/24 05:46, Louis Chauvet wrote:  
> >>> Add some documentation on pixel conversion functions.
> >>> Update of outdated comments for pixel_write functions.
> >>>
> >>> Signed-off-by: Louis Chauvet <louis.chauvet@...tlin.com>
> >>> ---
> >>>  drivers/gpu/drm/vkms/vkms_composer.c |  4 +++
> >>>  drivers/gpu/drm/vkms/vkms_drv.h      | 13 ++++++++
> >>>  drivers/gpu/drm/vkms/vkms_formats.c  | 58 ++++++++++++++++++++++++++++++------
> >>>  3 files changed, 66 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> index c6d9b4a65809..5b341222d239 100644
> >>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> >>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> >>> @@ -189,6 +189,10 @@ static void blend(struct vkms_writeback_job *wb,
> >>>  
> >>>  	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;
> >>>  
> >>> +	/*
> >>> +	 * The planes are composed line-by-line. It is a necessary complexity to avoid poor
> >>> +	 * blending performance.  
> >>
> >> At this moment in the series, you have not yet reintroduced the
> >> line-by-line algorithm yet. Maybe it's better to add this comment when
> >> you do.  
> > 
> > Is it better with this:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function vkms_compose_row is used to read a line, pixel-by-pixel, into the staging
> > 	 * buffer.
> > 	 */
> >    
> >> Also, I think it's good to give more context, like:
> >> "The planes are composed line-by-line, instead of pixel-by-pixel"  
> > 
> > And after PATCHv3 5/9:
> > 
> > 	/*
> > 	 * The planes are composed line-by-line to avoid heavy memory usage. It is a necessary
> > 	 * complexity to avoid poor blending performance.
> > 	 *
> > 	 * The function pixel_read_line callback is used to read a line, using an efficient 
> > 	 * algorithm for a specific format, into the staging buffer.
> > 	 */
> >   

Hi,

there are a few reasons for the line-by-line algorithm, and the
optimizations at large:

VKMS uses temporary stage and output buffers so that blending functions
can operate on just one high-precision pixel format, struct
pixel_argb_u16. We can make pixel-format-specific read and write
functions completely orthogonal from the blending operations and FB
format combinations. This avoids a combinatorial explosion of needed
functions for { input pixel formats × blending operations × output pixel
formats }.

We can use a temporary stage and output buffer whose size is one line
and not whole FB or CRTC framebuffer. This is the memory savings.

Using a temporary output buffer also avoids repeated
read-decode-blend-encode-write cycles into the final destination
buffer, as we don't need to decode/encode the pixel format.

Finally, doing elementary operations (read, blend, write) line-by-line
is much more efficient than pixel-by-pixel, because it allows making
the inner-most loop very tight. It avoids repeatedly computing a result
that does not change, like which function to call for a specific pixel
format or blending equation.


Thanks,
pq

Content of type "application/pgp-signature" skipped

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ