linux-kernel - Re: [PATCH V3 3/5] drm/vkms: Decouple crc operations from composer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3btolF-iC_bUcyzo-zi59zM7NaH48sTEQkbB-TAKZJOsm8DkUq4LhpApi6Mep0npPQ-slnIzxQfoatEXDVQi-H6Ypv-zwcctRvimXF0TZAo=@emersion.fr>
Date:   Thu, 11 Jul 2019 08:23:06 +0000
From:   Simon Ser <contact@...rsion.fr>
To:     Daniel Vetter <daniel@...ll.ch>
Cc:     Rodrigo Siqueira <rodrigosiqueiramelo@...il.com>,
        Brian Starkey <brian.starkey@....com>,
        Liviu Dudau <liviu.dudau@....com>,
        Haneen Mohammed <hamohammed.sa@...il.com>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V3 3/5] drm/vkms: Decouple crc operations from composer

On Thursday, July 11, 2019 11:19 AM, Daniel Vetter <daniel@...ll.ch> wrote:
> Aside: This all kinda doesn't go in the right direction for
> high-performance composing, so I guess I need to get started with typing
> up what that should look like.

Some related logs from IRC:

2019-07-10 19:42:49     danvet  ickle, is there an idiot guide to reasonable fast blending/composing with the cpu?
2019-07-10 19:43:13     danvet  for vkms, so maitainability is high on the wishlist, but it needs to be somewhat fast to be able to keep up
2019-07-10 19:44:02     ickle   pixman for rectangles
2019-07-10 19:45:00     ickle   if you want reasonably fast, you want simd
2019-07-10 19:45:16     ickle   so better to use some prebaked library
2019-07-10 19:45:38     ickle   but within vkms?
2019-07-10 19:45:48     ickle   or just for igt/vkms?
2019-07-10 19:46:40     ickle   within vkms for writeback blending I guess
2019-07-10 19:51:38     ickle   http://paste.debian.net/1091097/
2019-07-10 19:53:08     ickle   http://paste.debian.net/1091098/
2019-07-10 19:54:36     emersion        danvet: what are your plans for the compositor refactoring?
2019-07-10 19:55:21     emersion        are we still really using macros instead of functions
2019-07-10 19:56:22     <--     nvishwa1 (~nvishwa1@....55.54.44) has quit (Remote host closed the connection)
2019-07-10 19:56:44     emersion        is this coming from pixman, ickle?
2019-07-10 19:56:50     -->     karolherbst (~kherbst@...2:8308:b0be:6900:d9e8:6dcd:2f6a:6cb5) has joined #intel-gfx
2019-07-10 19:57:23     ickle   yes, take it as a reference on how to do abgr32 premultiplied alpha blending
2019-07-10 19:57:33     <--     amathes (ajmathes@.../intel/x-ywywuftprrqndbaj) has quit (Ping timeout: 245 seconds)
2019-07-10 19:57:48     emersion        yeah, that's a good idea
2019-07-10 19:57:49     ickle   or argb32 actually
2019-07-10 19:58:01     emersion        (asking for the source because license)
2019-07-10 19:58:04     ickle   MIT
2019-07-10 19:58:07     emersion        nice
2019-07-10 20:00:12     <--     sandeep (sandeep@.../intel/x-buwbzkgopszvwbcr) has quit (Remote host closed the connection)
2019-07-10 20:01:34     danvet  ickle, yeah vkms in the kernel
2019-07-10 20:02:41     danvet  also might need more than 8 bits ...
2019-07-10 20:02:57     danvet  and kinda hoped I could just tell gcc to simdify it for me
2019-07-10 20:03:23     danvet  emersion, compositor refactoring
2019-07-10 20:03:46     danvet  ickle, higher level I figured something like a fetch fifo in a standard format
2019-07-10 20:04:02     danvet  with some drm_format->standard format conversion tools
2019-07-10 20:04:08     danvet  and then one blender
2019-07-10 20:04:39     danvet  and then either add that to the crc and toss it (again maybe scanline-by-scanline, or whatever fits reasonable into l1$ all together)
2019-07-10 20:04:44     danvet  or dump it into writeback
2019-07-10 20:08:25     danvet  https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html <- this stuff essentially, using generics
2019-07-10 20:08:38     danvet  *generic intrinsics
2019-07-10 20:08:49     danvet  or is that going to be real awful?
2019-07-10 20:09:35     ickle   shouldn't be required for _basic_ blending
2019-07-10 20:09:52     danvet  yeah I think all we want is premultiplied alpha
2019-07-10 20:09:53     ickle   if all you need is an over operator, then gcc should be pretty good
2019-07-10 20:09:57     <--     sdutt (sdutt@.../intel/x-zhquyrigdztslyqh) has quit (Ping timeout: 268 seconds)
2019-07-10 20:09:59     danvet  maybe some yuv->rgb
2019-07-10 20:10:19     danvet  expanding from whatever silly format we have to the right vector
2019-07-10 20:10:33     emersion        what would be your universal format?
2019-07-10 20:10:46     danvet  a16r16b16g16
2019-07-10 20:10:50     danvet  except if gcc barfs on that
2019-07-10 20:11:02     danvet  or maybe go all in on 4x float :-)
2019-07-10 20:11:10     emersion        eh
2019-07-10 20:11:29     emersion        fp16 would work, but would also mean rounding errors, probably?
2019-07-10 20:11:35     danvet  uint16 is not going to be awesome for hdr, but good enough for everything else
2019-07-10 20:11:42     danvet  no cpu has fp16
2019-07-10 20:11:52     emersion        ah, fp32
2019-07-10 20:11:57     krh     wait for avx1024
2019-07-10 20:12:01     emersion        seems kind of overkill
2019-07-10 20:12:01     danvet  fg32 is probably fastest option we can get on common hw
2019-07-10 20:12:12     danvet  krh, very much aiming for good enough here
2019-07-10 20:12:26     ickle   one plan is to sneak ksim into the kernel as generic gpu-on-x86
2019-07-10 20:12:52     krh     can the kernel use avx2?
2019-07-10 20:13:05     danvet  well, all stuff I'd need to figure out
2019-07-10 20:13:07     danvet  I hope so
2019-07-10 20:13:12     ickle   it can
2019-07-10 20:13:19     ickle   easier than mmx
2019-07-10 20:13:49     emersion        when you say gcc barfs on a16r16b16g16: why?
2019-07-10 20:13:57     danvet  kernel_fpu_begin/end + telling gcc to optimize the crap out of the file with the blending functions
2019-07-10 20:14:11     emersion        uint64 too hard for gcc to optimize?
2019-07-10 20:14:14     danvet  emersion, I haven't checked, but if it generates silly code then might be better to go with fp32
2019-07-10 20:14:28     vsyrjala        the compiler always generates silly code
2019-07-10 20:14:30     danvet  ideally it should all boil down to sse/avx
2019-07-10 20:14:32     emersion        ahah
2019-07-10 20:14:51     danvet  and ideally all with generic intrinsics so the arm folks don't freak out
2019-07-10 20:15:07     -->     sandeep (~sandeep@....134.139.76) has joined #intel-gfx
2019-07-10 20:15:23     emersion        +1 for generic intrinsics
2019-07-10 20:15:42     danvet  the conversion from uint to fp32 might be hilarious
2019-07-10 20:15:50     emersion        yeah, probably
2019-07-10 20:15:52     danvet  perhaps the one place where we want to use an sse or avx intrinsic
2019-07-10 20:16:19     danvet  especially if we convert to simd16 instead of something like 4x4
2019-07-10 20:16:22     emersion        we should probably do some little experiments before doing anything
2019-07-10 20:16:24     danvet  simd4x4
2019-07-10 20:16:31     danvet  yeah
2019-07-10 20:16:41     danvet  that's why I'm asking here, since I have roughly 0 clue about this
2019-07-10 20:17:48     danvet  I don't think we ever need a dot or anything like that, so plain simd is propably best
2019-07-10 20:18:07     danvet  except the input is usually 4x or 3x vectors
2019-07-10 20:18:07     emersion        a dot?
2019-07-10 20:18:12     danvet  dot product
2019-07-10 20:18:14     emersion        ah
2019-07-10 20:18:15     danvet  for vertex shaders
2019-07-10 20:18:20     emersion        yeah, probably not
2019-07-10 20:18:21     bwidawsk        danvet› btw, I think "generic intrinsic" is an ARM thing
2019-07-10 20:18:29     bwidawsk        I think everywhere else, they just say intrinsic
2019-07-10 20:18:41     bwidawsk        at least, I've never heard the generic prefix other than ARM compiler
2019-07-10 20:19:19     vsyrjala        just make the max supported resolution 8x8 or something and speed shouldn't be a huge issue
2019-07-10 20:19:30     emersion        i think he meant generic intrinsic vs. sse/avx/whatever
2019-07-10 20:19:49     bwidawsk        emersion› yes, I figured out what he meant, I just mentioned it because it was a source of confusion
2019-07-10 20:19:52     emersion        vsyrjala, helpful as always :)
2019-07-10 20:20:32     danvet  oh gcc has all the casting implementing in the intrinsics too
2019-07-10 20:20:50     danvet  bwidawsk, gcc manpage also calls them generic intrinsics
2019-07-10 20:21:25     bwidawsk        danvet› not my gcc manpage
2019-07-10 20:21:40     danvet  well "generic vector operations"
2019-07-10 20:21:46     danvet  ^^ that what you meant?
2019-07-10 20:22:03     danvet  vs "machine-specific vector intrinsics"
2019-07-10 20:22:38     bwidawsk        I thought the generic intrinsic term came from ARM's proprietary compiler
2019-07-10 20:22:40     bwidawsk        but I might be wrong