linux-kernel - RE: framebuffer corruption due to overlapping stp instructions on arm64

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 8 Aug 2018 10:21:46 -0400 (EDT)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     David Laight <David.Laight@...LAB.COM>
cc:     "'Ard Biesheuvel'" <ard.biesheuvel@...aro.org>,
        Ramana Radhakrishnan <ramana.gcc@...glemail.com>,
        Florian Weimer <fweimer@...hat.com>,
        Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
        GNU C Library <libc-alpha@...rceware.org>,
        Andrew Pinski <pinskia@...il.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Russell King <linux@...linux.org.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: RE: framebuffer corruption due to overlapping stp instructions on
 arm64



On Tue, 7 Aug 2018, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 07 August 2018 15:07
> ...
> > Unaccelerated scrolling is still painfully slow
> > even on modern computers because of slow framebuffer read.
> 
> I solved that many years ago on a strongarm system by mapping
> the screen memory at two separate virtual addresses.
> One uncached used for writes, the second cached using the
> 'minicache' for reads.
> (and immediately fell foul of a memcpy() function that compared
> the two virtual addresses and decided to copy backwards)
> 
> I suspect some modern cpus don't like you doing that and the
> graphics 'drivers' won't use different mappings.

Intel says that you can't mix PAT memory attributes - but the non-temporal 
store instructions use write-combining semantics on a memory that is 
normally cacheable - and it is allowed to mix non-temporal stores with 
other cacheable memory accesses - so I believe that the CPU will snoop the 
cache for wc accesses and handle the conflict.

> Even in glibc you want a more general copy_to/from_io_memory()
> rather than just 'copy_from_framebuffer()'.
> Best to define both - even if they end up identical.
> Other drivers allow PCIe space be mmap()ed into user space.
> 
> While your tests show vmovntdqa being slightly slower than an
> avx read for uncached mappings it is still much better than
> all the other options.

Tihs was a measuring glitch - movntdqa is as fast as movdqa on non-cached
mappings.

Mikulas

> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>