Date:	Tue, 13 Dec 2011 14:58:37 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Brian Gerst <brgerst@...il.com>
Cc:	x86@...nel.org, linux-kernel@...r.kernel.org, tim@...ngt.org,
	hpa@...or.com
Subject: Re: [PATCH] x86: Split off mem*io functions

On Sun, Dec 11, 2011 at 2:10 PM, Brian Gerst <brgerst@...il.com> wrote:
> Commit 6175ddf06b6172046a329e3abfd9c901a43efd2e changed the mem*io
> functions to use the standard memcpy/memset routines, but there were
> unintended consequences.  Some devices cannot cope with 64-bit or
> non-sequential accesses that the optimized routines do.  Change them
> back to simple 32-bit sequential writes.

It might be worth giving examples of when the optimized routines don't work:

 - some devices (for example, traditional CGA/VGA) route reads and
writes through different banks and bit modes, so a "read-mask-write"
operation just does not work at all. You have to do pure writes when
copying to such a destination. Afaik, none of our *current* memory
copies do this for partial words, but it's something that is valid
(and sometimes done) in a memcpy implementation to avoid unaligned
stores.
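
For concreteness, here is a minimal C sketch of that read-mask-write
pattern (hypothetical helper, not actual kernel code): to store a
partial word without an unaligned access, the copy reads the
destination word, merges in the new bytes, and writes the whole word
back. The read from the destination is exactly what banked video
memory cannot tolerate.

        #include <stdint.h>

        /*
         * Illustrative only: merge two bytes into an aligned 32-bit
         * word (little-endian, byte_off in 0..2) without an unaligned
         * store.  The read of *dst is what makes this fatal on
         * write-only MMIO such as banked VGA memory.
         */
        static void store_partial_word(volatile uint32_t *dst,
                                       uint16_t val, int byte_off)
        {
                uint32_t word = *dst;   /* read from the destination */
                uint32_t mask = (uint32_t)0xffff << (byte_off * 8);

                word = (word & ~mask) | ((uint32_t)val << (byte_off * 8));
                *dst = word;            /* write back the merged word */
        }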

 - Many devices do not like overlapping stores, which the optimized
memory copies *do* do. Our memory copy does things like this:

        /*
         * Move data from 4 bytes to 7 bytes.
         */
        movl (%rsi), %ecx
        movl -4(%rsi, %rdx), %r8d
        movl %ecx, (%rdi)
        movl %r8d, -4(%rdi, %rdx)

   to copy 4-7 bytes from source %rsi to destination %rdi (with %rdx
containing the size), which actually writes four bytes twice - some of
the bytes are just going to overlap. This often does not work at all
for memory mapped IO.
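
In C terms, that sequence is roughly the following sketch
(illustrative only): two 4-byte loads and two 4-byte stores that
overlap whenever the length is less than 8.

        #include <stdint.h>
        #include <string.h>

        /*
         * Rough C equivalent of the assembly above: copy n bytes,
         * 4 <= n <= 7, with two 4-byte accesses.  For n < 8 the two
         * stores overlap, so some destination bytes are written
         * twice - harmless for RAM, often fatal for memory mapped IO.
         */
        static void copy_4_to_7(unsigned char *dst,
                                const unsigned char *src, size_t n)
        {
                uint32_t head, tail;

                memcpy(&head, src, 4);          /* first four bytes */
                memcpy(&tail, src + n - 4, 4);  /* last four bytes  */
                memcpy(dst, &head, 4);
                memcpy(dst + n - 4, &tail, 4);  /* overlapping store */
        }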

 - the "enhanced string" support actually makes "rep movsb" the most
optimal way to copy memory, but only to cacheable RAM. If the source
or destination is memory mapped IO, the microcode will make "rep
movsb" turn into the traditional slow byte-by-byte copy. That will be
*extremely* slow, even if it might work for the device.
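
For reference, the copy in question is just a single string
instruction; a minimal sketch using GCC-style inline asm (hypothetical
name, x86-64 only):

        #include <stddef.h>

        /*
         * Minimal sketch of the "enhanced string" copy: one
         * "rep movsb".  Fast on cacheable RAM with enhanced string
         * support; against uncached MMIO the microcode degrades to a
         * slow byte-at-a-time copy.
         */
        static void *rep_movsb_copy(void *dst, const void *src, size_t n)
        {
                void *ret = dst;

                asm volatile("rep movsb"
                             : "+D" (dst), "+S" (src), "+c" (n)
                             : : "memory");
                return ret;
        }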

 - Finally, 64-bit writes may confuse some devices.

But the point is that it's a much bigger issue than just 64-bit accesses.
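
For completeness, a minimal sketch of what "simple 32-bit sequential
writes" means in practice (hypothetical name and code, not the actual
patch): ascending order, byte accesses only for the unaligned head and
tail, no reads from the destination, no overlap, no 64-bit stores.

        #include <stddef.h>
        #include <stdint.h>

        /*
         * Illustrative sketch of a safe MMIO copy: simple,
         * sequential, at most 32 bits per access.  (Unaligned
         * *source* reads are fine on x86.)
         */
        static void copy_toio_32(volatile void *dst, const void *src,
                                 size_t n)
        {
                volatile unsigned char *d = dst;
                const unsigned char *s = src;

                while (n && ((uintptr_t)d & 3)) {  /* align destination */
                        *d++ = *s++;
                        n--;
                }
                while (n >= 4) {                   /* 32-bit sequential */
                        *(volatile uint32_t *)d = *(const uint32_t *)s;
                        d += 4;
                        s += 4;
                        n -= 4;
                }
                while (n--)                        /* trailing bytes */
                        *d++ = *s++;
        }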

                   Linus
