Date:	Tue, 13 Dec 2011 14:58:37 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Brian Gerst <brgerst@...il.com>
Cc:	x86@...nel.org, linux-kernel@...r.kernel.org, tim@...ngt.org,
	hpa@...or.com
Subject: Re: [PATCH] x86: Split off mem*io functions

On Sun, Dec 11, 2011 at 2:10 PM, Brian Gerst <brgerst@...il.com> wrote:
> Commit 6175ddf06b6172046a329e3abfd9c901a43efd2e changed the mem*io
> functions to use the standard memcpy/memset routines, but there were
> unintended consequences.  Some devices cannot cope with 64-bit or
> non-sequential accesses that the optimized routines do.  Change them
> back to simple 32-bit sequential writes.

It might be worth giving examples of when the optimized routines don't work:

 - some devices (for example, traditional CGA/VGA) route reads and
writes through different banks and bit modes, so a "read-mask-write"
operation just does not work at all. You have to do pure writes when
copying to such a destination. Afaik, none of our *current* memory
copies do this for partial words, but it's something that is valid
(and sometimes done) in a memcpy implementation to avoid unaligned
stores.
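
For concreteness, here is a minimal C sketch of that read-mask-write
pattern (hypothetical helper, not actual kernel code): to store a
partial word without an unaligned access, the copy reads the
destination word, merges in the new bytes, and writes the whole word
back. The read from the destination is exactly what banked video
memory cannot tolerate.

        #include <stdint.h>

        /*
         * Illustrative only: merge two bytes into an aligned 32-bit
         * word (little-endian, byte_off in 0..2) without an unaligned
         * store.  The read of *dst is what makes this fatal on
         * write-only MMIO such as banked VGA memory.
         */
        static void store_partial_word(volatile uint32_t *dst,
                                       uint16_t val, int byte_off)
        {
                uint32_t word = *dst;   /* read from the destination */
                uint32_t mask = (uint32_t)0xffff << (byte_off * 8);

                word = (word & ~mask) | ((uint32_t)val << (byte_off * 8));
                *dst = word;            /* write back the merged word */
        }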

 - Many devices do not like overlapping stores, which the optimized
memory copies *do* do. Our memory copy does things like this:

        /*
         * Move data from 4 bytes to 7 bytes.
         */
        movl (%rsi), %ecx
        movl -4(%rsi, %rdx), %r8d
        movl %ecx, (%rdi)
        movl %r8d, -4(%rdi, %rdx)

   to copy 4-7 bytes from source %rsi to destination %rdi (with %rdx
containing the size), which actually writes four bytes twice - some of
the bytes are just going to overlap. This often does not work at all
for memory mapped IO.
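
In C terms, that sequence is roughly the following sketch
(illustrative only): two 4-byte loads and two 4-byte stores that
overlap whenever the length is less than 8.

        #include <stdint.h>
        #include <string.h>

        /*
         * Rough C equivalent of the assembly above: copy n bytes,
         * 4 <= n <= 7, with two 4-byte accesses.  For n < 8 the two
         * stores overlap, so some destination bytes are written
         * twice - harmless for RAM, often fatal for memory mapped IO.
         */
        static void copy_4_to_7(unsigned char *dst,
                                const unsigned char *src, size_t n)
        {
                uint32_t head, tail;

                memcpy(&head, src, 4);          /* first four bytes */
                memcpy(&tail, src + n - 4, 4);  /* last four bytes  */
                memcpy(dst, &head, 4);
                memcpy(dst + n - 4, &tail, 4);  /* overlapping store */
        }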

 - the "enhanced string" support actually makes "rep movsb" the most
optimal way to copy memory, but only to cacheable RAM. If the source
or destination is memory mapped IO, the microcode will make "rep
movsb" turn into the traditional slow byte-by-byte copy. That will be
*extremely* slow, even if it might work for the device.
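
For reference, the copy in question is just a single string
instruction; a minimal sketch using GCC-style inline asm (hypothetical
name, x86-64 only):

        #include <stddef.h>

        /*
         * Minimal sketch of the "enhanced string" copy: one
         * "rep movsb".  Fast on cacheable RAM with enhanced string
         * support; against uncached MMIO the microcode degrades to a
         * slow byte-at-a-time copy.
         */
        static void *rep_movsb_copy(void *dst, const void *src, size_t n)
        {
                void *ret = dst;

                asm volatile("rep movsb"
                             : "+D" (dst), "+S" (src), "+c" (n)
                             : : "memory");
                return ret;
        }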

 - Finally, 64-bit writes may confuse some devices.

But the point is that it's a much bigger issue than just 64-bit accesses.
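
For completeness, a minimal sketch of what "simple 32-bit sequential
writes" means in practice (hypothetical name and code, not the actual
patch): ascending order, byte accesses only for the unaligned head and
tail, no reads from the destination, no overlap, no 64-bit stores.

        #include <stddef.h>
        #include <stdint.h>

        /*
         * Illustrative sketch of a safe MMIO copy: simple,
         * sequential, at most 32 bits per access.  (Unaligned
         * *source* reads are fine on x86.)
         */
        static void copy_toio_32(volatile void *dst, const void *src,
                                 size_t n)
        {
                volatile unsigned char *d = dst;
                const unsigned char *s = src;

                while (n && ((uintptr_t)d & 3)) {  /* align destination */
                        *d++ = *s++;
                        n--;
                }
                while (n >= 4) {                   /* 32-bit sequential */
                        *(volatile uint32_t *)d = *(const uint32_t *)s;
                        d += 4;
                        s += 4;
                        n -= 4;
                }
                while (n--)                        /* trailing bytes */
                        *d++ = *s++;
        }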

                   Linus
