Message-ID: <CA+55aFxeBHSB6iBpP55kV2bLQGg8ZC9Ve5mvCmZN0ARH9dVwKg@mail.gmail.com>
Date: Thu, 3 May 2012 10:30:38 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Al Viro <viro@...iv.linux.org.uk>, "H. Peter Anvin" <hpa@...or.com>
Cc: Nick Piggin <npiggin@...il.com>, Jana Saout <jana@...ut.de>,
Joel Becker <jlbec@...lplan.org>, linux-kernel@...r.kernel.org
Subject: Re: Oops with DCACHE_WORD_ACCESS and ocfs2, autofs4
On Thu, May 3, 2012 at 9:15 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> So I guess I need to do the exception handling that I was hoping I
> wouldn't have to. Give me a jiffy.
Ok, that took longer than a jiffy, the asm was just nasty to get right
with all the proper suffixes for 32-bit vs 64-bit, and the fact that
gas apparently really needs %cl for the shift count, and doesn't like
%rcx. Silly assembler.
Also, the asm would have been much simpler if I didn't care so much
about the regular fast-path. I wanted the fast-path for the asm to be
a single load, with no downside, and everything fixed up in the
exception case.
And it's close. It's a single load, and the only downside is that
register '%rcx' is marked as used, because *if* the exception happens,
we want to use %rcx to do the alignment fixup.
Peter, in particular, can you double (and triple-) check my asm, to
see if I missed anything? It does that "lea" of the address into %rcx
twice, because that way we don't need any other register temporaries.
On 32-bit, this results in:
 - fast-path single-instruction unaligned load (with gcc free to pick
   registers and addressing modes):
        movl (%edi,%edx),%eax
 - with the exception fixup code becoming:
        leal (%edi,%edx),%ecx
        andl $-4,%ecx
        movl (%ecx),%eax
        leal (%edi,%edx),%ecx
        andl $3,%ecx
        shll $3,%ecx
        shll %cl,%eax
        shrl %cl,%eax
        jmp 2b
which looks ok. I don't worry about the efficiency of the fixup code,
because if that code is ever entered we will have taken a page fault
etc, so the only thing to worry about is that the fixup doesn't
need/fix any unnecessary extra registers so that the fast-path case
doesn't get less flexible.
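For readers who want the alignment fixup in C terms, here is a minimal user-space sketch of the idea: round the address down to a word boundary, load the aligned word (which cannot cross into the next page), and shift the wanted bytes down so the bytes past the boundary read as zero. The name zeropad_fixup32 is hypothetical and this is not the code from the attached patch. It assumes a little-endian host and a 32-bit word, and it uses a single right shift, which is the operation that produces the zero-padded result on little-endian; it is not a line-by-line transcription of the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): emulates what the exception
 * fixup has to compute for a 32-bit load at a possibly unaligned address.
 * Assumes a little-endian host.
 */
static uint32_t zeropad_fixup32(const unsigned char *addr)
{
    uintptr_t p = (uintptr_t)addr;

    /* Round down to the containing 4-byte-aligned word (the "and $-4"). */
    const unsigned char *aligned = (const unsigned char *)(p & ~(uintptr_t)3);

    /* Aligned load: stays within one word, so it never touches the next page. */
    uint32_t word;
    memcpy(&word, aligned, sizeof word);

    /* Byte offset within the word, converted to a bit count (the "and $3; shl $3"). */
    unsigned shift = (unsigned)(p & 3) * 8;

    /* Move the wanted bytes to the bottom; the top is filled with zeroes. */
    return word >> shift;
}
```

With the bytes 11 22 33 44 stored in an aligned word, calling the helper on the second byte gives 0x00443322 on a little-endian machine: the three bytes that exist, followed by a zero where the load would otherwise have run off the end.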
Does anybody see anything wrong with this?
Anyway, with this, I guess we could enable word-at-a-time even with
CONFIG_DEBUG_PAGEALLOC on x86, and that might even be a good idea for
coverage.
Jana - does the attached patch work for you?
Linus
Download attachment "patch.diff" of type "application/octet-stream" (3573 bytes)