Message-ID: <4FA2CAD9.6010808@zytor.com>
Date: Thu, 03 May 2012 11:13:45 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Al Viro <viro@...iv.linux.org.uk>, Nick Piggin <npiggin@...il.com>,
Jana Saout <jana@...ut.de>, Joel Becker <jlbec@...lplan.org>,
linux-kernel@...r.kernel.org
Subject: Re: Oops with DCACHE_WORD_ACCESS and ocfs2, autofs4
On 05/03/2012 10:30 AM, Linus Torvalds wrote:
> On Thu, May 3, 2012 at 9:15 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> So I guess I need to do the exception handling that I was hoping I
>> wouldn't have to. Give me a jiffy.
>
> Ok, that took longer than a jiffy, the asm was just nasty to get right
> with all the proper suffixes for 32-bit vs 64-bit, and the fact that
> gas apparently really needs %cl for the shift count, and doesn't like
> %rcx. Silly assembler.
>
Yes, although since it's a fixed register you can also just write it as %%cl.
> Also, the asm would have been much simpler if I didn't care so much
> about the regular fast-path. I wanted the fast-path for the asm to be
> a single load, with no downside, and everything fixed up in the
> exception case.
>
> And it's close. It's a single load, and the only downside is that
> register '%rcx' is marked as used, because *if* the exception happens,
> we want to use %rcx do the alignment fixup.
>
> Peter, in particular, can you double (and triple-) check my asm, to
> see if I missed anything? It does that "lea" of the address into %rcx
> twice, because that way we don't need any other register temporaries.
Just from a cleanliness point of view, I don't think you need the
__WORDSUFFIX for any of these instructions (it is only required where the
operand size would otherwise be ambiguous, and the register names should
already settle that.)
> - fast-path single-instruction unaligned load (with gcc free to pick
> registers and addressing modes):
>
> movl (%edi,%edx),%eax
>
> - with the exception fixup code becoming:
>
> leal (%edi,%edx),%ecx
> andl $-4,%ecx
> movl (%ecx),%eax
> leal (%edi,%edx),%ecx
> andl $3,%ecx
> shll $3,%ecx
> shll %cl,%eax
> shrl %cl,%eax
> jmp 2b
I think you want to drop the shl instruction. The aligned load puts the
bytes that should end up at the LSB end of the register at the MSB end,
so shr is all you should need.

Let's say %edi+%edx points to 0xcccccffd, with the bytes 66 77 88 99
starting at 0xcccccffc. If the next page were present and zero, the
unaligned load would give you %eax = 0x00998877, so the fixup should
produce the same value:
	lea (%edi,%edx),%ecx	-> %ecx = 0xcccccffd
	and $-4,%ecx		-> %ecx = 0xcccccffc
	mov (%ecx),%eax		-> %eax = 0x99887766
	lea (%edi,%edx),%ecx	-> %ecx = 0xcccccffd
	and $3,%ecx		-> %ecx = 1
	shl $3,%ecx		-> %ecx = 8
	shr %cl,%eax		-> %eax = 0x00998877
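
For anyone who wants to check the arithmetic without staring at the asm, here is a rough C model of the two fixup variants. It is only a sketch of the shift logic, not the kernel code: the function names are made up for illustration, the "load" is emulated by assembling a little-endian word from bytes, and `base` stands in for the aligned word at the end of the page.

```c
#include <assert.h>
#include <stdint.h>

/* Build a little-endian 32-bit word from four bytes, as an x86 load would. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical model of the fixup with the shl dropped. */
uint32_t fixup_shr_only(const uint8_t *base, uint32_t offset)
{
    uint32_t word  = load_le32(base + (offset & ~3u)); /* and $-4 ; mov   */
    uint32_t count = (offset & 3u) << 3;               /* and $3 ; shl $3 */
    return word >> count;                              /* shr %cl,%eax    */
}

/* Hypothetical model of the fixup as posted: shl %cl, then shr %cl. */
uint32_t fixup_shl_then_shr(const uint8_t *base, uint32_t offset)
{
    uint32_t word  = load_le32(base + (offset & ~3u));
    uint32_t count = (offset & 3u) << 3;
    return (word << count) >> count;  /* the shl discards the top byte */
}

/* Replays the example above: bytes 66 77 88 99, address ending in ...d. */
void check_example(void)
{
    static const uint8_t last_word[4] = { 0x66, 0x77, 0x88, 0x99 };

    assert(fixup_shr_only(last_word, 1) == 0x00998877u);
    assert(fixup_shl_then_shr(last_word, 1) == 0x00887766u);
}
```

With the shl left in, the 0x99 byte is shifted out of the register before the shr, so the result loses a byte that a real unaligned load would have returned.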
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.