[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c08616b8-d209-ff08-1b74-645a49a486d2@familie-tometzki.de>
Date: Thu, 30 Jan 2020 06:47:14 +0100
From: Damian Tometzki <damian.tometzki@...ilie-tometzki.de>
To: Borislav Petkov <bp@...e.de>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Luck, Tony" <tony.luck@...el.com>, Ingo Molnar <mingo@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] x86/asm changes for v5.6
Am 29.01.20 um 19:34 schrieb Borislav Petkov:
> On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
>> On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@...el.com> wrote:
>>>
>>> This returns "3" ... not what we want. But swapping the ERMS/FSRM order
>>> gets the correct version.
>>
>> That actually makes sense, and is what I suspected (after I wrote the
>> patch) what would happen. It would just be good to have it explicitly
>> documented at the macro.
>
> Like this?
>
> ---
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 13adca37c99a..d94bad03bcb4 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
> ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
> ".popsection\n"
>
> +/*
> + * The patching happens in the natural order of first newinstr1 and then
> + * newinstr2, iff the respective feature bits are set. See apply_alternatives()
> + * for details.
> + */
> #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
> OLDINSTR_2(oldinstr, 1, 2) \
> ".pushsection .altinstructions,\"a\"\n" \
>
>> That would be bad indeed, but I don't think it should matter
>> particularly for this case - it would have been bad before too.
>>
>> I suspect there is some other issue going on, like the NOP padding
>> logic being confused.
>
> Or the cmp $0x20 test missing in the default case, see below.
>
>> In particular, look here, this is the alt_instruction entries:
>>
>> altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
>> altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b
>>
>> where the arguments are "orig alt feature orig_len alt_len pad_len" in
>> that order.
>>
>> Notice how "pad_len" in both cases is the padding from the _original_
>> instruction (142b-141b).
>
> <snip this which I'll take a look later so that we can sort out the
> issue at hand first>
>
>> So I'm just hand-waving. Maybe there was some simpler explanation
>> (like me just picking the wrong instructions when I did the rough
>> conversion and simply breaking things with some stupid bug).
>
> Looks like it. So I did this:
>
> ---
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
> cmp %rdi, %r8
> jg 2f
>
> - /* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> - ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> - ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
> /*
> - * movsq instruction have many startup latency
> - * so we handle small size by general register.
> - */
> - cmp $680, %rdx
> - jb 3f
> - /*
> - * movsq instruction is only good for aligned case.
> + * Three rep-string alternatives:
> + * - go to "movsq" for large regions where source and dest are
> + * mutually aligned (same in low 8 bits). "label 4"
> + * - plain rep-movsb for FSRM
> + * - rep-movs for > 32 byte for ERMS.
> */
> +.Lmemmove_begin_forward:
> + ALTERNATIVE_2 \
> + "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> + "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> + "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>
> - cmpb %dil, %sil
> - je 4f
> 3:
> sub $0x20, %rdx
> /*
>
> ---
>
> Notice how the *first* option of the alternative, which is the default
> one, has that gotten that additional "cmp $0x20, %rdx; jb 1f" test which
> sends us down to the less than 32 bytes copy length.
>
> Your original version didn't have it and here's what I saw:
>
> So I stopped the guest just before that code and the call trace looked
> like this:
>
> #0 memmove () at arch/x86/lib/memmove_64.S:43
> #1 0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0,
> size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553
> #2 0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>,
> nid=73400320, flags=<optimized out>) at mm/memblock.c:641
> #3 0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830
> #4 0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798
> #5 0xffffffff823f9ae1 in start_kernel () at init/main.c:598
> #6 0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242
> #7 0x0000000000000000 in ?? ()
>
> and count in rdx was:
>
> rdx 0x18 24
>
> Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test
> and we run into the following asm at label 3:
>
> 3:
> sub $0x20, %rdx
> /*
> * We gobble 32 bytes forward in each loop.
> */
>
> <--- right here %rdx is:
>
> rdx 0xfffffffffffffff8 -8
>
> and yeeehaaw, we're in the weeds and then end up triplefaulting at some
> unmapped source address in %rsi or so.
>
> So now I'm going to play all three variants with pen and paper to make
> sure we're still sane.
>
> Thx.
>
Hello together,
in my qemu env the system isnt coming up. I tried both with and without
the changes from Borislav.
Best regards
Damian
0.605193] ------------[ cut here ]------------
[ 0.605933] General protection fault in user access. Non-canonical
address?
[ 0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77
ex_handler_uaccess+0x48/0x50
[ 0.606931] Modules linked in:
[ 0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
[ 0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
[ 0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00
75 dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00
00 <0f> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
[ 0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
[ 0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX:
0000000000000000
[ 0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI:
ffffffff8316e05f
[ 0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09:
0000000000029840
[ 0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12:
0000000000000000
[ 0.606931] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[ 0.606931] FS: 0000000000000000(0000) GS:ffff88800f800000(0000)
knlGS:0000000000000000
[ 0.606931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4:
00000000000006f0
[ 0.606931] Call Trace:
[ 0.606931] fixup_exception+0x3e/0x51
[ 0.606931] do_general_protection+0x9c/0x260
[ 0.606931] general_protection+0x28/0x30
[ 0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
[ 0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7
d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2
07 <f3> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
[ 0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
[ 0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX:
0000000000000001
[ 0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI:
ffffc900000dbdc0
[ 0.606931] RBP: ffffffffffffffff R08: ffff88800f82d280 R09:
0000000000000073
[ 0.606931] R10: 0000000000000000 R11: ffffc900000dbd68 R12:
0000000000000dc0
[ 0.606931] R13: 00000000000000fe R14: ffffffff81252d01 R15:
009896808086a201
[ 0.606931] ? __register_sysctl_table+0x361/0x500
[ 0.606931] __probe_kernel_read+0x2e/0x60
[ 0.606931] __kmalloc+0x10b/0x230
[ 0.606931] __register_sysctl_table+0x361/0x500
[ 0.606931] ? kmem_cache_alloc_trace+0xee/0x1e0
[ 0.606931] __register_sysctl_paths+0x186/0x1b0
[ 0.606931] ? iomap_init+0x1b/0x1b
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] dquot_init+0x23/0x117
[ 0.606931] ? iomap_init+0x1b/0x1b
[ 0.606931] do_one_initcall+0x31/0x1b0
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] ? do_early_param+0x89/0x89
[ 0.606931] kernel_init_freeable+0x15b/0x1b3
[ 0.606931] ? rest_init+0x9a/0x9a
[ 0.606931] kernel_init+0x5/0xf6
[ 0.606931] ret_from_fork+0x35/0x40
[ 0.606931] ---[ end trace 8ee2a58282a5eb54 ]---
[ 0.650539] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
Powered by blists - more mailing lists