Message-ID: <20150304064303.GA16387@gmail.com>
Date: Wed, 4 Mar 2015 07:43:03 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: X86 ML <x86@...nel.org>, Andy Lutomirski <luto@...capital.net>,
LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2 05/15] x86/alternatives: Use optimized NOPs for padding

* Borislav Petkov <bp@...en8.de> wrote:
> From: Borislav Petkov <bp@...e.de>
>
> Alternatives now allow for an empty old instruction, in which case the
> space is padded with NOPs at assembly time. However, the optimal, longer
> NOPs should be used instead. Do that at patching time by adding
> alt_instr.padlen-sized NOPs at the old instruction address.
>
> Cc: Andy Lutomirski <luto@...capital.net>
> Signed-off-by: Borislav Petkov <bp@...e.de>
> ---
> arch/x86/kernel/alternative.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 715af37bf008..af397cc98d05 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -323,6 +323,14 @@ done:
>  		 n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
>  }
>  
> +static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
> +{
> +	add_nops(instr + (a->instrlen - a->padlen), a->padlen);

So while looking at this patch I was wondering about the following
question: right now add_nops() does the obvious 'fill with large NOPs
first, then fill the remaining bytes with a smaller NOP' logic:

/* Use this to add nops to a buffer, then text_poke the whole buffer. */
static void __init_or_module add_nops(void *insns, unsigned int len)
{
	while (len > 0) {
		unsigned int noplen = len;
		if (noplen > ASM_NOP_MAX)
			noplen = ASM_NOP_MAX;
		memcpy(insns, ideal_nops[noplen], noplen);
		insns += noplen;
		len -= noplen;
	}
}
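
As a rough worked example (ASM_NOP_MAX == 8 as on x86-64 is assumed here,
and the 11-byte pad is an arbitrary illustration), a stand-alone sketch of
that splitting picks an 8-byte NOP and then a 3-byte one:

/*
 * User-space sketch of the fill logic above; ASM_NOP_MAX == 8 is assumed
 * and the 11-byte length is just an example.
 */
#include <stdio.h>

#define ASM_NOP_MAX 8

int main(void)
{
	unsigned int len = 11;			/* hypothetical padding size */

	while (len > 0) {
		unsigned int noplen = len;

		if (noplen > ASM_NOP_MAX)
			noplen = ASM_NOP_MAX;
		printf("emit %u-byte NOP\n", noplen);	/* prints 8, then 3 */
		len -= noplen;
	}

	return 0;
}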

That fill logic works perfectly fine, but I'm wondering how current
decoders behave when a large NOP crosses a cacheline boundary or a page
boundary. Is there any inefficiency in that case, and if so, could we
avoid it by not spilling NOPs across cacheline or page boundaries?
With potentially thousands of patched instructions both situations are
bound to occur dozens of times in the cacheline case, and a few times
in the page boundary case.

There's also the following special case of a large NOP followed by a
small NOP, where the number of NOPs would not change if we padded
differently:

[             large NOP            ][    smaller NOP   ]
[        cacheline 1       ][        cacheline 2       ]

which might be more optimally filled with two mid-size NOPs:

[        midsize NOP       ][        midsize NOP       ]
[        cacheline 1       ][        cacheline 2       ]

That way no special boundary is partially covered by a NOP instruction.
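
Something like the following (a completely untested sketch just to
illustrate the idea -- the helper name is made up, the 64-byte line size
is hard-coded, and the caller would have to pass the real instruction
address) would implement the 'don't spill across a cacheline' variant:

/*
 * Hypothetical, boundary-aware variant of the fill loop: cap each NOP at
 * the next 64-byte cacheline boundary so no single NOP straddles it.  In
 * the worst case it emits one more (smaller) NOP than add_nops() does.
 */
static void __init_or_module add_nops_aligned(void *insns, unsigned long addr,
					      unsigned int len)
{
	while (len > 0) {
		unsigned int noplen = len;
		unsigned int room = 64 - (addr & 63);	/* bytes to the boundary */

		if (noplen > ASM_NOP_MAX)
			noplen = ASM_NOP_MAX;
		if (noplen > room)
			noplen = room;
		memcpy(insns, ideal_nops[noplen], noplen);
		insns += noplen;
		addr += noplen;
		len -= noplen;
	}
}
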
But the main question is, do such alignment details ever matter to
decoder performance?

Thanks,

	Ingo