linux-kernel - Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

Message-ID: <6F1FD9DA-5E86-42A2-8EAF-05F5D70FE2EF@vmware.com>
Date:   Fri, 19 Oct 2018 01:08:23 +0000
From:   Nadav Amit <namit@...are.com>
To:     Andy Lutomirski <luto@...capital.net>
CC:     Ingo Molnar <mingo@...hat.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        "Woodhouse, David" <dwmw@...zon.co.uk>
Subject: Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

at 10:00 AM, Andy Lutomirski <luto@...capital.net> wrote:

> 
> 
>> On Oct 18, 2018, at 9:47 AM, Nadav Amit <namit@...are.com> wrote:
>> 
>> at 8:51 PM, Andy Lutomirski <luto@...capital.net> wrote:
>> 
>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit <namit@...are.com> wrote:
>>>> at 6:22 PM, Andy Lutomirski <luto@...capital.net> wrote:
>>>> 
>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit <namit@...are.com> wrote:
>>>>>> 
>>>>>> It is sometimes beneficial to prevent preemption for very few
>>>>>> instructions, or prevent preemption for some instructions that precede
>>>>>> a branch (this latter case will be introduced in the next patches).
>>>>>> 
>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix
>>>>>> (opcode 0x40) as an indication that preemption is disabled for the
>>>>>> following instruction.
>>>>> 
>>>>> Nifty!
>>>>> 
>>>>> That being said, I think you have a few bugs. First, you can’t just ignore
>>>>> a rescheduling interrupt, as you introduce unbounded latency when this
>>>>> happens — you’re effectively emulating preempt_enable_no_resched(), which
>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>>>>> need to jump to a slow-path trampoline that calls schedule() at the end or
>>>>> consider rewinding one instruction instead. Or use TF, which is only a
>>>>> little bit terrifying…
>>>> 
>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the
>>>> easiest solution would be to make synchronize_sched() ignore preemptions
>>>> that happen while the prefix is detected. It would slightly change the
>>>> meaning of the prefix.
>> 
>> So thinking about it further, rewinding the instruction seems the easiest
>> and most robust solution. I’ll do it.
>> 
>>>>> You also aren’t accounting for the case where you get an exception that
>>>>> is, in turn, preempted.
>>>> 
>>>> Hmm.. Can you give me an example for such an exception in my use-case? I
>>>> cannot think of an exception that might be preempted (assuming #BP, #MC
>>>> cannot be preempted).
>>> 
>>> Look for cond_local_irq_enable().
>> 
>> I looked at it. Yet, I still don’t see how exceptions might happen in my
>> use-case, but having said that - this can be fixed too.
> 
> I’m not totally certain there’s a case that matters. But it’s worth checking.

I am still checking. But I wanted to ask you whether the existing code is
correct, since it seems to me that others make the same mistake I did, unless
I misunderstand the code.

Consider for example do_int3(), and see my inlined comments:

dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
{
	...
	ist_enter(regs); 		// => preempt_disable()
	cond_local_irq_enable(regs);	// => assume it enables IRQs

	...
	// resched irq can be delivered here. It will not cause rescheduling
	// since preemption is disabled

	cond_local_irq_disable(regs);	// => assume it disables IRQs
	ist_exit(regs);			// => preempt_enable_no_resched()
}

At this point, rescheduling will not happen for an unbounded length of time
(unless there is another point, when exiting the trap handler, that checks
whether preemption should take place).
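
To make the concern concrete, here is a minimal user-space sketch in C. It is
not kernel code: every model_* name is a hypothetical stand-in, and the
"resched IRQ" is reduced to setting a flag and preempting only when the
modelled preempt count is zero. It only illustrates how a request that arrives
while preemption is disabled stays pending if the section ends with the
no_resched variant.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state (assumptions, not real APIs). */
static int  model_preempt_count;
static bool model_need_resched;
int model_schedule_calls;

/* Stand-in for schedule(): clears the pending request and counts the call. */
void model_schedule(void)
{
	model_need_resched = false;
	model_schedule_calls++;
}

void model_preempt_disable(void)
{
	model_preempt_count++;
}

/* Models preempt_enable(): reschedules if a request is pending at count 0. */
void model_preempt_enable(void)
{
	if (--model_preempt_count == 0 && model_need_resched)
		model_schedule();
}

/* Models preempt_enable_no_resched(): only drops the count. */
void model_preempt_enable_no_resched(void)
{
	model_preempt_count--;
}

/* Models a resched IRQ: records the request, preempts only at count 0. */
void model_resched_irq(void)
{
	model_need_resched = true;
	if (model_preempt_count == 0)
		model_schedule();
}

/*
 * Models a handler shaped like the do_int3() excerpt above.
 * Returns true if a resched request is still pending on exit.
 */
bool model_handler(bool use_no_resched)
{
	model_preempt_count = 0;
	model_need_resched = false;

	model_preempt_disable();	/* like ist_enter() */
	model_resched_irq();		/* IRQ arrives while preemption is off */
	if (use_no_resched)
		model_preempt_enable_no_resched();	/* like ist_exit() */
	else
		model_preempt_enable();

	return model_need_resched;
}
```

In this model, model_handler(true) returns with the request still pending and
no call to model_schedule(), whereas model_handler(false) clears it. Whether
the real exit path has a later check that picks the request up is exactly the
open question.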

Another example is __BPF_PROG_RUN_ARRAY(), which also uses
preempt_enable_no_resched().

Am I missing something?

Thanks,
Nadav