Message-Id: <2E6DBDE0-FEEA-467F-A380-4ED736B6C912@amacapital.net>
Date: Mon, 25 May 2020 10:19:08 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Rasmus Villemoes <linux@...musvillemoes.dk>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>
Subject: Re: [RFC][PATCH 0/4] x86/entry: disallow #DB more
> On May 25, 2020, at 4:01 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, May 25, 2020 at 12:40:38PM +0200, Peter Zijlstra wrote:
>>> On Mon, May 25, 2020 at 12:02:48PM +0200, Rasmus Villemoes wrote:
>>>
>>> Naive question: did you check disassembly to see whether gcc threw your
>>> native_get_debugreg() away, given that the asm isn't volatile and the
>>> result is not used for anything? Testing here only shows a "mov
>>> %r9,%db7", but the read did seem to get thrown away.
>>
>> Argh... no, I did not. Writing it all in asm gets me:
>>
>> [ 1.627405] XXX: 3900 8304 22632
>>
>> which is a lot worse...
>
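For reference, here is a minimal sketch of the difference Rasmus is pointing at. It uses an empty asm template, so it runs in user space (a real DR7 access is privileged and would fault outside ring 0), and the helper names are made up. GCC's optimizers may discard an asm statement whose outputs are never used unless it is marked volatile. That is why the all-asm version uses `asm volatile ("lfence; mov %%db7, %0; lfence;" ...)`.

```c
/*
 * Hypothetical helpers: an empty asm template whose output is tied to its
 * input ("0" = same register as operand 0). They run anywhere GCC/Clang
 * inline asm does; a real DR7 read would need ring 0.
 */

/* Non-volatile: the optimizer may delete this asm if 'out' is never used,
 * which is what Rasmus saw happen to the non-volatile DR7 read. */
static inline unsigned long opaque_nonvolatile(unsigned long in)
{
	unsigned long out;
	__asm__("" : "=r" (out) : "0" (in));
	return out;
}

/* Volatile: the compiler must keep the asm even if the result is dropped,
 * so a timing loop really executes it on every iteration. */
static inline unsigned long opaque_volatile(unsigned long in)
{
	unsigned long out;
	__asm__ __volatile__("" : "=r" (out) : "0" (in));
	return out;
}

/* Discards the non-volatile result (removable at -O2) and returns the volatile one. */
unsigned long run_both(unsigned long x)
{
	(void)opaque_nonvolatile(x);
	return opaque_volatile(x);
}
```

Checking the disassembly of `run_both()` at -O2 is the quick way to see which of the two survived.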
> + u64 empty = 0, read = 0, write = 0, cpu = 0, cpu1 = 0;
> + unsigned long dr7;
> +
> + for (i=0; i<100; i++) {
> + u64 s;
> +
> + s = rdtsc();
> + asm volatile ("lfence; lfence;");
> + empty += rdtsc() - s;
> +
> + s = rdtsc();
> + asm volatile ("lfence; mov %%db7, %0; lfence;" : "=r" (dr7));
> + read += rdtsc() - s;
> +
> + s = rdtsc();
> + asm volatile ("lfence; mov %0, %%db7; lfence;" :: "r" (dr7));
> + write += rdtsc() - s;
> +
> + s = rdtsc();
> + asm volatile ("lfence; mov %0, %%db7; lfence;" :: "r" (dr7));
> + write += rdtsc() - s;
> +
> + clflush(this_cpu_ptr(&cpu_dr7));
> +
> + s = rdtsc();
> + asm volatile ("lfence;");
> + dr7 = this_cpu_read(cpu_dr7);
> + asm volatile ("lfence;");
> + cpu += rdtsc() - s;
> +
> + s = rdtsc();
> + asm volatile ("lfence;");
> + dr7 = this_cpu_read(cpu_dr7);
> + asm volatile ("lfence;");
> + cpu1 += rdtsc() - s;
> + }
> +
> + printk("XXX: %llu %llu %llu %llu %llu\n", empty, read, write, cpu, cpu1);
>
> [ 1.628252] XXX: 3820 8224 45516 35560 4800
>
> Which still seems to suggest using DR7 directly is probably a good
> thing. It's slower than an L1 hit, but massively faster than a full miss.
>
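Per-operation figures make the comparison easier to read. This is a small sketch that divides the totals above by the 100 iterations and subtracts the "empty" lfence/rdtsc window as a baseline. It assumes the duplicated write block is intentional (two DR7 writes per iteration) and that the empty window is a fair baseline for every other measurement. The struct and function names are made up.

```c
#include <assert.h>

/* Raw TSC-tick totals as printed by the XXX: line. */
struct dr7_bench_totals {
	unsigned long long empty, read, write, cpu, cpu1;
};

/* Average ticks per operation, baseline already subtracted (except 'base'). */
struct dr7_bench_avgs {
	double base, read, write, cpu_miss, cpu_hit;
};

struct dr7_bench_avgs dr7_bench_average(struct dr7_bench_totals t, unsigned iterations)
{
	assert(iterations > 0);
	double n = iterations;
	struct dr7_bench_avgs a;

	a.base     = t.empty / n;            /* lfence; lfence plus rdtsc overhead */
	a.read     = t.read / n - a.base;    /* mov %db7, reg */
	a.write    = t.write / (2 * n) - a.base; /* assumption: two writes per iteration */
	a.cpu_miss = t.cpu / n - a.base;     /* per-CPU load right after clflush */
	a.cpu_hit  = t.cpu1 / n - a.base;    /* same load, now cache-hot */
	return a;
}
```

With `{ 3820, 8224, 45516, 35560, 4800 }` this gives a baseline of about 38 ticks, about 44 extra for a DR7 read, about 10 for a cache-hot per-CPU load, about 317 for the load right after clflush, and about 189 per DR7 write. These are TSC ticks from a single boot, not core cycles.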
How about adding the cached DR7 value to cpu_tlbstate? Many NMIs are going to read that structure anyway to check CR3.
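Roughly what I have in mind, as a user-space sketch. The structure and field names are hypothetical (the real `cpu_tlbstate` layout is different), and 64-byte cache lines are an assumption. The point is that the cached DR7 sits in the same line that NMI entry already pulls in for the CR3 check, so reading it would be expected to cost about what `cpu1` measured rather than `cpu`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86 parts. */
#define CACHE_LINE_BYTES 64

/*
 * Hypothetical per-CPU state, NOT the real cpu_tlbstate layout.
 * Aligning the first member to a cache line makes (offset / 64) the
 * cache-line index of a field within any instance of the structure.
 */
struct tlb_state_sketch {
	_Alignas(CACHE_LINE_BYTES) uint64_t loaded_cr3; /* hypothetical: compared against CR3 */
	uint16_t loaded_asid;                           /* hypothetical */
	uint64_t cached_dr7;                            /* hypothetical: software copy of DR7 */
};

/* True if two byte offsets in the structure fall in the same cache line. */
static inline int same_cache_line(size_t a, size_t b)
{
	return a / CACHE_LINE_BYTES == b / CACHE_LINE_BYTES;
}

/* Fail the build if a later field addition pushes cached_dr7 into another line. */
static_assert(offsetof(struct tlb_state_sketch, loaded_cr3) / CACHE_LINE_BYTES ==
	      offsetof(struct tlb_state_sketch, cached_dr7) / CACHE_LINE_BYTES,
	      "cached_dr7 must share a cache line with loaded_cr3");

/* Hypothetical: what an entry path would do instead of executing mov %db7. */
static inline uint64_t nmi_read_dr7(const struct tlb_state_sketch *s)
{
	return s->cached_dr7;
}
```

The `static_assert` is the part worth keeping in any real version: it turns "these fields share a line" from a comment into a build-time check.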
And blaming KVM is a bit misplaced: this isn’t KVM’s fault, it’s Intel’s. VT-x has exactly two modes: either guest DR accesses cause VM exits, or they don’t. There’s no shadow mode.
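To spell out "no shadow mode" by contrast with CR0/CR4, here is a sketch of the architectural behaviour (not KVM code). MOV-DR exiting is bit 23 of the primary processor-based VM-execution controls (Linux calls it `CPU_BASED_MOV_DR_EXITING`). CR0 and CR4 additionally have a guest/host mask and a read shadow. The macro and function names below are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 23 of the primary processor-based VM-execution controls. */
#define MOV_DR_EXITING_SKETCH (1u << 23)

/* Debug registers: all or nothing. Either every guest MOV to/from a DR
 * exits to the host, or none does and the guest touches the real register. */
bool guest_dr_access_exits(uint32_t primary_proc_based_ctls)
{
	return (primary_proc_based_ctls & MOV_DR_EXITING_SKETCH) != 0;
}

/* CR0 (and CR4) for comparison: bits set in the guest/host mask are owned
 * by the host, and guest reads of them return the read-shadow value; the
 * remaining bits come from the real register. DR7 has no such mechanism. */
uint64_t guest_cr0_read(uint64_t real_cr0, uint64_t guest_host_mask,
			uint64_t read_shadow)
{
	return (real_cr0 & ~guest_host_mask) | (read_shadow & guest_host_mask);
}
```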