linux-kernel - Re: [PATCH 2/2] Add a thread cpu time implementation to vDSO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4EE6850F.5070002@fb.com>
Date:	Mon, 12 Dec 2011 14:49:51 -0800
From:	Arun Sharma <asharma@...com>
To:	Andrew Lutomirski <luto@....edu>
CC:	<linux-kernel@...r.kernel.org>, Kumar Sundararajan <kumar@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	john stultz <johnstul@...ibm.com>
Subject: Re: [PATCH 2/2] Add a thread cpu time implementation to vDSO

On 12/12/11 12:15 PM, Andrew Lutomirski wrote:

> This seems like a neat idea, but I'm a little worried about the
> implementation.  Surely someone with 4k+ cpus will complain when the
> percpu vvar data exceeds a page and crashes.  I have no personal
> objection to a dynamically-sized vvar area, but it might need more
> careful handling.

I'm for disabling the vsyscall when the data doesn't fit in a couple of 
pages.

>
> I also wonder if there a problem with information leaking.  If there
> was an entire per-cpu vvar page (perhaps mapped differently on each
> cpu) then nothing interesting would leak.  But that could add
> overhead.
>

The timing based attacks depend on the granularity of timestamps. I feel 
what's available here is too coarse grained to be useful. Happy to
disable the code at compile time for those cases. Are there 
CONFIG_HIGH_SECURITY type of options I could use for this purpose?


>> diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
>> index 0fd7a4a..e36e1c1 100644
>> --- a/arch/x86/include/asm/vvar.h
>> +++ b/arch/x86/include/asm/vvar.h
>> @@ -47,5 +47,6 @@
>>   DECLARE_VVAR(0, volatile unsigned long, jiffies)
>>   DECLARE_VVAR(16, int, vgetcpu_mode)
>>   DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
>> +DECLARE_VVAR(2048, struct vcpu_data, vcpu_data)
>
> Any reason this isn't page-aligned?  Offset 2048 seems like an odd place.
>

I meant it to be 128 + sizeof(struct vsyscall_gtod_data) so we fit 
everything in one page if CONFIG_NR_CPUS is small enough.

I'll fix it up in the next rev.

>>   notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
>>   {
>>         long ret;
>> -       asm("syscall" : "=a" (ret) :
>> -           "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
>> +       asm volatile("syscall" : "=a" (ret) :
>> +                    "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
>>         return ret;
>>   }
>
> Huh?  This should probably be a separate patch (and probably not a
> -stable candidate, since it would take amazing compiler stupidity to
> generate anything other than the obvious code).  The memory clobber
> may also be enough to make this officially safe.

Yes - this should be a separate patch. gcc-4.4 likes to get rid of the 
instruction in __do_thread_cpu_time without the asm volatile (in spite 
of the memory clobber).


>> +       if (vp->tsc_unstable) {
>> +               struct timespec ts;
>> +               vdso_fallback_gettime(CLOCK_THREAD_CPUTIME_ID,&ts);
>> +               return timespec_to_ns(&ts);
>> +       }
>
> Yuck -- another flag indicating whether we're using the tsc.

I renamed it to thread_cputime_disabled to deal with NR_CPUS > 64.

>
>> +
>> +       do {
>> +               native_read_tscp(&p);
>
> Do all 64-bit cpus have rdtscp?  ISTR older Intel machines don't have it.
>

Yeah - I ran into this one when testing with KVM on a rdtscp capable 
CPU. Will disable the vsyscall when the CPU doesn't support rdtscp.

>> --- a/arch/x86/vdso/vdso.lds.S
>> +++ b/arch/x86/vdso/vdso.lds.S
>> @@ -25,6 +25,8 @@ VERSION {
>>                 __vdso_getcpu;
>>                 time;
>>                 __vdso_time;
>> +               thread_cpu_time;
>> +               __vdso_thread_cpu_time;
>>         local: *;
>>         };
>>   }
>
> Why do we have the non-__vdso versions?  IMO anything that actually
> uses them is likely to be confused.  They have the same names as the
> glibc wrappers, but the glibc wrappers have different return value and
> errno semantics.  I'd say just use __vdso.

I'm not familiar with the history of __vdso_foo vs foo. Happy to get rid 
of the non __vdso versions.

>
> Also, what's wrong with just adding the functionality to __vdso_clock_gettime?
>

Performance. Most of this is client dependent (whether it expects time 
in nanoseconds or timespec).

Baseline: 		19.34 secs
vclock_gettime: 	4.74 secs
thread_cpu_time:	3.62 secs

Our use case prefers nanoseconds, so the conversion to timespec and back 
adds overhead. Also, having a direct entry point vs going through the 
switch in clock_gettime() adds some cycles.

I have some people here asking me if they could get CLOCK_REALTIME and 
CLOCK_MONOTONIC also in nanoseconds for the same reason.


>> +struct vcpu_data {
>> +       struct vpercpu_data vpercpu[NR_CPUS];
>> +       unsigned int tsc_khz;
>
> I think this also duplicates some of the gtod_data stuff, at least in
> the case that TSC works for gtod.
>

vgotd.h seems to be clock source independent. This vsyscall is usable 
only on systems with tsc_unstable == 0. Also, there is only one instance 
of mult and shift there, whereas we save it on a per-cpu basis in the 
VVAR page.

Thanks for the review and comments!

  -Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/