Date:   Wed, 05 Jul 2023 22:16:54 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Tero Kristo <tero.kristo@...ux.intel.com>
Cc:     Shuah Khan <shuah@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ingo Molnar <mingo@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH 1/2] x86/tsc: Add new BPF helper call bpf_rdtsc

Alexei Starovoitov wrote:
> On Mon, Jul 3, 2023 at 3:58 AM Tero Kristo <tero.kristo@...ux.intel.com> wrote:
> >
> > Currently the raw TSC counter can be read within kernel via rdtsc_ordered()
> > and friends, and additionally even userspace has access to it via the
> > RDTSC assembly instruction. BPF programs on the other hand don't have
> > direct access to the TSC counter, but alternatively must go through the
> > performance subsystem (bpf_perf_event_read), which only provides relative
> > value compared to the start point of the program, and is also much slower
> > than the direct read. Add a new BPF helper definition for bpf_rdtsc() which
> > can be used for any accurate profiling needs.
> >
> > A use-case for the new API is for example wakeup latency tracing via
> > eBPF on Intel architecture, where it is extremely beneficial to be able
> > to get raw TSC timestamps and compare these directly to the value
> > programmed to the MSR_IA32_TSC_DEADLINE register. This way a direct
> > latency value from the hardware interrupt to the execution of the
> > interrupt handler can be calculated. Having the functionality within
> > eBPF also has added benefits of allowing to filter any other relevant
> > data like C-state residency values, and also to drop any irrelevant
> > data points directly in the kernel context, without passing all the
> > data to userspace for post-processing.
> >
> > Signed-off-by: Tero Kristo <tero.kristo@...ux.intel.com>
> > ---
> >  arch/x86/include/asm/msr.h |  1 +
> >  arch/x86/kernel/tsc.c      | 23 +++++++++++++++++++++++
> >  2 files changed, 24 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> > index 65ec1965cd28..3dde673cb563 100644
> > --- a/arch/x86/include/asm/msr.h
> > +++ b/arch/x86/include/asm/msr.h
> > @@ -309,6 +309,7 @@ struct msr *msrs_alloc(void);
> >  void msrs_free(struct msr *msrs);
> >  int msr_set_bit(u32 msr, u8 bit);
> >  int msr_clear_bit(u32 msr, u8 bit);
> > +u64 bpf_rdtsc(void);
> >
> >  #ifdef CONFIG_SMP
> >  int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 344698852146..ded857abef81 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -15,6 +15,8 @@
> >  #include <linux/timex.h>
> >  #include <linux/static_key.h>
> >  #include <linux/static_call.h>
> > +#include <linux/btf.h>
> > +#include <linux/btf_ids.h>
> >
> >  #include <asm/hpet.h>
> >  #include <asm/timer.h>
> > @@ -29,6 +31,7 @@
> >  #include <asm/intel-family.h>
> >  #include <asm/i8259.h>
> >  #include <asm/uv/uv.h>
> > +#include <asm/tlbflush.h>
> >
> >  unsigned int __read_mostly cpu_khz;    /* TSC clocks / usec, not used here */
> >  EXPORT_SYMBOL(cpu_khz);
> > @@ -1551,6 +1554,24 @@ void __init tsc_early_init(void)
> >         tsc_enable_sched_clock();
> >  }
> >
> > +u64 bpf_rdtsc(void)
> > +{
> > +       /* Check if Time Stamp is enabled only in ring 0 */
> > +       if (cr4_read_shadow() & X86_CR4_TSD)
> > +               return 0;
> 
> Why check this? It's always enabled in the kernel, no?
> 
> > +
> > +       return rdtsc_ordered();
> 
> Why _ordered? Why not just rdtsc ?
> Especially since you want to trace latency. Extra lfence will ruin
> the measurements.
> 

If we used it as a fast way to order events across multiple CPUs, I
guess we would need the lfence? We use ktime_get_ns() now for things
like this when we just need an ordering counter. We have also
observed time going backwards with this and have heuristics
to correct it, but it's rare.
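
One simple form such a correction heuristic could take is a monotonic
clamp: remember the largest timestamp seen so far and reuse it whenever
a raw reading comes back smaller. The sketch below is illustrative only.
`struct order_clock` and `order_clock_fixup()` are hypothetical names,
the code is plain userspace C rather than kernel or BPF code, and the
thread does not say which heuristic is actually in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stream state: the largest timestamp handed out so far. */
struct order_clock {
	uint64_t last_ns;
};

/*
 * Return a timestamp that never decreases within one stream.
 * If the raw reading went backwards, reuse the previous value
 * instead of letting the ordering counter step back.
 */
static uint64_t order_clock_fixup(struct order_clock *clk, uint64_t raw_ns)
{
	assert(clk != NULL);

	if (raw_ns < clk->last_ns)
		return clk->last_ns;	/* clamp the backwards step */

	clk->last_ns = raw_ns;
	return raw_ns;
}
```

A clamp like this keeps the sequence non-decreasing but makes events
that hit the clamp compare as equal, so it only suits cases where a
rough ordering is enough.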
