Message-ID: <20211007091003.GA337010@fuller.cnet>
Date:   Thu, 7 Oct 2021 06:10:03 -0300
From:   Marcelo Tosatti <mtosatti@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Song Liu <song@...nel.org>, bpf <bpf@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Nitesh Narayan Lal <nitesh@...hat.com>,
        Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Xu <peterx@...hat.com>,
        Andrii Nakryiko <andrii@...nel.org>
Subject: Re: [PATCH bpf-next] bpf: introduce helper bpf_raw_read_cpu_clock

Hi Peter, Song,

On Thu, Oct 07, 2021 at 09:18:56AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 06, 2021 at 02:37:09PM -0700, Song Liu wrote:
> > On Wed, Oct 6, 2021 at 10:52 AM Marcelo Tosatti <mtosatti@...hat.com> wrote:
> > >
> > >
> > >
> > > Add bpf_raw_read_cpu_clock helper, to read architecture specific
> > > CPU clock. In x86's case, this is the TSC.
> > >
> > > This is necessary to synchronize bpf traces from host and guest bpf-programs
> > > (after subtracting guest tsc-offset from guest timestamps).
> > 
> > Trying to understand the use case. So in a host-guest scenario,
> > bpf_ktime_get_ns()
> > will return different values in host and guest, but rdtsc() will give
> > the same value.
> > Is this correct?
> 
> No, it will not. 

No, but we can find out the delta between host and guest TSCs.

On x86, you can read the offset through a debugfs file:

        debugfs_create_file("tsc-offset", 0444, debugfs_dentry, vcpu,
                            &vcpu_tsc_offset_fops);

Other architectures can expose that offset.
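
As a rough illustration of the "subtract the guest tsc-offset" step, here is a minimal Python sketch of how a post-processing script might rebase guest timestamps onto the host TSC and merge the two traces. It assumes guest_tsc = host_tsc + tsc_offset with no TSC scaling and no migration; the Event type and the function names are hypothetical, not part of the patch.

```python
from dataclasses import dataclass
from typing import Iterable, List

U64_MASK = 0xFFFFFFFFFFFFFFFF  # the TSC is a 64-bit counter


@dataclass(frozen=True)
class Event:
    source: str  # "host" or "guest" (hypothetical labels)
    tsc: int     # raw counter value recorded with the event
    name: str


def guest_to_host_tsc(guest_tsc: int, tsc_offset: int) -> int:
    """Rebase a guest TSC reading onto the host TSC.

    Assumption: guest_tsc = host_tsc + tsc_offset, with no TSC scaling.
    tsc_offset is the signed value read from the debugfs file.
    """
    return (guest_tsc - tsc_offset) & U64_MASK


def merge_traces(host: Iterable[Event], guest: Iterable[Event],
                 tsc_offset: int) -> List[Event]:
    """Return host and guest events ordered on the host TSC timeline."""
    rebased = [Event(e.source, guest_to_host_tsc(e.tsc, tsc_offset), e.name)
               for e in guest]
    return sorted(list(host) + rebased, key=lambda e: e.tsc)


host_events = [Event("host", 1000, "vmenter"), Event("host", 3000, "vmexit")]
guest_events = [Event("guest", 500, "guest_irq")]
# Made-up offset; it is typically negative when the guest TSC starts near zero.
merged = merge_traces(host_events, guest_events, tsc_offset=-1500)
```

Since both sides then share one counter, ordering events needs no clock estimation, only the offset.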

> Also, please explain if any of this stands a chance of
> working for anything other than x86. 

Yes, the same pattern repeats on other architectures.

ARM:

With offset between guest and host:
https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/CNTVCT-EL0--Counter-timer-Virtual-Count-register?lang=en

Without offset:
commit 051ff581ce70e822729e9474941f3c206cbf7436

PPC:
https://yhbt.net/lore/all/5f267a8aec5b8199a580c96ab2b1a3c27de4eb09.camel@gmail.com/T/

(Time Base Register is read through mftb instruction).

> Or even on x86 in the face of
> guest migration.

It won't, but honestly we don't care about tracing at this level across
migration.

> Also, please explain, again, what's wrong with dumping snapshots of
> CLOCK_MONOTONIC{,_RAW} from host and guest and correlating time that
> way?

You can't read the guest and the host clock at the same instant: there
is always some delay between the two reads, and that delay is not fixed
(it depends on the scheduling of the guest vcpus, for example). So you
need an algorithm to estimate the difference between the clocks, with
non-zero error bounds:

"
 Add a driver with gettime method returning hosts realtime clock.
 This allows Chrony to synchronize host and guest clocks with 
 high precision (see results below).
 
 chronyc> sources
 MS Name/IP address         Stratum Poll Reach LastRx Last sample
 ===============================================================================
 #* PHC0                          0   3   377     6     +4ns[   +4ns] +/-    3ns
"

Now with the hardware clock (which is usually the base for CLOCK_MONOTONIC_RAW),
there is no estimation error (the offset will be 0 ns, rather than the
+4ns +/- 3ns shown above).
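
To make the error bound concrete, here is a small Python sketch of the usual bracketing approach to correlating two clocks that cannot be read at the same instant. It is only an illustration, not what Chrony or the PHC driver actually does: it assumes the guest read happens somewhere between two host reads and that both clocks tick at the same rate. All names are hypothetical.

```python
from typing import Iterable, Tuple

Sample = Tuple[int, int, int]  # (host_before_ns, guest_ns, host_after_ns)


def estimate_offset(host_before: int, guest: int,
                    host_after: int) -> Tuple[float, float]:
    """Return (offset_estimate, error_bound) for guest minus host, in ns.

    The guest read is assumed to fall between the two host reads, so the
    true offset lies within offset_estimate +/- error_bound.
    """
    if host_after < host_before:
        raise ValueError("host clock went backwards")
    midpoint = (host_before + host_after) / 2
    return guest - midpoint, (host_after - host_before) / 2


def best_estimate(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Pick the sample with the tightest bracket (smallest error bound)."""
    return min((estimate_offset(*s) for s in samples), key=lambda r: r[1])


samples = [
    (1000, 5100, 1200),  # 200 ns between the two host reads
    (2000, 6020, 2040),  # 40 ns between the two host reads
]
offset, bound = best_estimate(samples)
```

The bound shrinks with the bracket but never reaches zero, because the delay between the reads depends on vcpu scheduling.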

> And also explain why BPF needs to do this differently than all the other
> tracers.

For x86 we use:

echo "x86-tsc" > /sys/kernel/debug/tracing/trace_clock

for this purpose, so it's not as if BPF is doing anything different
from the other tracers.
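
For scripts that need to confirm which trace clock is active before merging traces, a minimal Python sketch follows. It relies on the ftrace convention that reading trace_clock lists the available clocks with the active one in square brackets; the helper name and the sample string are illustrative only.

```python
def current_trace_clock(contents: str) -> str:
    """Return the active clock name from the text of the trace_clock file.

    The file lists the available clocks separated by whitespace, with the
    active one wrapped in square brackets.
    """
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active clock marked in trace_clock contents")


# Sample contents, as the file might read after selecting x86-tsc.
sample = "local global counter uptime perf mono mono_raw boot [x86-tsc]\n"
active = current_trace_clock(sample)
```

On a real system the string would come from reading /sys/kernel/debug/tracing/trace_clock, which normally requires root.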
