Message-ID: <CAGXJAmz=DweQnvpWhgACnCUcxhq1-Yp9m5KznSL1RNCX-p_-EQ@mail.gmail.com>
Date: Fri, 29 Aug 2025 10:08:07 -0700
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>, 
	Simon Horman <horms@...nel.org>, Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH net-next v15 03/15] net: homa: create shared Homa header files

On Fri, Aug 29, 2025 at 12:53 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 8/29/25 5:03 AM, John Ousterhout wrote:
> > On Wed, Aug 27, 2025 at 12:21 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> >> The TSC raw value depends on the current CPU.
> >
> > This is incorrect. There were problems in the first multi-core Intel
> > chips in the early 2000s, but they were fixed before I began using TSC
> > in 2010. The TSC counter is synchronized across cores and increments
> > at a constant rate independent of core frequency and power state.
>
> Please read:
>
> https://elixir.bootlin.com/linux/v6.17-rc3/source/arch/x86/include/asm/tsc.h#L14

This does not contradict my assertion, but maybe we are talking about
different things.

First, the statement "the results can be non-monotonic if compared on
different CPUs" in the link you sent doesn't really make sense as
written. There is no way to execute RDTSC instructions at exactly the
same moment on two CPUs and compare the results. Maybe the comment is
referring to a situation like this:
* Execute RDTSC on core A.
* Increment a shared variable on core A.
* Read the variable's value on core B.
* Execute RDTSC on core B.

In this situation, it is possible that the time returned by RDTSC on
core B could precede that observed on core A, while the value of the
variable read by core B reflects the increment. Perhaps this is what
you meant by your statement "The TSC raw value depends on the current
CPU"? I interpreted your words to mean that each CPU has its own
independent TSC counter, which was the case in the early 2000s but is
not the case today.
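
To make that scenario concrete, here is a rough userspace sketch of the
interleaving I have in mind (the thread names, the __rdtsc() builtin, and
the lack of explicit CPU pinning are all just for illustration; this is
not code from Homa or the kernel):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

static _Atomic int flag;
static uint64_t tsc_a, tsc_b;

static void *core_a(void *arg)
{
        tsc_a = __rdtsc();              /* step 1: read TSC on core A */
        atomic_store(&flag, 1);         /* step 2: update the shared variable */
        return NULL;
}

static void *core_b(void *arg)
{
        while (atomic_load(&flag) == 0) /* step 3: wait until A's store is visible */
                ;
        tsc_b = __rdtsc();              /* step 4: read TSC on core B */
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, core_a, NULL);
        pthread_create(&b, NULL, core_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Even though B observed A's store, tsc_b can occasionally be
         * smaller than tsc_a, because the RDTSC reads are not ordered
         * with respect to the surrounding memory operations. */
        printf("tsc_a=%llu tsc_b=%llu\n",
               (unsigned long long)tsc_a, (unsigned long long)tsc_b);
        return 0;
}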

There are two different issues here:
* Is the TSC clock itself consistent across CPUs? Yes it is. It does
not depend on which CPU reads it.
* When are TSC values read relative to the execution of nearby
instructions? This is also well-defined: with RDTSC, the time is read
as soon as the instruction is decoded. Of course, this may be before
some previous instructions have been retired, so the time could appear
to have been read out-of-order. This means you shouldn't use RDTSC
values to deduce the order of operations on different cores.
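
For what it's worth, when that ordering does matter, the usual fix is an
LFENCE ahead of RDTSC; the kernel's rdtsc_ordered() helper does
essentially this. A simplified sketch, with illustrative names rather
than the kernel's actual helpers:

#include <stdint.h>

static inline uint64_t tsc_unordered(void)
{
        uint32_t lo, hi;

        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;   /* may be sampled "early" */
}

static inline uint64_t tsc_ordered(void)
{
        uint32_t lo, hi;

        /* LFENCE keeps RDTSC from executing before earlier loads finish. */
        asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
        return ((uint64_t)hi << 32) | lo;
}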

> > I have measured Homa performance using ktime_get_ns, and
> > this adds about .04 core to Homa's total core utilization when driving
> > a 25 Gbps link at 80% utilization bidirectional.
>
> What is that 0.04? A percent? of total CPU time? of CPU time used by
> Homa? absolute time?

It's 0.04 core. In other words, Homa uses 40 ms more execution time every
second with ktime_get_ns than it did with get_cycles, when running
this particular workload.
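
To put the comparison in code terms, what I'm measuring amounts to
swapping the clock source behind a wrapper along these lines (the
wrapper name and the compile-time switch are hypothetical, not the
actual Homa code):

#include <linux/ktime.h>
#include <linux/timex.h>

static inline u64 homa_clock(void)
{
#ifdef HOMA_USE_KTIME
        return ktime_get_ns();   /* costs ~40 ms of CPU per second in this workload */
#else
        return get_cycles();     /* raw TSC read on x86 */
#endif
}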

> If that is percent of total CPU time for a single core, such value is
> inconsistent with my benchmarking where a couple of timestamp() reads
> per aggregate packet are well below noise level.

Homa is doing a lot more than a couple of timestamp() reads per
aggregate packet. The version of Homa that I measured (my default
version, even for benchmarking) is heavily instrumented; you will see
the instrumentation in a later patch series. So far, I've been able to
afford the instrumentation without significant performance penalty,
and I'd like to keep it that way if possible.

-John-
