[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whESMW2v0cd0Ye+AnV0Hp9j+Mm4BO2xJo93eQcC1xghUA@mail.gmail.com>
Date: Wed, 13 Dec 2023 22:53:19 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [PATCH] ring-buffer: Remove 32bit timestamp logic
On Wed, 13 Dec 2023 at 18:45, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> tl;dr; The ring-buffer timestamp requires a 64-bit cmpxchg to keep the
> timestamps in sync (only in the slow paths). I was told that 64-bit cmpxchg
> can be extremely slow on 32-bit architectures. So I created a rb_time_t
> that on 64-bit was a normal local64_t type, and on 32-bit it's represented
> by 3 32-bit words and a counter for synchronization. But this now requires
> three 32-bit cmpxchgs for where one simple 64-bit cmpxchg would do.
It's not that a 64-bit cmpxchg is even slow. It doesn't EXIST AT ALL
on older 32-bit x86 machines.
Which is why we have
arch/x86/lib/cmpxchg8b_emu.S
which emulates it on machines that don't have the CX8 capability
("CX8" being the x86 capability flag name for the cmpxchg8b
instruction, aka 64-bit cmpxchg).
Which only works because those older 32-bit cpu's also don't do SMP,
so there are no SMP cache coherency issues, only interrupt atomicity
issues.
IOW, the way to do an atomic 64-bit cmpxchg on the affected hardware
is to simply disable interrupts.
In other words - it's not just slow. It's *really* slow. As in 10x
slower, not "slightly slower".
> We started discussing how much time this is actually saving to be worth the
> complexity, and actually found some hardware to test. One Atom processor.
That atom processor won't actually show the issue. It's much too
recent. So your "test" is actually worthless.
And you probably did this all with a kernel config that had
CONFIG_X86_CMPXCHG64 set anyway, which wouldn't even boot on a i486
machine.
So in fact your test was probably doubly broken, in that not only
didn't you test the slow case, you tested something that wouldn't even
have worked in the environment where the slow case happened.
Now, the real question is if anybody cares about CPUs that don't have
cmpxchg8b support.
IBecause in practice, it's really just old 486-class machines (and a
couple of clone manufacturers who _claimed_ to be Pentium class, but
weren't - there was also some odd thing with Windows breaking if you
had CPUID claiming to support CX8
We dropped support for the original 80386 some time ago. I'd actually
be willing to drop support for ll pre-cmpxchg8b machines, and get rid
of the emulation.
I also suspect that from a perf angle, none of this matters. The
emulation being slow probably is a non-issue, simply because even if
you run on an old i486 machine, you probably won't be doing perf or
tracing on it.
Linus
Powered by blists - more mailing lists