[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whe12v6dDjhYcOcfHEorwZ90+cbzD1sKLRzNyB8CWsaPw@mail.gmail.com>
Date: Mon, 6 Jan 2025 19:05:22 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Petr Mladek <pmladek@...e.com>, Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>, Sergey Senozhatsky <senozhatsky@...omium.org>,
Kees Cook <keescook@...omium.org>
Subject: Re: [RFC][PATCH] printf: Harden accessing pointer dereference in vsprintf()
On Mon, 6 Jan 2025 at 18:28, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> vsprintf() is used in critical paths, but I wonder how much these two
> versions actually make a difference in performance?
We've had people writing patches to avoid vsnprintf because it's *so*
critical for /proc etc.
And we had actual bugs as a result, eg things like
f9ed1f7c2e26 ("genirq/proc: Use seq_put_decimal_ull_width() for
decimal values")
then causing
9d9f204bdf72 ("genirq/proc: Add missing space separator back")
so bad performance results in actual BUGGY CODE.
It's one of the reasons I was looking at vsnprintf even before the
whole tracing discussion: some of the vsnprintf overhead was actually
disgusting. And almost all of it was just stupid code, allegedly for
simplicity. Causing things like pointless stack frame setup, bad
calling conventions etc.
So my vsnprintf branch was literally written with me looking at the
resulting assembler too, because when core code works badly, the end
result is badness.
In random non-core code, I want people to write it clearly and
maintainably. I very much do not want random drivers doing
micro-optimizations etc.
End result: I don't want people to say "oh, vsnprintf does badly at
handling '%s' so I'm going to use some special string-printing thing",
the way we had that seq_put_decimal_ull_width() chaos.
And yes, I saw that very error_string() function inlined several times
when I was looking at code generation. gcc and clang did things
differently, I forget which way it went, but there was a reason I sent
out the test patch that I had *checked* made code generation almost
identical. The "fetch a byte" was still fetch a simple byte fetch.
Iirc the main difference was that yes, we had to do that annoying
"pagefault_disable/enable()" and that generated something like five
instructions just to get 'current' and increment a variable.. But it
didn't cause extra stack frames or function calls. It's been about a
month, I forget the exact details.
(And yes, I even considered turning pagefault_disable/enable there
into a preempt_count() thing instead. We do better at those percpu
variables than at per-thread ones. We used to do it that way, but we
made pagefault_disable its own counter back ten years ago exactly
*because* people wanted to have a separate counter, and sadly that
does mean that it's now a few instructions).
Linus
Powered by blists - more mailing lists