[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220328155957.GK614@gate.crashing.org>
Date: Mon, 28 Mar 2022 10:59:57 -0500
From: Segher Boessenkool <segher@...nel.crashing.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Borislav Petkov <bp@...en8.de>,
Nathan Chancellor <nathan@...nel.org>, x86-ml <x86@...nel.org>,
lkml <linux-kernel@...r.kernel.org>, llvm@...ts.linux.dev,
Josh Poimboeuf <jpoimboe@...hat.com>,
linux-toolchains@...r.kernel.org
Subject: Re: clang memcpy calls
On Mon, Mar 28, 2022 at 03:58:10PM +0100, Mark Rutland wrote:
> On Mon, Mar 28, 2022 at 09:22:20AM -0500, Segher Boessenkool wrote:
> > The attribute is about how the *current* function is instrumented, not
> > about anything called by this function. This is clearly documented:
> > 'no_sanitize_address'
> > 'no_address_safety_analysis'
> > The 'no_sanitize_address' attribute on functions is used to inform
> > the compiler that it should not instrument memory accesses in the
> > function when compiling with the '-fsanitize=address' option. The
> > 'no_address_safety_analysis' is a deprecated alias of the
> > 'no_sanitize_address' attribute, new code should use
> > 'no_sanitize_address'.
>
> I understand this, and I have read the documentation.
>
> I'm not claiming any *individual* semantic is wrong, just that in
> combination this doesn't provide what people *need* (even if it strictly
> matches what is documented).
>
> My argument is: if the compiler is permitted to implictly and
> arbitrarily add calls to instrumented functions within a function marked
> with `no_sanitize_address`, the `no_sanitize_address` attribute is
> effectively useless, and therefore *something* needs to change.
I do not see how that follows. Maybe that is obvious from how you look
at your use case, but it is not from the viewpoint of people who just
want to do sanitation. So what is the goal here? Why do you need to
prevent sanitation on anything called from this function, at all cost?
> > > I appreciate where you're coming from here, but I think you're approaching the
> > > problem sideways.
> >
> > I am stating facts, I am not trying to solve your problem there. It
> > seemed to me (and still does) that you didn't grasp all facts here.
>
> Sorry, but I think you're reading my replies uncharitably if you think
> that.
Not at all. I just don't see what your problem is, and what you try to
achieve. I do know what you say you want, but that is clearly
impossible to do: the compiler cannot put restrictions on what some
external function will or won't do!
> > > We need to define *what the semantics are* so that we can actually solve the
> > > problem, e.g. is a memcpy implementation expected to be instrumented or not?
> >
> > That is up to the memcpy implementation itself, of course.
>
> Sorry, but that doesn't make sense to me. When the compiler instruments
> a function with AddressSanitizer, it must have *some* assumption about
> whether memcpy() itself will be instrumented, such that it won't miss
> some necessary instrumentation (and ideally, for performance reasons
> doesn't have redundant instrumentation).
Yes. It sets things up with an external memcpy that is sanitized. But
that happens at the linking stage: it is fine in general for user space,
but for kasan you need to do something similar manually.
> If the story is "memcpy may or may not be instrumented", then the only
> way to guarantee necessary instrumentation is for the compiler to
> *always* place it in the caller (unless forbidden by
> `no_sanitize_address`). If that were the case, the kernel can make
> things work by simply not instrumenting memcpy and friends.
It is *impossible* (in general) to put this in the caller, and it is not
how this stuff is designed either (of course).
> IIUC today those assumptions are not documented. Is the behaviour
> consistent?
The documentation I quoted above is simple and clear enough I hope.
Consistent? Consistent with what?
> > > > GCC *requires* memcpy to be the standard memcpy always (i.e. to have the
> > > > standard-specified semantics). This means that it will have the same
> > > > semantics as __builtin_memcpy always, and either or not be a call to an
> > > > external function. It can also create calls to it out of thin air.
> > >
> > > I understand all of that.
> >
> > And still you want us to do something that is impossible under those
> > existing constraints :-(
>
> If that's truly impossible, that's very unfortunate.
>
> FWIW, I can believe this would require tremendous effort to change, even
> if it's not truly impossible.
No. Truly and trivially impossible. You must want something else than
what you say, but I cannot figure out what.
To implement what you ask for we will have to build every function
twice, once with and once without instrumentation, and emit both
somewhere, somehow. This requires either some ABI extensions, or file
format extensions ("fat objects"), or at the minimum some copperation
between the app (the kernel here), the compiler, and the linker (and
more tools in general, but, kernel :-) )
> If that is the case, it means that kernel side we have to never
> instrument our implementation of memcpy and friends for correctness
> reasons, which has the unfortunate property of losing coverage in the
> cases we *would* like to use an instrumented memcpy.
What correctness reasons are that?
> > If you want the external memcpy called by modules A, B, C to not be
> > instrumented, you have to link A, B, and C against an uninstrumented
> > memcpy. This is something the kernel will have to do, the compiler has
> > no say in how the kernel is linked together.
>
> Unfortunately that options doesn't really fix the `no_sanitize_address`
> semantic, and forces us to move *all* uninstrumentable code out into
> separate compilation units, etc.
Yes. Which follows directly from you not wanting to call anything
instrumented from anything marked with such an attribute: you have to
divide the world into two parts, if you want the world to be divided
into two parts.
> This isn't *impossible*, but is *very* painful.
Yes. It is doable if there is just a handful of things that you do not
want instrumented. Like low-level interrupt handlers. Luckily those
things are generally written in assembler language for other reasons.
Segher
Powered by blists - more mailing lists