[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNPNa2f=kAF6c199oYVJ0iSyirQRGxeOBLxa9PmakSXRbA@mail.gmail.com>
Date: Mon, 8 Jun 2020 09:48:52 +0200
From: Marco Elver <elver@...gle.com>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Andrey Konovalov <andreyknvl@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Alexander Potapenko <glider@...gle.com>,
kasan-dev <kasan-dev@...glegroups.com>,
LKML <linux-kernel@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH -tip v3 1/2] kcov: Make runtime functions noinstr-compatible
On Sun, 7 Jun 2020 at 11:37, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> On Fri, Jun 5, 2020 at 3:25 PM 'Andrey Konovalov' via kasan-dev
> <kasan-dev@...glegroups.com> wrote:
> > > On Fri, Jun 05, 2020 at 12:57:15PM +0200, Dmitry Vyukov wrote:
> > > > On Fri, Jun 5, 2020 at 10:28 AM Marco Elver <elver@...gle.com> wrote:
> > > > >
> > > > > While we lack a compiler attribute to add to noinstr that would disable
> > > > > KCOV, make the KCOV runtime functions return if the caller is in a
> > > > > noinstr section, and mark them noinstr.
> > > > >
> > > > > Declare write_comp_data() as __always_inline to ensure it is inlined,
> > > > > which also reduces stack usage and removes one extra call from the
> > > > > fast-path.
> > > > >
> > > > > In future, our compilers may provide an attribute to implement
> > > > > __no_sanitize_coverage, which can then be added to noinstr, and the
> > > > > checks added in this patch can be guarded by an #ifdef checking if the
> > > > > compiler has such an attribute or not.
> > > >
> > > > Adding noinstr attribute to instrumentation callbacks looks fine to me.
> > > >
> > > > But I don't understand the within_noinstr_section part.
> > > > As the cover letter mentions, kcov callbacks don't do much and we
> > > > already have it inserted and called. What is the benefit of bailing
> > > > out a bit earlier rather than letting it run to completion?
> > > > Is the only reason for potential faults on access to the vmalloc-ed
> > > > region?
> > >
> > > Vmalloc faults (on x86, the only arch that had them IIRC) are gone, per
> > > this merge window.
> > >
> > > The reason I mentioned them is because it is important that they are
> > > gone, and that this hard relies on them being gone, and the patch didn't
> > > call that out.
> > >
> > > There is one additional issue though; you can set hardware breakpoint on
> > > vmalloc space, and that would trigger #DB and then we'd be dead when we
> > > were already in #DB (IST recursion FTW).
> > >
> > > And that is not something you can trivially fix, because you can set the
> > > breakpoint before the allocation (or perhaps on a previous allocation).
> > >
> > > That said; we already have this problem with task_struct (and
> > > task_stack). IIRC Andy wants to fix the task_stack issue by making all
> > > of noinstr run on the entry stack, but we're not there yet.
> > >
> > > There are no good proposals for random allocations like task_struct or
> > > in your case kcov_area.
> > >
> > > > Andrey, Mark, do you know if it's possible to pre-fault these areas?
> > >
> > > Under the assumption that vmalloc faults are still a thing:
> > >
> > > You cannot pre-fault the remote area thing, kernel threads use the mm of
> > > the previous user task, and there is no guarantee that mm will have had
> > > the vmalloc fault.
> >
> > To clarify this part AFAIU it, even if we try to prefault the whole
> > remote area each time kcov_remote_start() is called, then (let alone
> > the performance impact) the kernel thread can be rescheduled between
> > kcov_remote_start() and kcov_remote_stop(), and then it might be
> > running with a different mm than the one that was used when
> > kcov_remote_start() happened.
>
> Ugh, this is nasty. But this has also gone, which I am happy about :)
>
> Why I am looking at this is because with coverage instrumentation
> __sanitizer_cov_trace_pc is the hottest function in the kernel and we
> are adding additional branches to it.
>
> Can we touch at least some per-cpu data within noinstr code?
> If yes, we could try to affect the existing
> in_task()/in_serving_softirq() check.
> If not, it would be useful to have a comment clarifying that
> within_noinstr_section check must happen before we touch anything
> else.
I don't think this will get us anywhere. If anything this will require
introducing code outside KCOV, and I think that makes the whole
situation even worse. My guess is also we can't even read per-CPU
data, but Peter would have to comment on this.
> I assume objtool can now also detect all violations. How bad is it now
> without within_noinstr_section check? I am assuming we marking noinstr
> functions to not be instrumented, but we are getting some stray
> instrumentation from inlined functions or something, right? How many
> are there? Is it fixable/unfixable? Marco, do you know the list, or
> could you please collect the list of violations?
It's everywhere. We cannot mark noinstr functions to not be
instrumented by KCOV/-fsanitize-coverage, because no compiler provides
an attribute to do so. GCC doesn't have
__attribute__((no_sanitize_coverage)) and Clang doesn't have
__attribute__((no_sanitize("coverage")), and therefore we can't have
__no_sanitize_coverage.
My plan would be to now go and implement the attributes, at the very
least in Clang. Then what we can do is make wihin_noinstr_section a
noop (just return false) if we have CONFIG_CC_HAS_NOSANITIZE_COVERAGE
or something.
Unfortunately, without this patch, we won't have a reliable kernel
with KCOV until we get compiler support.
The thing is that this slowdown is temporary if we add the attributes
to the compiler.
> Is there any config that disables #DB? We could well disable it on
> syzbot, I think we already disable some production hardening/debugging
> confings, which are not too useful for testing setup.
> E.g. we support RANDOMIZE_BASE, no problem, but if one disables it
> (which we do), that becomes no-op:
>
> #ifdef CONFIG_RANDOMIZE_BASE
> ip -= kaslr_offset();
> #endif
> return ip;
Powered by blists - more mailing lists