[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Z+FFHFGSgEJGkd+zCBgUOck_odOf9_=5YQLNJQVMGNdw@mail.gmail.com>
Date: Mon, 8 Jun 2020 09:57:39 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Marco Elver <elver@...gle.com>
Cc: Andrey Konovalov <andreyknvl@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Alexander Potapenko <glider@...gle.com>,
kasan-dev <kasan-dev@...glegroups.com>,
LKML <linux-kernel@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH -tip v3 1/2] kcov: Make runtime functions noinstr-compatible
On Mon, Jun 8, 2020 at 9:49 AM Marco Elver <elver@...gle.com> wrote:
> > On Fri, Jun 5, 2020 at 3:25 PM 'Andrey Konovalov' via kasan-dev
> > <kasan-dev@...glegroups.com> wrote:
> > > > On Fri, Jun 05, 2020 at 12:57:15PM +0200, Dmitry Vyukov wrote:
> > > > > On Fri, Jun 5, 2020 at 10:28 AM Marco Elver <elver@...gle.com> wrote:
> > > > > >
> > > > > > While we lack a compiler attribute to add to noinstr that would disable
> > > > > > KCOV, make the KCOV runtime functions return if the caller is in a
> > > > > > noinstr section, and mark them noinstr.
> > > > > >
> > > > > > Declare write_comp_data() as __always_inline to ensure it is inlined,
> > > > > > which also reduces stack usage and removes one extra call from the
> > > > > > fast-path.
> > > > > >
> > > > > > In future, our compilers may provide an attribute to implement
> > > > > > __no_sanitize_coverage, which can then be added to noinstr, and the
> > > > > > checks added in this patch can be guarded by an #ifdef checking if the
> > > > > > compiler has such an attribute or not.
> > > > >
> > > > > Adding noinstr attribute to instrumentation callbacks looks fine to me.
> > > > >
> > > > > But I don't understand the within_noinstr_section part.
> > > > > As the cover letter mentions, kcov callbacks don't do much and we
> > > > > already have it inserted and called. What is the benefit of bailing
> > > > > out a bit earlier rather than letting it run to completion?
> > > > > Is the only reason for potential faults on access to the vmalloc-ed
> > > > > region?
> > > >
> > > > Vmalloc faults (on x86, the only arch that had them IIRC) are gone, per
> > > > this merge window.
> > > >
> > > > The reason I mentioned them is because it is important that they are
> > > > gone, and that this hard relies on them being gone, and the patch didn't
> > > > call that out.
> > > >
> > > > There is one additional issue though; you can set hardware breakpoint on
> > > > vmalloc space, and that would trigger #DB and then we'd be dead when we
> > > > were already in #DB (IST recursion FTW).
> > > >
> > > > And that is not something you can trivially fix, because you can set the
> > > > breakpoint before the allocation (or perhaps on a previous allocation).
> > > >
> > > > That said; we already have this problem with task_struct (and
> > > > task_stack). IIRC Andy wants to fix the task_stack issue by making all
> > > > of noinstr run on the entry stack, but we're not there yet.
> > > >
> > > > There are no good proposals for random allocations like task_struct or
> > > > in your case kcov_area.
> > > >
> > > > > Andrey, Mark, do you know if it's possible to pre-fault these areas?
> > > >
> > > > Under the assumption that vmalloc faults are still a thing:
> > > >
> > > > You cannot pre-fault the remote area thing, kernel threads use the mm of
> > > > the previous user task, and there is no guarantee that mm will have had
> > > > the vmalloc fault.
> > >
> > > To clarify this part AFAIU it, even if we try to prefault the whole
> > > remote area each time kcov_remote_start() is called, then (let alone
> > > the performance impact) the kernel thread can be rescheduled between
> > > kcov_remote_start() and kcov_remote_stop(), and then it might be
> > > running with a different mm than the one that was used when
> > > kcov_remote_start() happened.
> >
> > Ugh, this is nasty. But this has also gone, which I am happy about :)
> >
> > Why I am looking at this is because with coverage instrumentation
> > __sanitizer_cov_trace_pc is the hottest function in the kernel and we
> > are adding additional branches to it.
> >
> > Can we touch at least some per-cpu data within noinstr code?
> > If yes, we could try to affect the existing
> > in_task()/in_serving_softirq() check.
> > If not, it would be useful to have a comment clarifying that
> > within_noinstr_section check must happen before we touch anything
> > else.
>
> I don't think this will get us anywhere. If anything this will require
> introducing code outside KCOV, and I think that makes the whole
> situation even worse. My guess is also we can't even read per-CPU
> data, but Peter would have to comment on this.
>
> > I assume objtool can now also detect all violations. How bad is it now
> > without within_noinstr_section check? I am assuming we marking noinstr
> > functions to not be instrumented, but we are getting some stray
> > instrumentation from inlined functions or something, right? How many
> > are there? Is it fixable/unfixable? Marco, do you know the list, or
> > could you please collect the list of violations?
>
> It's everywhere. We cannot mark noinstr functions to not be
> instrumented by KCOV/-fsanitize-coverage, because no compiler provides
> an attribute to do so. GCC doesn't have
> __attribute__((no_sanitize_coverage)) and Clang doesn't have
> __attribute__((no_sanitize("coverage")), and therefore we can't have
> __no_sanitize_coverage.
>
> My plan would be to now go and implement the attributes, at the very
> least in Clang. Then what we can do is make wihin_noinstr_section a
> noop (just return false) if we have CONFIG_CC_HAS_NOSANITIZE_COVERAGE
> or something.
>
> Unfortunately, without this patch, we won't have a reliable kernel
> with KCOV until we get compiler support.
>
> The thing is that this slowdown is temporary if we add the attributes
> to the compiler.
As a crazy idea: is it possible to employ objtool (linker script?) to
rewrite all coverage calls to nops in the noinstr section? Or relocate
to nop function?
What we are trying to do is very static, it _should_ have been done
during build. We don't have means in existing _compilers_ to do this,
but maybe we could do it elsewhere during build?...
> > Is there any config that disables #DB? We could well disable it on
> > syzbot, I think we already disable some production hardening/debugging
> > confings, which are not too useful for testing setup.
> > E.g. we support RANDOMIZE_BASE, no problem, but if one disables it
> > (which we do), that becomes no-op:
> >
> > #ifdef CONFIG_RANDOMIZE_BASE
> > ip -= kaslr_offset();
> > #endif
> > return ip;
Powered by blists - more mailing lists