[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201118074609.20fdf9c4@gandalf.local.home>
Date: Wed, 18 Nov 2020 07:46:09 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Matt Mullins <mmullins@...x.us>, paulmck <paulmck@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Dmitry Vyukov <dvyukov@...gle.com>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Andrii Nakryiko <andriin@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>,
netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Kees Cook <keescook@...omium.org>,
Peter Zijlstra <peterz@...radead.org>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to
memory allocation
On Tue, 17 Nov 2020 20:54:24 -0800
Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> > extern int
> > @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > do { \
> > it_func = (it_func_ptr)->func; \
> > __data = (it_func_ptr)->data; \
> > - ((void(*)(void *, proto))(it_func))(__data, args); \
> > + /* \
> > + * Removed functions that couldn't be allocated \
> > + * are replaced with TRACEPOINT_STUB. \
> > + */ \
> > + if (likely(it_func != TRACEPOINT_STUB)) \
> > + ((void(*)(void *, proto))(it_func))(__data, args); \
>
> I think you're overreacting to the problem.
I will disagree with that.
> Adding run-time check to extremely unlikely problem seems wasteful.
Show me a real benchmark that you can notice a problem here. I bet that
check is even within the noise of calling an indirect function. Especially
on a machine with retpolines.
> 99.9% of the time allocate_probes() will do kmalloc from slab of small
> objects.
> If that slab is out of memory it means it cannot allocate a single page.
> In such case so many things will be failing to alloc that system
> is unlikely operational. oom should have triggered long ago.
> Imo Matt's approach to add __GFP_NOFAIL to allocate_probes()
Looking at the GFP_NOFAIL comment:
* %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
* cannot handle allocation failures. The allocation could block
* indefinitely but will never return with failure. Testing for
* failure is pointless.
* New users should be evaluated carefully (and the flag should be
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
* loop around allocator.
I realized I made a mistake in my patch for using it, as my patch is a
failure policy. It looks like something we want to avoid in general.
Thanks, I'll send a v3 that removes it.
> when it's called from func_remove() is much better.
> The error was reported by syzbot that was using
> memory fault injections. ENOMEM in allocate_probes() was
> never seen in real life and highly unlikely will ever be seen.
And the biggest thing you are missing here, is that if you are running on a
machine that has static calls, this code is never hit unless you have more
than one callback on a single tracepoint. That's because when there's only
one callback, it gets called directly, and this loop is not involved.
-- Steve
Powered by blists - more mailing lists