[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201102124606.72bd89c5@gandalf.local.home>
Date: Mon, 2 Nov 2020 12:46:06 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Petr Mladek <pmladek@...e.com>
Cc: linux-kernel@...r.kernel.org,
Masami Hiramatsu <mhiramat@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Jiri Kosina <jikos@...nel.org>,
Miroslav Benes <mbenes@...e.cz>,
Jonathan Corbet <corbet@....net>, Guo Ren <guoren@...nel.org>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
Helge Deller <deller@....de>,
Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Kees Cook <keescook@...omium.org>,
Anton Vorontsov <anton@...msg.org>,
Colin Cross <ccross@...roid.com>,
Tony Luck <tony.luck@...el.com>,
Joe Lawrence <joe.lawrence@...hat.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Mauro Carvalho Chehab <mchehab+huawei@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-doc@...r.kernel.org, linux-csky@...r.kernel.org,
linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, live-patching@...r.kernel.org
Subject: Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused
recursion
On Mon, 2 Nov 2020 12:37:21 -0500
Steven Rostedt <rostedt@...dmis.org> wrote:
> The only race that I see that can happen, is the one in the comment I
> showed. And that is after enabling the recursed functions again after
> clearing, one CPU could add a function while another CPU that just added
> that same function could be just exiting this routine, notice that a
> clearing of the array happened, and remove its function (which was the same
> as the one just happened). So we get a "zero" in the array. If this
> happens, it is likely that that function will recurse again and will be
> added later.
>
Updated version of this function:
-- Steve
void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
{
int index = 0;
int i;
unsigned long old;
again:
/* First check the last one recorded */
if (ip == cached_function)
return;
i = atomic_read(&nr_records);
/* nr_records is -1 when clearing records */
smp_mb__after_atomic();
if (i < 0)
return;
/*
* If there's two writers and this writer comes in second,
* the cmpxchg() below to update the ip will fail. Then this
* writer will try again. It is possible that index will now
* be greater than nr_records. This is because the writer
* that succeeded has not updated the nr_records yet.
* This writer could keep trying again until the other writer
* updates nr_records. But if the other writer takes an
* interrupt, and that interrupt locks up that CPU, we do
* not want this CPU to lock up due to the recursion protection,
* and have a bug report showing this CPU as the cause of
* locking up the computer. To not lose this record, this
* writer will simply use the next position to update the
* recursed_functions, and it will update the nr_records
* accordingly.
*/
if (index < i)
index = i;
if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
return;
for (i = index - 1; i >= 0; i--) {
if (recursed_functions[i].ip == ip) {
cached_function = ip;
return;
}
}
cached_function = ip;
/*
* We only want to add a function if it hasn't been added before.
* Add to the current location before incrementing the count.
* If it fails to add, then increment the index (save in i)
* and try again.
*/
old = cmpxchg(&recursed_functions[index].ip, 0, ip);
if (old != 0) {
/* Did something else already added this for us? */
if (old == ip)
return;
/* Try the next location (use i for the next index) */
index++;
goto again;
}
recursed_functions[index].parent_ip = parent_ip;
/*
* It's still possible that we could race with the clearing
* CPU0 CPU1
* ---- ----
* ip = func
* nr_records = -1;
* recursed_functions[0] = 0;
* i = -1
* if (i < 0)
* nr_records = 0;
* (new recursion detected)
* recursed_functions[0] = func
* cmpxchg(recursed_functions[0],
* func, 0)
*
* But the worse that could happen is that we get a zero in
* the recursed_functions array, and it's likely that "func" will
* be recorded again.
*/
i = atomic_read(&nr_records);
smp_mb__after_atomic();
if (i < 0)
cmpxchg(&recursed_functions[index].ip, ip, 0);
else if (i <= index)
atomic_cmpxchg(&nr_records, i, index + 1);
}
Powered by blists - more mailing lists