[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20191104211929.681537321@linuxfoundation.org>
Date: Mon, 4 Nov 2019 22:44:52 +0100
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-kernel@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable@...r.kernel.org, Petr Mladek <pmladek@...e.com>,
"Steven Rostedt (VMware)" <rostedt@...dmis.org>,
Sasha Levin <sashal@...nel.org>
Subject: [PATCH 4.9 30/62] tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
From: Petr Mladek <pmladek@...e.com>
[ Upstream commit d303de1fcf344ff7c15ed64c3f48a991c9958775 ]
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa773053ab182877 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@...e.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@...dmis.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 827ba2caea097..6a170a78b4535 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5217,6 +5217,7 @@ waitagain:
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
cpumask_clear(iter->started);
+ trace_seq_init(&iter->seq);
iter->pos = -1;
trace_event_read_lock();
--
2.20.1
Powered by blists - more mailing lists