[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20201124223917.795844-1-elavila@google.com>
Date: Tue, 24 Nov 2020 22:39:17 +0000
From: "J. Avila" <elavila@...gle.com>
To: rostedt@...dmis.org, mingo@...hat.com
Cc: gregkh@...uxfoundation.org, john.stultz@...aro.org,
linux-kernel@...r.kernel.org, kernel-team@...roid.com
Subject: Potential Issue in Tracing Ring Buffer
Hello,
In the ftrace logs we've collected internally, we have found that there are
situations where time seems to go backwards; this breaks userspace tools which
expect time to always go forward in these logs. For example, in this snippet
from a db845c running a 5.10-rc4 kernel[1] (thanks for getting us the trace,
John!), we see:
android.bg-568 [002] dN.1 26247: rcu_utilization: Start context switch
<...>-589 [003] .... 26247: sys_enter: NR 73 (b400006f4898c520, 2, 0, 0, 0, 11)
PowerManagerSer-578 [001] d..2 26248: sched_switch: prev_comm=PowerManagerSer prev_pid=578 prev_prio=116 prev_state=S ==> next_comm=Binder:381_2 next_pid=395 next_prio=120
android.bg-568 [002] dN.1 26248: rcu_utilization: End context switch
<...>-589 [003] d..1 26249: rcu_utilization: Start context switch
<...>-589 [003] d..1 26250: rcu_utilization: End context switch
Binder:381_2-395 [001] .... 26251: sys_exit: NR 98 = 0
android.bg-568 [002] dN.2 26251: sched_stat_wait: comm=kworker/u16:3 pid=164 delay=0 [ns]
<...>-589 [003] d..2 26252: sched_stat_runtime: comm=statsd.writer pid=589 runtime=611094 [ns] vruntime=24367153868 [ns]
android.bg-568 [002] d..2 26254: sched_switch: prev_comm=android.bg prev_pid=568 prev_prio=130 prev_state=R+ ==> next_comm=kworker/u16:3 next_pid=164 next_prio=120
kworker/u16:3-164 [002] .... 26256: workqueue_execute_start: work struct 00000000ab3fe95e: function ufshcd_clk_scaling_resume_work
Binder:381_2-395 [001] .... 26257: sys_enter: NR 98 (b400006fb8984bf0, 89, 185c, 0, 0, ffffffff)
Binder:381_2-395 [001] d..1 26258: rcu_utilization: Start context switch
kworker/u16:3-164 [002] .... 26258: workqueue_execute_end: work struct 00000000ab3fe95e: function ufshcd_clk_scaling_resume_work
Binder:381_2-395 [001] d..1 26260: rcu_utilization: End context switch
kworker/u16:3-164 [002] d..1 26261: rcu_utilization: Start context switch
Binder:381_2-395 [001] d..2 26262: sched_stat_runtime: comm=Binder:381_2 pid=395 runtime=100989 [ns] vruntime=40144094287 [ns]
kworker/u16:3-164 [002] d..1 26263: rcu_utilization: End context switch
kworker/u16:3-164 [002] d..2 26237: sched_stat_runtime: comm=kworker/u16:3 pid=164 runtime=71614 [ns] vruntime=32851170186 [ns]
kworker/u16:3-164 [002] d..2 26240: sched_stat_wait: comm=android.bg pid=568 delay=99635 [ns]
kworker/u16:3-164 [002] d..2 26241: sched_switch: prev_comm=kworker/u16:3 prev_pid=164 prev_prio=120 prev_state=I ==> next_comm=android.bg next_pid=568 next_prio=130
In this trace, we switched the trace clock to counter to rule out hardware
issues, but this has been seen with default settings as well. This is
consistently reproducible - we see it regularly when collecting any trace for
~60 seconds. This seems like it could be a problem in the upstream code, as this
kernel had no modifications made to the ftrace code. Is this a known problem in
upstream? Is there any additional information I could provide to verify this is
an upstream issue? Is anyone else observing this behavior, and if so, do they
happen to be working on a fix? If not, do you have any pointers on how we can
address this?
Thanks,
Avila
[1] Taken from https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline
Powered by blists - more mailing lists