Message-ID: <6243f7cc-1a80-db0b-4765-fa12bda9b06a@comcast.net>
Date: Sun, 23 Sep 2018 16:07:12 -0400
From: Rob Prowel <rprowel@...cast.net>
To: linux-kernel@...r.kernel.org
Subject: AMD Athlon bogus performance value causing RCU stalls?
Please CC me on comments.
I'm seeing a lot of these errors on my dual core fileserver:
-----------------------------------------------------------------------
Sep 23 01:51:28 files kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Sep 23 01:51:28 files kernel: 1-...!: (0 ticks this GP) idle=27c/0/0 softirq=35425/35425 fqs=0
Sep 23 01:51:28 files kernel: (detected by 0, t=60009 jiffies, g=20812, c=20811, q=121)
Sep 23 01:51:28 files kernel: Sending NMI from CPU 0 to CPUs 1:
Sep 23 01:51:28 files kernel: NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x2/0x10
Sep 23 01:51:28 files kernel: rcu_sched kthread starved for 60009 jiffies! g20812 c20811 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
Sep 23 01:51:28 files kernel: RCU grace-period kthread stack dump:
Sep 23 01:51:28 files kernel: rcu_sched I 0 10 2 0x80000000
Sep 23 01:51:33 files kernel: Call Trace:
Sep 23 01:51:33 files kernel: ? __schedule+0x25c/0x860
Sep 23 01:51:33 files kernel: schedule+0x28/0x80
Sep 23 01:51:33 files kernel: schedule_timeout+0x174/0x370
Sep 23 01:51:33 files kernel: ? __next_timer_interrupt+0xc0/0xc0
Sep 23 01:51:33 files kernel: rcu_gp_kthread+0x4b6/0x8c0
Sep 23 01:51:33 files kernel: ? _synchronize_rcu_expedited.constprop.68+0x310/0x310
Sep 23 01:51:33 files kernel: kthread+0x113/0x130
Sep 23 01:51:33 files kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Sep 23 01:51:33 files kernel: ret_from_fork+0x35/0x40
-----------------------------------------------------------------------
The kernel-reported bogoMIPS values for the two cores are as follows:
$ grep bogo /proc/cpuinfo
bogomips : 4219.49
bogomips : 184253.06
$
What is that value for the second Athlon core? It looks extremely bogus. Would/could it be the reason for the schedule_timeouts? The same bogus value also shows up in the boot log when the second core is activated. It seems to be AMD-specific, as the values are correct on my Xeon machines.
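For anyone who wants to run the same comparison on their own machines, here is a minimal sketch in Python that does what the grep above does and then flags any core whose bogomips is far above the lowest one. The function names (parse_bogomips, flag_outliers) and the 2x threshold (max_ratio) are my own assumptions for illustration, not anything the kernel defines, and a large ratio only says the numbers disagree, not why.

```python
import os
import re


def parse_bogomips(cpuinfo_text):
    """Return the bogomips value of each logical CPU, in file order."""
    values = []
    for line in cpuinfo_text.splitlines():
        # Lines look like "bogomips : 4219.49"; match the key case-insensitively.
        match = re.match(r"\s*bogomips\s*:\s*([0-9.]+)", line, re.IGNORECASE)
        if match:
            values.append(float(match.group(1)))
    return values


def flag_outliers(values, max_ratio=2.0):
    """Return indices of CPUs whose bogomips exceeds max_ratio times the minimum.

    max_ratio is an arbitrary threshold chosen for this sketch.
    """
    if not values:
        return []
    baseline = min(values)
    return [i for i, v in enumerate(values) if v > baseline * max_ratio]


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        found = parse_bogomips(f.read())
    for i in flag_outliers(found):
        print(f"cpu {i}: bogomips {found[i]} vs lowest {min(found)}")
```

Fed the two values shown above, flag_outliers reports index 1, i.e. the second core.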
The kernel is a stock Fedora 4.18.7-100 release. The machine is an old Dell Experion that I've repurposed as a fileserver and PostgreSQL machine.
Other than "RTFM" or "please build a bunch of kernels from source on your slow machine, using differing config options, to help track down the cause of this", any thoughts about a solution?