Message-ID: <CAMuHMdURCeAVt_2L33P197qbj3UBXLWRZH0nZvm+UJbnzBCS2A@mail.gmail.com>
Date: Tue, 20 Sep 2022 20:15:19 +0200
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: Jesse Brandeburg <jesse.brandeburg@...el.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
Miroslav Lichvar <mlichvar@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
intel-wired-lan@...ts.osuosl.org, netdev <netdev@...r.kernel.org>,
Linux-Renesas <linux-renesas-soc@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: E1000e PTP crash on R-Car Gen2
Hi all,
While leaving a Renesas Koelsch development board (with R-Car M2-W SoC)
and an otherwise unused Intel E1000e Ethernet card running unattended, I
ran into a crash after 4 hours and 5 minutes of uptime:
Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
[00000000] *pgd=80000040004003, *pmd=00000000
Internal error: : 1211 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 581 Comm: kworker/0:0 Tainted: G N 6.0.0-rc6-koelsch-00864-g34666b5da80f #1661
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
Workqueue: events e1000e_systim_overflow_work
PC is at e1000e_read_systim+0x3c/0x1c0
LR is at timecounter_read+0x14/0xa0
[...]
e1000e_read_systim from timecounter_read+0x14/0xa0
timecounter_read from e1000e_systim_overflow_work+0x24/0x7c
e1000e_systim_overflow_work from process_one_work+0x2f0/0x4c4
process_one_work from worker_thread+0x240/0x2d0
worker_thread from kthread+0xd0/0xe0
kthread from ret_from_fork+0x14/0x34
[...]
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 39s!
[...]
This happened during the periodic check for time counter overflow, which
runs from a workqueue every E1000_SYSTIM_OVERFLOW_PERIOD (4 hours).
The asynchronous external abort is a typical symptom of accessing a
device's hardware registers (in this case the PCIe controller) while the
device's clock is disabled, so presumably the workqueue ran while the
device was runtime-suspended.
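
To make the suspected failure mode concrete, here is a minimal user-space sketch in C. It is only a model, not driver code and not a proposed patch: struct fake_dev, fake_pm_get(), fake_pm_put() and fake_read_systim() are hypothetical names invented for illustration. The get/put pair loosely mirrors the kernel's pm_runtime_get_sync()/pm_runtime_put() reference counting, and a -1 return stands in for the external abort that real hardware raises when a register is read with the clock off. Unlike real runtime PM, which may defer the suspend, the model suspends as soon as the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a runtime-PM managed device. */
struct fake_dev {
	bool clk_enabled;   /* false while "runtime-suspended" */
	int usage_count;    /* outstanding fake_pm_get() references */
	uint64_t systim;    /* pretend SYSTIM register contents */
};

/* Take a reference; the first reference turns the clock on. */
static void fake_pm_get(struct fake_dev *dev)
{
	if (dev->usage_count++ == 0)
		dev->clk_enabled = true;
}

/* Drop a reference; dropping the last one turns the clock off. */
static void fake_pm_put(struct fake_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->clk_enabled = false;
}

/* Register read: fails when the clock is off, where real hardware aborts. */
static int fake_read_systim(const struct fake_dev *dev, uint64_t *val)
{
	if (!dev->clk_enabled)
		return -1;
	*val = dev->systim;
	return 0;
}

/* Periodic work that reads the register without holding a reference. */
static int overflow_work_unguarded(struct fake_dev *dev, uint64_t *val)
{
	return fake_read_systim(dev, val);
}

/* Same work, bracketed by get/put so the clock is on during the read. */
static int overflow_work_guarded(struct fake_dev *dev, uint64_t *val)
{
	int ret;

	fake_pm_get(dev);
	ret = fake_read_systim(dev, val);
	fake_pm_put(dev);
	return ret;
}
```

In this model, calling overflow_work_unguarded() on a suspended device fails, while overflow_work_guarded() succeeds and leaves the device suspended again afterwards. Whether the real overflow work needs such a bracket, or whether the work should instead be cancelled on suspend, is left open here.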
I don't know much about how and when Linux uses PTP, but I did notice that
drivers/net/ethernet/intel/e1000e/netdev.c makes several pm_runtime_*()
calls (though none in e1000e_read_systim()), while
drivers/net/ethernet/intel/e1000e/ptp.c makes none.
Unfortunately I haven't managed to reproduce the problem (even with
E1000_SYSTIM_OVERFLOW_PERIOD reduced), so probably there is a race
condition somewhere.
Thanks for your comments!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds