Message-ID: <CAMuHMdURCeAVt_2L33P197qbj3UBXLWRZH0nZvm+UJbnzBCS2A@mail.gmail.com>
Date:   Tue, 20 Sep 2022 20:15:19 +0200
From:   Geert Uytterhoeven <geert@...ux-m68k.org>
To:     Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        Miroslav Lichvar <mlichvar@...hat.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        intel-wired-lan@...ts.osuosl.org, netdev <netdev@...r.kernel.org>,
        Linux-Renesas <linux-renesas-soc@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: E1000e PTP crash on R-Car Gen2

Hi all,

While leaving a Renesas Koelsch development board (with R-Car M2-W SoC)
and an otherwise unused Intel E1000e Ethernet card running unattended, I
ran into a crash after 4 hours and 5 minutes of uptime:

    Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
    [00000000] *pgd=80000040004003, *pmd=00000000
    Internal error: : 1211 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 581 Comm: kworker/0:0 Tainted: G                 N 6.0.0-rc6-koelsch-00864-g34666b5da80f #1661
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    Workqueue: events e1000e_systim_overflow_work
    PC is at e1000e_read_systim+0x3c/0x1c0
    LR is at timecounter_read+0x14/0xa0

    [...]

     e1000e_read_systim from timecounter_read+0x14/0xa0
     timecounter_read from e1000e_systim_overflow_work+0x24/0x7c
     e1000e_systim_overflow_work from process_one_work+0x2f0/0x4c4
     process_one_work from worker_thread+0x240/0x2d0
     worker_thread from kthread+0xd0/0xe0
     kthread from ret_from_fork+0x14/0x34

    [...]

    BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 39s!

    [...]

This happened when checking if the time counter overflowed, which is done
from a workqueue periodically (E1000_SYSTIM_OVERFLOW_PERIOD = 4 hours).
The asynchronous external abort is a typical symptom of accessing a
device's hardware registers (in this case the PCIe controller) while the
device's clock is disabled, so presumably the workqueue ran while the
device was runtime-suspended.

I don't know much about how and when Linux uses PTP, but I did notice
drivers/net/ethernet/intel/e1000e/netdev.c makes several pm_runtime_*()
calls (but not in e1000e_read_systim()), while
drivers/net/ethernet/intel/e1000e/ptp.c makes none.
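
For illustration only, the suspected failure mode and the usual kind of
guard can be modelled in plain userspace C. This is a sketch under
assumptions, not e1000e code: struct model_dev and every model_*() and
overflow_work_*() name below are hypothetical, and the "get if active"
helper only loosely mirrors what the kernel's runtime PM helpers such as
pm_runtime_get_if_in_use() and pm_runtime_put() do. Whether the real
driver should skip the read or resume the device first is an open
question this sketch does not answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
	bool clock_on;       /* false while "runtime-suspended" */
	int usage_count;     /* models the runtime PM usage counter */
	uint64_t systim;     /* models the hardware time counter */
	unsigned int faults; /* counts reads that hit a gated clock */
};

/* Models a raw register read. Touching a clock-gated device is the
 * fault; on real hardware this would be the asynchronous external abort. */
static uint64_t model_read_systim(struct model_dev *d)
{
	if (!d->clock_on) {
		d->faults++;
		return 0;
	}
	return d->systim;
}

/* Models taking a reference only if the device is already active. */
static bool model_get_if_active(struct model_dev *d)
{
	if (!d->clock_on)
		return false;
	d->usage_count++;
	return true;
}

/* Models dropping the reference taken above. */
static void model_put(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Unguarded periodic work: reads the counter regardless of power state. */
static void overflow_work_unguarded(struct model_dev *d)
{
	(void)model_read_systim(d);
}

/* Guarded periodic work: skips the read while the device is suspended.
 * Returns true if the counter was actually read. */
static bool overflow_work_guarded(struct model_dev *d)
{
	if (!model_get_if_active(d))
		return false;
	(void)model_read_systim(d);
	model_put(d);
	return true;
}
```

With clock_on set to false, overflow_work_unguarded() records a fault,
while overflow_work_guarded() returns false without touching the modelled
register.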

Unfortunately I haven't managed to reproduce the problem (even with
E1000_SYSTIM_OVERFLOW_PERIOD reduced), so probably there is a race
condition somewhere.

Thanks for your comments!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
