Message-ID: <fa12d66e-de52-3e2e-154c-90c775bb4fe4@ametek.com>
Date:   Mon, 14 Sep 2020 13:53:05 +0000
From:   James Jurack <James.Jurack@...tek.com>
To:     "claudiu.manoil@...escale.com" <claudiu.manoil@...escale.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: PROBLEM: Crash when timestamping outgoing PTP packets under heavy
 network load (ppc, gianfar)

[1.] One line summary of the problem:
Crash when timestamping outgoing PTP packets under heavy network load 
(ppc, gianfar)

[2.] Full description of the problem/report:
I have a custom embedded platform running a QorIQ P2020 PPC processor 
and Freescale eTSEC for gigabit Ethernet. When I have heavy network load 
(full gigabit Ethernet usage), and send PTP Event packets with gianfar’s 
hardware timestamping enabled, memory/stack corruption seems to occur, 
leading to many different symptoms, including DMA API Debug warnings, 
CPU stalls, and oopses/panics due to null pointer dereference or invalid 
page access. The crash can also happen under lower load, but below 
100MBps it can take hours for the issue to occur.

[3.] Keywords (i.e., modules, networking, kernel):
networking, ptp, ieee 1588, hardware timestamping, gianfar, ppc, p2020

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
# uname -a
Linux version 4.4.235 (jjurack@...val-eng-12) (gcc version 4.9.3 
(crosstool-NG crosstool-ng-1.22.0) ) #13 SMP Thu Sep 10 08:28:37 EDT 2020

[4.2.] Kernel .config file:
linux.config is attached

[5.] Most recent kernel version which did not have the bug:
The latest version of this system to not have this crash was using 
kernel 3.2 (Linux morlun 3.2.0 #1 SMP Fri Jun 10 12:43:28 PDT 2016 ppc 
GNU/Linux). However, many other parts of our system have changed since 
then and I have so far not been able to create modified versions of that 
build for 1:1 comparison testing. I have gone back as far as 4.4.129 on 
the 4.4 branch; that is where I first discovered the issue.

[6.] Output of Oops.. message (if applicable) with symbolic information
      resolved (see Documentation/oops-tracing.txt)
Console logs from a few different crashes are attached (console{1..5}.log)

[7.] A small shell script or example program which triggers the
      problem (if possible)
I have attached minimal programs that I used to generate network load 
and send timestamped PTP packets:
* netdump.c generates network traffic.
* dump.sh runs 16 instances of netdump.
* test.py connects to the netdump instances from another system to 
consume the traffic.
* hwts.c sends timestamped PTP packets.
To reproduce I usually do these steps:
  * run dump.sh
  * start test.py on another system
  * run `hwts 32` to send 32 ptp packets
  * a crash usually happens within 10-20 seconds
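
For context, a program that wants hardware transmit timestamps normally opts its socket in with the SO_TIMESTAMPING socket option before sending. The following is a minimal sketch of that one step, assuming a Linux userspace build; it is not taken from the attached hwts.c, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/net_tstamp.h>   /* SOF_TIMESTAMPING_* flags */
#include <sys/socket.h>         /* setsockopt, SOL_SOCKET, SO_TIMESTAMPING */

/* Flags requesting hardware TX timestamps, reported as raw NIC clock values.
 * Hypothetical helper, named for illustration only. */
int tx_hw_timestamp_flags(void)
{
    return SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
}

/* Ask the kernel to timestamp packets sent on fd.
 * Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
int request_tx_hw_timestamps(int fd)
{
    int flags = tx_hw_timestamp_flags();

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

After a send, the timestamp comes back on the socket's error queue (recvmsg() with MSG_ERRQUEUE) as an SCM_TIMESTAMPING control message. The driver must also have hardware timestamping switched on through the SIOCSHWTSTAMP ioctl for any of this to take effect.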

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
$ scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux oh-val-eng-12 5.8.5-arch1-1 #1 SMP PREEMPT Thu, 27 Aug 2020 
18:53:02 +0000 x86_64 GNU/Linux

GNU C                   10.2.0
GNU Make                4.3
Binutils                2.35
Util-linux              2.36
Mount                   2.36
Module-init-tools       27
E2fsprogs               1.45.6
Jfsutils                1.1.15
Reiserfsprogs           3.6.27
Xfsprogs                5.7.0
Pcmciautils             018
PPP                     2.4.7
Linux C Library         2.32
Dynamic linker (ldd)    2.32
Linux C++ Library       6.0.28
Net-tools               2.10
Kbd                     2.3.0
Console-tools           2.3.0
Sh-utils                8.32
Udev                    246
Wireless-tools          30
Modules Loaded          acpi_cpufreq agpgart at24 auth_rpcgss bluetooth 
cdrom cec cfg80211 coretemp crc16 crc32c_generic crc32c_intel 
crypto_user dcdbas dell_smbios dell_wmi dell_wmi_descriptor drm 
drm_kms_helper e1000e ecc ecdh_generic ehci_hcd ehci_pci evdev ext4 
fb_sys_fops fuse gpio_ich grace hid hid_generic hid_plantronics 
i2c_algo_bit i2c_i801 i2c_smbus i7core_edac input_leds intel_cstate 
intel_pmc_bxt intel_powerclamp intel_uncore ip_tables irqbypass 
iTCO_vendor_support iTCO_wdt jbd2 joydev kvm kvm_intel ledtrig_audio 
lockd loop lpc_ich mac_hid mbcache mc mousedev nfnetlink nfnetlink_log 
nfnetlink_queue nfs_acl nfsd parport parport_pc pcspkr ppdev radeon 
rc_core rfkill sg snd snd_hda_codec snd_hda_codec_generic 
snd_hda_codec_realtek snd_hda_core snd_hda_intel snd_hwdep 
snd_intel_dspcfg snd_pcm snd_rawmidi snd_seq_device snd_timer 
snd_usb_audio snd_usbmidi_lib soundcore sparse_keymap sr_mod sunrpc 
syscopyarea sysfillrect sysimgblt ttm uas usbhid usb_storage vboxdrv 
vboxnetadp vboxnetflt wmi wmi_bmof x_tables

[8.2.] Processor information (from /proc/cpuinfo):
# cat /proc/cpuinfo
processor       : 0
cpu             : e500v2
clock           : 1200.000000MHz
revision        : 5.1 (pvr 8021 1051)
bogomips        : 150.00

processor       : 1
cpu             : e500v2
clock           : 1200.000000MHz
revision        : 5.1 (pvr 8021 1051)
bogomips        : 150.00

total bogomips  : 300.00
timebase        : 75000000
platform        : P2020 RDB
model           : EMX-2500
Memory          : 512 MB

[8.3.] Module information (from /proc/modules):
# cat /proc/modules

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

# cat /proc/ioports
00000000-0000ffff : /pcie@...08000
   00000000-00000fff : Legacy IO
00020000-0002ffff : /pcie@...09000
   00020000-0002ffff : PCI Bus 0001:12
00040000-0004ffff : /pcie@...0a000
   00040000-0004ffff : PCI Bus 0002:17
# cat /proc/iomem
00000000-1fffffff : System RAM
80000000-bfffffff : /pcie@...0a000
   80000000-bfffffff : PCI Bus 0002:17
     a0000000-bfffffff : /pcie@...09000
       a0000000-bfffffff : PCI Bus 0001:12
e0000000-ffffffff : /pcie@...08000
   e0000000-ffffffff : PCI Bus 0000:01
     ff704500-ff704507 : serial
     ff707000-ff707fff : /soc@...00000/spi@...0
     ff722000-ff722fff : /soc@...00000/usb@...00
       ff722000-ff722fff : /soc@...00000/usb@...00
         ff722000-ff722fff : /soc@...00000/usb@...00

[8.5.] PCI information ('lspci -vvv' as root)
lspci.log is attached

[8.6.] SCSI information (from /proc/scsi/scsi)
# cat /proc/scsi/scsi

[8.7.] Other information that might be relevant to the problem
        (please look in /proc and include all information that you
        think to be relevant):
I’ve attached our system’s device tree source file (emx-2500.dts).

[X.] Other notes, patches, fixes, workarounds:
[X.1.] I have tried commenting out gianfar.c:965 (priv->hwts_tx_en = 1;). 
With that line commented out, the crash no longer occurs.
[X.2.] Just calling the SIOCSHWTSTAMP ioctl to turn on hardware 
timestamping while under network load causes a DMA API Debug warning, 
but does not seem to destabilize the system otherwise. I’m not sure if 
this is the same issue or separate. Examples of this are included in 
each console log, as well as below:
fsl-gianfar ff724000.ethernet: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000001ed90000] [size=232 bytes] [mapped as page] [unmapped as single]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:1116
Modules linked in:
CPU: 1 PID: 1589 Comm: hwts Tainted: G        W       4.4.235 #13
task: d9e41900 ti: db960000 task.ti: db960000
NIP: c030f5c0 LR: c030f5c0 CTR: c0367c44
REGS: db961c20 TRAP: 0700   Tainted: G        W        (4.4.235)
MSR: 00021000 <CE,ME>  CR: 28002822  XER: 20000000

GPR00: c030f5c0 db961cd0 d9e41900 000000b5 dffd12f0 dffd2e0c 1f85f000 db960000
GPR08: 00000007 c0772d4c 1f85f000 00000297 42002884 100192ac 00000000 00000000
GPR16: 00000000 d9da443c 00000002 d9da4420 00000000 df980900 00000020 d9da4000
GPR24: 00029000 c07a8314 c07fe728 c07b0000 c07d9ec0 db961d28 c0808e20 d9ddbd20
NIP [c030f5c0] check_unmap+0x948/0xa90
LR [c030f5c0] check_unmap+0x948/0xa90
Call Trace:
[db961cd0] [c030f5c0] check_unmap+0x948/0xa90 (unreliable)
[db961d20] [c030f7a4] debug_dma_unmap_page+0x9c/0xb0
[db961da0] [c03eeb70] free_skb_resources+0xf4/0x3e4
[db961df0] [c03f354c] reset_gfar+0x68/0x9c
[db961e00] [c03f378c] gfar_ioctl+0x20c/0x210
[db961e30] [c04a2d14] dev_ifsioc+0x308/0x31c
[db961e60] [c04a2f94] dev_ioctl+0x1c0/0x624
[db961ec0] [c014b4d0] do_vfs_ioctl+0x38c/0x6b4
[db961f20] [c014b844] SyS_ioctl+0x4c/0x80
[db961f40] [c0011004] ret_from_syscall+0x0/0x3c
--- interrupt: c01 at 0xff40194
     LR = 0xffed0a8
Instruction dump:
554a103a 7c69402e 7cc9502e 811d001c 813d0020 815d0024 90610008 3c60c06b
90c1000c 3863b7a8 4cc63182 482c99b9 <0fe00000> 4bfffa60 3c80c06b 3884b0f8
---[ end trace 2398b56cb968a2e0 ]---
Mapped at:
[<c03f00dc>] gfar_start_xmit+0x888/0x9f0
[<c0489b74>] dev_hard_start_xmit+0x27c/0x47c
[<c04ae520>] sch_direct_xmit+0xe4/0x278
[<c04ae748>] __qdisc_run+0x94/0x1dc
[<c048a22c>] __dev_queue_xmit+0x384/0x70c

---
James Jurack
Systems Engineer
VTI Instruments / Ametek Programmable Power

View attachment "console1.log" of type "text/x-log" (9883 bytes)

View attachment "console2.log" of type "text/x-log" (5893 bytes)

View attachment "console3.log" of type "text/x-log" (12284 bytes)

View attachment "console4.log" of type "text/x-log" (6726 bytes)

View attachment "console5.log" of type "text/x-log" (10439 bytes)

Download attachment "dump.sh" of type "application/x-shellscript" (147 bytes)

Download attachment "emx-2500.dts" of type "audio/vnd.dts" (16685 bytes)

View attachment "hwts.c" of type "text/x-csrc" (4326 bytes)

View attachment "linux.config" of type "text/plain" (70323 bytes)

View attachment "lspci.log" of type "text/x-log" (156353 bytes)

View attachment "netdump.c" of type "text/x-csrc" (2202 bytes)

View attachment "test.py" of type "text/x-python" (771 bytes)
