netdev - [BUG] net: stmmac: crash within stmmac

Message-ID: <CABnpCuCLN6VNgmoWHwc4_8AT34xqmQnEoUHLncvE2yLqYZBaKg@mail.gmail.com>
Date: Mon, 19 Aug 2024 13:26:37 +0100
From: Shane Francis <bigbeeshane@...il.com>
To: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, mcoquelin.stm32@...il.com
Cc: linux-arm-kernel@...ts.infradead.org, netdev@...r.kernel.org
Subject: [BUG] net: stmmac: crash within stmmac_rx()

Summary of the problem:
===================
Crash observed within stmmac_rx() under high RX load.

Hardware: Rockchip RK3588 platform with an RTL8211F Ethernet PHY

The issue seems identical to the one described here:
https://lore.kernel.org/netdev/20210514214927.GC1969@qmqm.qmqm.pl/T/

Full description of the problem/report:
=============================
I have observed that under heavy upload the stmmac driver crashes due to
what I think is an integer overflow (unsigned wraparound). After some
debugging, I found that stmmac_rx_buf2_len() returns an unexpectedly
large value, which is then assigned to buf2_len here:
https://github.com/torvalds/linux/blob/v6.6/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5466

An example set of values that I have observed to cause the crash:
    buf1_len = 0
    buf2_len = 4294966330

From within stmmac_rx_buf2_len():
    plen = 2106
    len = 3072

The return value would be plen - len, i.e. -966, which is 4294966330
when interpreted as a u32 and matches the observed buf2_len.

I am unsure how to debug this further. Would it be acceptable to clamp
the return value of stmmac_rx_buf2_len() to dma_buf_sz when it would
otherwise exceed it?

This only happens at upload speeds above 500 Mbps; I have been unable
to reproduce the issue when limiting the speed to below 500 Mbps.


Kernel version (from /proc/version):
===========================
6.6.45


Crash Log
========
[  120.746602] Mem abort info:
[  120.746848]   ESR = 0x000000009600014f
[  120.747189]   EC = 0x25: DABT (current EL), IL = 32 bits
[  120.747668]   SET = 0, FnV = 0
[  120.747943]   EA = 0, S1PTW = 0
[  120.748225]   FSC = 0x0f: level 3 permission fault
[  120.748650] Data abort info:
[  120.748908]   ISV = 0, ISS = 0x0000014f, ISS2 = 0x00000000
[  120.749392]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[  120.749835]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  120.750311] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000003ddd000
[  120.750902] [ffff000003210000] pgd=18000001ffff8003, p4d=18000001ffff8003, pud=18000001ffff7003, pmd=18000001fffde003, pte=0060000003210783
[  120.752014] Internal error: Oops: 000000009600014f [#1] PREEMPT SMP
[  120.752562] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc r8169 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 crc_ccitt gpio_button_hotplug(O)
[  120.756247] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O       6.6.45 #0
[  120.756894] Hardware name: FriendlyElec NanoPi R6S (DT)
[  120.757351] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  120.757959] pc : dcache_inval_poc+0x40/0x58
[  120.758331] lr : arch_sync_dma_for_cpu+0x2c/0x3c
[  120.758739] sp : ffff80008000bcf0
[  120.759030] x29: ffff80008000bcf0 x28: ffff0001018e8900 x27: ffff000104920000
[  120.759657] x26: 0000000000000000 x25: ffff000103d28500 x24: ffff0001018e8900
[  120.760284] x23: ffff0001018ec900 x22: 00000000fffffc36 x21: 0000000000000002
[  120.760910] x20: ffff000100bf6410 x19: 0000000000589000 x18: 0000000000000000
[  120.761537] x17: 1128298ef1fd0a08 x16: 01010000efc30001 x15: ffffffffffffffff
[  120.762164] x14: ffffffffffffffff x13: ffffffffffffffff x12: ffffffffffffffff
[  120.762790] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffffffffffffffff
[  120.763417] x8 : ffffffffffffffff x7 : 0000000000000640 x6 : dead00000000003f
[  120.764043] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 000000000000003f
[  120.764670] x2 : 0000000000000040 x1 : ffff000100588c00 x0 : ffff000003210000
[  120.765296] Call trace:
[  120.765512]  dcache_inval_poc+0x40/0x58
[  120.765849]  dma_sync_single_for_cpu+0xec/0x110
[  120.766250]  stmmac_napi_poll_rx+0x30c/0xd9c
[  120.766628]  __napi_poll+0x38/0x178
[  120.766939]  net_rx_action+0x114/0x23c
[  120.767270]  handle_softirqs+0x108/0x248
[  120.767617]  __do_softirq+0x14/0x20
[  120.767926]  ____do_softirq+0x10/0x1c
[  120.768249]  call_on_irq_stack+0x24/0x4c
[  120.768594]  do_softirq_own_stack+0x1c/0x28
[  120.768963]  irq_exit_rcu+0xbc/0xd8
[  120.769272]  el1_interrupt+0x38/0x68
[  120.769590]  el1h_64_irq_handler+0x18/0x24
[  120.769951]  el1h_64_irq+0x68/0x6c
[  120.770251]  cpuidle_enter_state+0x130/0x2f0
[  120.770625]  cpuidle_enter+0x38/0x50
[  120.770941]  do_idle+0x19c/0x1f0
[  120.771229]  cpu_startup_entry+0x38/0x3c
[  120.771575]  __cpu_disable+0x0/0xdc
[  120.771883]  __secondary_switched+0xb8/0xbc
[  120.772255] Code: 8a230000 54000060 d50b7e20 14000002 (d5087620)
[  120.772787] ---[ end trace 0000000000000000 ]---
[  120.773192] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  120.773790] SMP: stopping secondary CPUs
[  120.774203] Kernel Offset: disabled
[  120.774507] CPU features: 0x0,c0000000,70028141,1000700b
[  120.774971] Memory Limit: none