Message-ID: <8bea4eed-5b71-9fd4-c705-926bdad0ee47@camlingroup.com>
Date:   Thu, 15 Sep 2022 11:56:24 +0200
From:   Lech Perczak <lech.perczak@...lingroup.com>
To:     Jérôme Pouiller <jerome.pouiller@...abs.com>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        Paweł Lenkow <pawel.lenkow@...lingroup.com>
CC:     Kalle Valo <kvalo@...nel.org>,
        Krzysztof Drobiński 
        <krzysztof.drobinski@...lingroup.com>,
        Kirill Yatsenko <kirill.yatsenko@...lingroup.com>
Subject: Re: wfx: Memory corruption during high traffic with WFM200 on i.MX6Q
 platform

Hi Jérôme,

Just a quick note, so you don't have to redo our work: Paweł found the root cause,
and a patch is coming very shortly.

TL;DR: hw->max_rates was initially set to 8 in wfx_init_common(),
which exceeds the maximum of 4 specified by mac80211
and caused out-of-bounds writes all over the place.
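
To illustrate why an advertised rate count above the mac80211 limit ends up
corrupting memory, here is a minimal userspace sketch in C. It is neither the
driver code nor the upcoming patch: struct tx_info_model, clamp_max_rates(),
fill_rates() and overflow_entries() are hypothetical names made up for this
example, the struct layout is simplified, and TX_MAX_RATES merely mirrors the
value 4 of IEEE80211_TX_MAX_RATES in mac80211.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors mac80211's IEEE80211_TX_MAX_RATES (4); redefined for userspace. */
#define TX_MAX_RATES 4

/*
 * Hypothetical, simplified stand-in for a per-frame TX info area:
 * a fixed-size rate table followed by unrelated data. This is NOT
 * the real struct ieee80211_tx_info layout.
 */
struct tx_info_model {
    signed char rate_idx[TX_MAX_RATES];
    unsigned char neighbour[4]; /* whatever happens to follow the table */
};

/* Clamp an advertised rate count to what the table can actually hold. */
static size_t clamp_max_rates(size_t advertised)
{
    return advertised > TX_MAX_RATES ? TX_MAX_RATES : advertised;
}

/*
 * Number of table entries that would be written past the end of
 * rate_idx[] if 'advertised' were trusted without clamping.
 */
static size_t overflow_entries(size_t advertised)
{
    return advertised > TX_MAX_RATES ? advertised - TX_MAX_RATES : 0;
}

/* Fill the rate table, never writing more than TX_MAX_RATES entries. */
static size_t fill_rates(struct tx_info_model *info, size_t advertised)
{
    size_t n = clamp_max_rates(advertised);

    for (size_t i = 0; i < n; i++)
        info->rate_idx[i] = (signed char)i;
    return n;
}
```

With an advertised count of 8, overflow_entries() reports that 4 entries would
land beyond the table, i.e. in whatever memory follows it, while fill_rates()
stops at 4 and leaves the neighbouring bytes untouched. In the real driver the
equivalent step would presumably be to advertise a value no larger than the
mac80211 limit in the first place.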

Kind regards,
Lech

On 12.09.2022 at 18:46, Lech Perczak wrote:
> Hi Jérôme,
>
> Probably a Thunderbird mess-up. Let's try again, I hope it works - I probably fiddled too much with the settings to make it send plain-text.
>
> We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board over the SDIO interface, using an upstream kernel.
> Our patches primarily concern the device tree for the board, and we use upstream firmware from the linux-firmware repository.
>
> While doing that, we stumbled upon a memory corruption issue, which appears when heavy traffic passes through the device.
> Our adapter is running in AP mode. The issue can be reproduced 100% of the time using iperf3,
> by starting an AP interface and an iperf3 server on the device.
> Then, the client station runs iperf3 with the "iperf3 -c <hostname> -t 3600" command, so data should flow through the AP for up to one hour.
> However, the kernel on our device crashes after a few minutes of traffic, sometimes in less than a minute.
>
> The behaviour is the same on kernels v5.19.7, v5.19.2, and even v6.0-rc5. Tests on v6.0-rc5 produced the most detailed stack trace so far:
>
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 00000101
> [00000101] *pgd=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 cfg80211 evbug
> phy_generic ci_hdrc_imx ci_hdrc adt7475 hwmon_vid ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> PC is at kfree_skb_list_reason+0x10/0x24
> LR is at ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
> pc : [<80773238>]    lr : [<7f136538>]    psr: 20000113
> sp : f0801e60  ip : 00000000  fp : 838f04e2
> r10: 00000001  r9 : 838f04e2  r8 : 00000000
> r7 : 82661580  r6 : 00000000  r5 : 82660580  r4 : 00000101
> r3 : 838f0700  r2 : 00000032  r1 : 00000001  r0 : 00000101
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 11d0004a  DAC: 00000051
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: non-paged memory
> Register r3 information: slab kmalloc-1k start 838f0400 pointer offset 768 size 1024
> Register r4 information: non-paged memory
> Register r5 information: slab kmalloc-8k start 82660000 pointer offset 1408 size 8192
> Register r6 information: NULL pointer
> Register r7 information: slab kmalloc-8k start 82660000 pointer offset 5504 size 8192
> Register r8 information: NULL pointer
> Register r9 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
> Register r10 information: non-paged memory
> Register r11 information: slab kmalloc-1k start 838f0400 pointer offset 226 size 1024
> Register r12 information: NULL pointer
> Process ksoftirqd/0 (pid: 10, stack limit = 0x1fff5f96)
> Stack: (0xf0801e60 to 0xf0802000)
> 1e60: 8393cd80 7f136538 00000000 81590f34 80f050b4 20000193 f0801ecc 7f189a7c
> 1e80: 00000032 00000005 823f0458 f0801f18 81c51a00 8368504c 7f189854 83898000
> 1ea0: 8226ac40 40000210 00000200 80f04ec8 f17ddddc 00000000 f0801f18 82660580
> 1ec0: 8393cd80 00000000 00000000 8393cd98 838f04e2 7f13791c 00000000 00000000
> 1ee0: 82660580 00004288 00000000 838f04e2 82660580 8393cd98 82660580 838f04e2
> 1f00: 82660a8c 7f1906b0 7f190708 00000000 40000006 7f137d18 8368578c 8393cd98
> 1f20: 8393cd80 00000000 00000000 00000000 00000000 00000000 82660a8c 80f04ec8
> 1f40: 8393cd80 82660580 82660a7c 7f1347f8 00000000 80f04ec8 00000001 82660a64
> 1f60: 00000000 eefad338 00000000 00000006 80be7f14 801246f8 00000006 80f03098
> 1f80: 80f03080 81504c80 00000101 8010140c f0861e78 80915818 8225e100 f0801f90
> 1fa0: 80f03080 80e543c0 80c059f4 0000000a 80e56a40 80e56a40 80e54334 80c284f4
> 1fc0: 00005a10 80f03d40 80a01e20 04208040 80c059f4 80e56a40 20000013 ffffffff
> 1fe0: f0861eb4 81504c80 81504c80 80f050b4 f0861e78 801245ac 80144024 804772fc
> kfree_skb_list_reason from ieee80211_report_used_skb+0xd0/0x5b4 [mac80211]
> ieee80211_report_used_skb [mac80211] from ieee80211_tx_status_ext+0x4c8/0x850 [mac80211]
> ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
> ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
> ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb0/0xc4
> tasklet_action_common.constprop.0 from __do_softirq+0x14c/0x2c0
> __do_softirq from irq_exit+0x98/0xc8
> irq_exit from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0861e80 to 0xf0861ec8)
> 1e80: 00000001 00000002 00000001 81504c80 eefafdc0 00000000 81590880 00000000
> 1ea0: 81504c80 81505248 80f050b4 f0861f14 f0861f18 f0861ed0 80915bec 80144024
> 1ec0: 20000013 ffffffff
> __irq_svc from finish_task_switch+0xa8/0x270
> finish_task_switch from __schedule+0x25c/0x628
> __schedule from schedule+0x5c/0xb4
> schedule from smpboot_thread_fn+0xbc/0x23c
> smpboot_thread_fn from kthread+0xf4/0x124
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0861fb0 to 0xf0861ff8)
> 1fa0:                                     00000000 00000000 00000000 00000000
> 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e92d4010 e2504000 08bd8010 e1a00004 (e5944000)  
> [  5]  24.00-25.00  sec   765 KBy---[ end trace 0000000000000000 ]---
> tes  6.27 Mbits/sec              Kernel panic - not syncing: Fatal exception in interrupt
> CPU2: stopping
> CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0871f10 to 0xf0871f58)
> 1f00:                                     00000002 80bf66e8 00000001 6e16f000
> 1f20: 00000000 80f0a668 00000000 00000000 a05c2adc a0629de7 eefc50c8 0000007b
> 1f40: fffffff5 f0871f60 80155d84 807006d8 60030013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
> CPU3: stopping
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf0875f10 to 0xf0875f58)
> 5f00:                                     00000003 80bf66e8 00000001 6e17a000
> 5f20: 00000000 80f0a668 00000000 00000000 a05c5ef1 a0629de7 eefd00c8 0000007b
> 5f40: fffffff5 f0875f60 80155d84 807006d8 60000013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
> CPU1: stopping
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.0.0-rc5-dnm3pv2+g047dc4cf9a10 #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from do_handle_IPI+0x100/0x128
> do_handle_IPI from ipi_handler+0x18/0x20
> ipi_handler from handle_percpu_devid_irq+0x8c/0x138
> handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
> generic_handle_domain_irq from gic_handle_irq+0x74/0x88
> gic_handle_irq from generic_handle_arch_irq+0x58/0x78
> generic_handle_arch_irq from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x98/0xc8
> Exception stack(0xf086df10 to 0xf086df58)
> df00:                                     00000001 80bf66e8 00000001 6e164000
> df20: 00000000 80f0a668 00000000 00000000 a05c2d77 a0629de7 eefba0c8 0000007b
> df40: fffffff5 f086df60 80155d84 807006d8 600e0013 ffffffff
> __irq_svc from cpuidle_enter_state+0x158/0x358
> cpuidle_enter_state from cpuidle_enter+0x40/0x50
> cpuidle_enter from do_idle+0x19c/0x208
> do_idle from cpu_startup_entry+0x18/0x1c
> cpu_startup_entry from secondary_start_kernel+0x148/0x150
> secondary_start_kernel from 0x10101620
>
> However, the corruption can manifest itself in other ways as well,
> sometimes even damaging the contents of the onboard NAND flash.
> Similar traces have appeared previously in other places as well.
> In addition to testing on 6.0-rc5, we tried cherry-picking 047dc4cf9a10b4f2dc164b8bf192de583f3ebfee
> from wireless-next, but at first glance this seems unrelated to the issue,
> and it doesn't prevent the crashes.
>
> Below are the relevant bits of the device tree we used to get the module working.
> We're using the in-band IRQ of the SDIO interface:
>
> / {
>          wfx_pwrseq: wfx_pwrseq {
>                  compatible = "mmc-pwrseq-simple";
>                  pinctrl-names = "default";
>                  pinctrl-0 = <&pinctrl_wfx_reset>;
>                  reset-gpios = <&gpio7 8 GPIO_ACTIVE_LOW>;
>          };
>  };
>
> &iomuxc {
>          usdhc1 {
>                  pinctrl_usdhc1_3: usdhc1grp-3 {
>                          fsl,pins = <
>                                  MX6QDL_PAD_SD1_CMD__SD1_CMD    0x17059
>                                  MX6QDL_PAD_SD1_CLK__SD1_CLK    0x10059
>                                  MX6QDL_PAD_SD1_DAT0__SD1_DATA0 0x17059
>                                  MX6QDL_PAD_SD1_DAT1__SD1_DATA1 0x17059
>                                  MX6QDL_PAD_SD1_DAT2__SD1_DATA2 0x17059
>                                  MX6QDL_PAD_SD1_DAT3__SD1_DATA3 0x17059
>                                  MX6QDL_PAD_SD3_CLK__GPIO7_IO03 0x17041
>                                  MX6QDL_PAD_SD3_CMD__GPIO7_IO02 0x13019
>                          >;
>                  };
>
>                  pinctrl_wfx_reset: wfx-reset-grp {
>                          fsl,pins = <
>                                  MX6QDL_PAD_SD3_RST__GPIO7_IO08 0x1B030
>                          >;
>                  };
>          };
> };
>
> &usdhc1 {
>          status = "okay";
>          #address-cells = <1>;
>          #size-cells = <0>;
>          pinctrl-names = "default";
>          pinctrl-0 = <&pinctrl_usdhc1_3>;
>          cap-power-off-card;
>          keep-power-in-suspend;
>          cap-sdio-irq;
>          wakeup-source;
>          disable-wp;
>          cap-sd-highspeed;
>          bus-width = <4>;
>          non-removable;
>          no-mmc;
>          no-sd;
>          mmc-pwrseq = <&wfx_pwrseq>;
>          wifi@1 {
>                  compatible = "silabs,brd8023a";
>                  reg = <1>;
>                  wakeup-gpios = <&gpio7 2 GPIO_ACTIVE_HIGH>;
>          };
> };
>
> With that, the device probes successfully, and we can get 22 Mbps of traffic with a 1T1R peer
> in HT20 mode in both directions.
> SDIO signals were checked with an oscilloscope, and they look perfectly fine,
> so I think we can rule out any hardware issue.
>
> By adding a canary to the slab allocator, we found that the skb structures get damaged
> and are then improperly dereferenced by the driver somewhere in the TX queue handling code.
>
> With SMP disabled, the issue still manifests itself, hinting at a synchronization issue
> between the interrupt context and the tasklets handling the bulk of the work.
> However, it usually takes longer to reproduce, though still on the order of a few minutes.
> In some cases the kernel detects a use-after-free by itself, without any modification,
> or the reference counts get corrupted.
>
> This stacktrace comes from one of the runs with CONFIG_SMP disabled:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 10 at lib/refcount.c:28 ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
> refcount_t: underflow; use-after-free.
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
> phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from __warn+0xb0/0xd8
> __warn from warn_slowpath_fmt+0x98/0xc8
> warn_slowpath_fmt from ieee80211_tx_status_ext+0x4f8/0x968 [mac80211]
> ieee80211_tx_status_ext [mac80211] from ieee80211_tx_status+0x74/0x9c [mac80211]
> ieee80211_tx_status [mac80211] from ieee80211_tasklet_handler+0xb0/0xd8 [mac80211]
> ieee80211_tasklet_handler [mac80211] from tasklet_action_common.constprop.0+0xb4/0xc0
> tasklet_action_common.constprop.0 from __do_softirq+0x12c/0x290
> __do_softirq from irq_exit+0x90/0xbc
> irq_exit from call_with_stack+0x18/0x20
> call_with_stack from __irq_svc+0x94/0xc4
> Exception stack(0xf0859e98 to 0xf0859ee0)
> 9e80:                                                       00000001 81080780
> 9ea0: 00000001 81080780 00000000 00000002 822f0780 808e82cc 81080780 81080c50
> 9ec0: 00000000 f0859f14 f0859f18 f0859ee8 801404f0 80140624 20000013 ffffffff
> __irq_svc from finish_task_switch+0x78/0x1f8
> finish_task_switch from __schedule+0x244/0x580
> __schedule from schedule+0x5c/0xb4
> schedule from smpboot_thread_fn+0xb8/0x224
> smpboot_thread_fn from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0859fb0 to 0xf0859ff8)
> 9fa0:                                     00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1131 at lib/refcount.c:22 __tcp_transmit_skb+0x7a4/0xa8c
>     
> refcount_t: saturated; leaking memory.
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp xt_conntrack
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables cdc_mbim cdc_wdm cdc_ncm
> cdc_ether usbnet cdc_acm usb_serial_simple usbserial usb_f_rndis u_ether wfx mac80211 libarc4 evbug
> phy_generic cfg80211 adt7475 hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27
> pwm_beeper libcomposite configfs udc_core
> CPU: 0 PID: 1131 Comm: kworker/0:2H Tainted: G        W         5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: wfx_bh_wq bh_work [wfx]
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x24/0x2c
> dump_stack_lvl from __warn+0xb0/0xd8
> __warn from warn_slowpath_fmt+0x98/0xc8
> warn_slowpath_fmt from __tcp_transmit_skb+0x7a4/0xa8c
> __tcp_transmit_skb from __tcp_send_ack.part.0+0xd0/0x13c
> __tcp_send_ack.part.0 from tcp_delack_timer_handler+0xb0/0x180
> tcp_delack_timer_handler from tcp_delack_timer+0x2c/0x128
> tcp_delack_timer from call_timer_fn.constprop.0+0x18/0x80
> call_timer_fn.constprop.0 from run_timer_softirq+0x2ec/0x3b0
> run_timer_softirq from __do_softirq+0x12c/0x290
> __do_softirq from call_with_stack+0x18/0x20
> call_with_stack from do_softirq+0x6c/0x70
> do_softirq from __local_bh_enable_ip+0xd8/0xdc
> __local_bh_enable_ip from __netdev_alloc_skb+0x14c/0x170
> __netdev_alloc_skb from bh_work+0x1b0/0x650 [wfx]
> bh_work [wfx] from process_one_work+0x1b8/0x3ec
> process_one_work from worker_thread+0x4c/0x57c
> worker_thread from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf161dfb0 to 0xf161dff8)
> dfa0:                                     00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace 0000000000000000 ]---
> [  5] 536.16-537.00 sec  26.9 KBytes   261 Kbits/sec                   
> [  5] 537.00-538.00 sec  2.71 MBytes  22.7 Mbits/sec                   
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 0000011c
> [0000011c] *pgd=00000000
> Internal error: Oops: 5 [#1] PREEMPT ARM
> Modules linked in: xt_LOG nf_log_syslog xt_limit iptable_mangle xt_connmark xt_tcpudp
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables
> cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet cdc_acm usb_serial_simple usbserial
> usb_f_rndis u_ether wfx mac80211 libarc4 evbug phy_generic cfg80211 adt7475
> hwmon_vid ci_hdrc_imx ci_hdrc ulpi roles usbmisc_imx pwm_imx27 pwm_beeper
> libcomposite configfs udc_core
> CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.19.2+ge4fb6643395f #1
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> PC is at ip6_rcv_core+0x110/0x68c
> LR is at ip6_rcv_core+0xb0/0x68c
> pc : [<8084d278>]    lr : [<8084d218>]    psr: 20000013
> sp : f0859e18  ip : 00000000  fp : 80e13cc0
> r10: 00000000  r9 : 80e13cf4  r8 : 81b65000
> r7 : 80e6d7c8  r6 : 82024c00  r5 : 812a8760  r4 : 81be5b40
> r3 : 00000000  r2 : 00000100  r1 : 000000d7  r0 : 00000000
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c53c7d  Table: 12338059  DAC: 00000051
> Register r0 information: NULL pointer
> Register r1 information: non-paged memory
> Register r2 information: non-paged memory
> Register r3 information: NULL pointer
> Register r4 information: slab skbuff_head_cache start 81be5b40 pointer offset 0 size 48
> Register r5 information: non-slab/vmalloc memory
> Register r6 information: slab kmalloc-1k start 82024c00 pointer offset 0 size 1024
> Register r7 information: non-slab/vmalloc memory
> Register r8 information: slab kmalloc-2k start 81b65000 pointer offset 0 size 2048
> Register r9 information: non-slab/vmalloc memory
> Register r10 information: NULL pointer
> Register r11 information: non-slab/vmalloc memory
> Register r12 information: NULL pointer
> Process ksoftirqd/0 (pid: 10, stack limit = 0x7cac7060)
> Stack: (0xf0859e18 to 0xf085a000)
> 9e00:                                                       81b65000 80e13d00
> 9e20: 80e6d7c8 80e13cc8 00000040 80e13cf4 00000000 8084da90 80d0ce80 80d0424c
> 9e40: 80d0ce80 81b65000 80e13d00 00000001 80e13cc8 80d0424c 8084da60 80e13d00
> 9e60: 00000001 807691c0 00000001 81be5b40 80d06654 80d0424c 81be5b40 80769348
> 9e80: 00000001 80e13d00 00000040 f0859ecb 80dd6000 00008b6a f0859ed4 80769ec4
> 9ea0: 00000001 81080780 00000000 80e13d00 0000012c 00000000 f0859ecc 8076a2d8
> 9ec0: 00008b6c 81080780 00859f18 f0859ecc f0859ecc f0859ed4 f0859ed4 80d0424c
> 9ee0: 00000051 00000000 00000003 80e15834 80e15828 81080780 00000100 80adb4e4
> 9f00: 40000003 801013f4 821d9540 00000000 f0859f5c 80e15828 80d0d390 80e13c80
> 9f20: 80af6e3c 0000000a 80d0b588 80b19518 00008b6b 80dd6000 04208040 80901dd0
> 9f40: 81080780 00000000 8102de00 81080780 80d0b558 00000001 00000001 00000000
> 9f60: 00000000 80120a18 00000000 8013e590 8102de40 8102df00 8013e42c 8102de00
> 9f80: 81080780 f0835e30 00000000 8013a85c 8102de40 8013a778 00000000 00000000
> 9fa0: 00000000 00000000 00000000 80100148 00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> ip6_rcv_core from ipv6_rcv+0x30/0xd4
> ipv6_rcv from __netif_receive_skb_one_core+0x5c/0x80
> __netif_receive_skb_one_core from process_backlog+0x70/0xe4
> process_backlog from __napi_poll+0x2c/0x1f0
> __napi_poll from net_rx_action+0x140/0x264
> net_rx_action from __do_softirq+0x12c/0x290
> __do_softirq from run_ksoftirqd+0x34/0x3c
> run_ksoftirqd from smpboot_thread_fn+0x164/0x224
> smpboot_thread_fn from kthread+0xe4/0x114
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xf0859fb0 to 0xf0859ff8)
> 9fa0:                                     00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e5843024 e5843028 e584302c 0a000055 (e1d231bc)  
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Now, the questions:
> - Is "silabs,brd8023a" the proper compatible string for WFM200S022XNN3, or should we create our
>   own for the bare module, even if only the in-band SDIO IRQ and an external antenna are in use?
> - In order to try out the out-of-band IRQ, in the hope that it resolves the issue somehow, do we need to create a custom PDS file?
>   With the IRQ enabled, probe fails with a "Chip did not answer" error.
> - Tracing memory corruption is hard. Is there a mechanism that could help us more than generic methods like kprobes
>   or implementing canaries? As skbs are heavily reused for performance reasons, tracing their lifecycle is especially hard.
>   Our first idea was to write-protect their respective pages once they are enqueued in the wfx TX queue,
>   so that the MMU detects the corruption at the exact time it happens, but we haven't figured out how to modify the skb allocator to accomplish that,
>   especially given that the issue mostly happens when transmitting, so the skbs are allocated outside of the driver.
>   Maybe a similar mechanism that could help us already exists, even if it is still in the works?
>
> Any help will be greatly appreciated - we'll be very happy to provide a patch if we manage to figure the issue out.
>
>
> On 12.09.2022 at 18:15, Jérôme Pouiller wrote:
>> On Monday 12 September 2022 17:16:24 CEST Lech Perczak wrote:
>>> Hello,
>>>
>>> We're trying to get a WFM200S022XNN3 module working on a custom i.MX6Q board using SDIO interface, using upstream kernel. Our patches concern primarily the device tree for the board - and upstream firmware from linux-firmware repository.
>>>
>>> During that, we stumbled upon a memory corruption issue, which appears when big traffic is passing through the device. Our adapter is running in AP mode. This can be reproduced with 100% rate using iperf3, by starting an AP interface on the device, and an iperf3 server. Then, the client station runs iperf3 with "iperf3 -c <hostname> -t 3600" command - so the AP is sending data for up to one hour, however - the kernel on our device crashes after around a few minutes of traffic, sometimes less than a minute.
>>>
>>> The behaviour is the same on kernel v5.19.7, v5.19.2, and even with v6.0-rc5. Tests on v6.0-rc5 have shown most detailed stacktrace so far:
>>>
>> Hello Lech,
>>
>> It seems that something somewhere (Ms Exchange, I am looking at you) has
>> removed all the newlines of your mail :-/. Can you try to fix the problem?
>> I think that sending mails using base64 encoding would solve the issue.
>>
>>
>> [...]
>>
>> --
>> Jérôme Pouiller

-- 
With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.
Strzegomska 54,
53-611 Wroclaw
Tel:     (+48) 71 75 000 16
Email:   lech.perczak@...lingroup.com
Website: http://www.camlingroup.com
