Open Source and information security mailing list archives
Date:   Sat, 25 Sep 2021 14:32:16 +0300
From:   k@...ka.home.kg
To:     Felix Fietkau <nbd@...nwrt.org>, John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        netdev@...r.kernel.org
Subject: [MEDIATEK ETHERNET DRIVER] initialization failure on low ram systems

Hi!

I'm running OpenWrt 21.02 (kernel 5.4.143) on a TP-Link Archer C6U.
It's an MT7621DAT-based board with 128 MB of RAM:
https://openwrt.org/toh/hwdata/tp-link/tp-link_archer_c6u_v1_eu
https://wikidevi.wi-cat.ru/MediaTek_MT7621

I found that sometimes, when the MediaTek chip is reinitialized during a network restart,
a kernel memory allocation fails and some or all switch ports become unusable,
which cuts off Ethernet access to the router.
ethtool reports no link.

Here is the kernel log:

[10389.945893] netifd: page allocation failure: order:8, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[10389.958689] CPU: 1 PID: 20444 Comm: netifd Not tainted 5.4.143 #0
[10389.964763] Stack : 00000008 80082090 00000000 00000000 80730000 80738c9c 80737960 86815b7c
[10389.973104]         808f0000 80784da3 806b7a74 806b7a74 00000001 00000001 86815b20 00000007
[10389.981438]         00000000 00000000 80930000 00000000 30232033 000018a1 2e352064 34312e34
[10389.989771]         00000000 00000204 00000000 000ea0e1 80000000 807a0000 00000000 00040dc0
[10389.998104]         807a0000 00000201 00000240 00040dc0 00000000 80381e08 00000004 808f0004
[10390.006439]         ...
[10390.008879] Call Trace:
[10390.011344] [<8000b68c>] show_stack+0x30/0x100
[10390.015800] [<805f1254>] dump_stack+0xa4/0xdc
[10390.020167] [<80170c90>] warn_alloc+0xc0/0x138
[10390.024602] [<80171af4>] __alloc_pages_nodemask+0xdec/0xeb8
[10390.030161] [<8014bfb8>] kmalloc_order+0x2c/0x70
[10390.034778] [<8040807c>] mtk_open+0x158/0x804
[10390.039127] [<8045d5e4>] __dev_open+0xf4/0x188
[10390.043559] [<8045da44>] __dev_change_flags+0x18c/0x1e4
[10390.048768] [<8045dac4>] dev_change_flags+0x28/0x70
[10390.053637] [<8048a364>] dev_ifsioc+0x2ac/0x34c
[10390.058155] [<8048a5f0>] dev_ioctl+0xd4/0x3f8
[10390.062510] [<804304ec>] sock_ioctl+0x354/0x4bc
[10390.067040] [<801adbb4>] do_vfs_ioctl+0xb8/0x7c0
[10390.071645] [<801ae30c>] ksys_ioctl+0x50/0xb4
[10390.076000] [<80014598>] syscall_common+0x34/0x58
[10390.080969] Mem-Info:
[10390.083314] active_anon:6858 inactive_anon:6892 isolated_anon:32
[10390.083314]  active_file:733 inactive_file:741 isolated_file:1
[10390.083314]  unevictable:2 dirty:0 writeback:0 unstable:0
[10390.083314]  slab_reclaimable:921 slab_unreclaimable:5251
[10390.083314]  mapped:1082 shmem:0 pagetables:215 bounce:0
[10390.083314]  free:3817 free_pcp:32 free_cma:0
[10390.115576] Node 0 active_anon:27432kB inactive_anon:27848kB active_file:2988kB inactive_file:3468kB unevictable:8kB isolated(anon):128kB isolated(file):4kB mapped:4776kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[10390.138417] Normal free:13876kB min:13312kB low:14336kB high:15360kB active_anon:27432kB inactive_anon:27760kB active_file:2808kB inactive_file:3508kB unevictable:8kB writepending:0kB present:131072kB managed:121444kB mlocked:8kB kernel_stack:1064kB pagetables:860kB bounce:0kB free_pcp:76kB local_pcp:0kB free_cma:0kB


I traced where the allocation fails and found a 1 MB memory allocation.

mtk_eth_soc.c  mtk_open()
err = mtk_start_dma(eth); // err=-ENOMEM
err = mtk_dma_init(eth); // err=-ENOMEM
err = mtk_init_fq_dma(eth); // err=-ENOMEM
eth->scratch_head = kcalloc(cnt, MTK_QDMA_PAGE_SIZE, GFP_KERNEL); // cnt=512, MTK_QDMA_PAGE_SIZE=2048: allocates 1 MB, *FAILS HERE*
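
The 1 MB figure lines up with the "order:8" in the log. A minimal sketch of the arithmetic, assuming 4 KiB pages (the log reports active_anon:6858 pages as 27432kB, which gives 4 kB per page). The helper names alloc_order() and scratch_bytes() are made up for illustration and are not kernel functions:

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, inferred from the Mem-Info lines in the log. */
#define PAGE_SIZE_BYTES 4096u

/* Total size requested by kcalloc(cnt, buf_size, ...), ignoring overflow. */
size_t scratch_bytes(size_t cnt, size_t buf_size)
{
    return cnt * buf_size;
}

/*
 * Smallest page-allocator order whose block (PAGE_SIZE_BYTES << order)
 * is large enough to hold `bytes`.
 */
unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;
    size_t block = PAGE_SIZE_BYTES;

    while (block < bytes) {
        block <<= 1;
        order++;
    }
    return order;
}
```

With cnt=512 and a buffer size of 2048, scratch_bytes() gives 1048576 bytes, i.e. 256 contiguous pages, and alloc_order() gives 8. If cnt follows MTK_DMA_SIZE (an assumption here), a value of 128 still needs an order-6 block of 64 contiguous pages.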


To reproduce, I put the system under memory pressure using
screen nice -n 10 stress --vm 1 --vm-bytes 71000000
and then restart the chip: /etc/init.d/network restart
Even after stress is killed, a network restart often fails to restore LAN access.

In my view, it's not a good idea to allocate large contiguous blocks of memory in the Linux kernel
on low-RAM systems, which most routers are, because free kernel memory can become fragmented.
I tried decreasing MTK_DMA_SIZE from 512 to 128; it helped, but not 100%.
MTK_DMA_SIZE=64 makes the network unstable.
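
One way to avoid the high-order request would be to split the scratch area into page-sized chunks, each holding two 2048-byte buffers, so that no single allocation exceeds one page. Below is a rough userspace sketch of that idea only. struct scratch_pool and its functions are hypothetical and are not part of mtk_eth_soc.c, calloc() stands in for a kernel allocator, the 4 KiB chunk size is an assumption, and whether the hardware and the driver's DMA mapping can work with non-contiguous scratch buffers is not verified here.

```c
#include <stdlib.h>
#include <stddef.h>

#define BUF_SIZE       2048u  /* MTK_QDMA_PAGE_SIZE from the trace */
#define CHUNK_SIZE     4096u  /* assumed page size: one order-0 block */
#define BUFS_PER_CHUNK (CHUNK_SIZE / BUF_SIZE)

/* Hypothetical container, for illustration only. */
struct scratch_pool {
    size_t nbufs;            /* number of BUF_SIZE buffers */
    size_t nchunks;          /* number of CHUNK_SIZE allocations */
    unsigned char **chunks;  /* table of chunk pointers */
};

/* Allocate nbufs buffers as independent page-sized chunks. 0 on success. */
int scratch_pool_init(struct scratch_pool *p, size_t nbufs)
{
    size_t i;

    p->nbufs = nbufs;
    p->nchunks = (nbufs + BUFS_PER_CHUNK - 1) / BUFS_PER_CHUNK;
    p->chunks = calloc(p->nchunks, sizeof(*p->chunks));
    if (!p->chunks)
        return -1;

    for (i = 0; i < p->nchunks; i++) {
        p->chunks[i] = calloc(1, CHUNK_SIZE);
        if (!p->chunks[i]) {
            /* Roll back the chunks allocated so far. */
            while (i--)
                free(p->chunks[i]);
            free(p->chunks);
            p->chunks = NULL;
            return -1;
        }
    }
    return 0;
}

/* Address of buffer idx, or NULL if idx is out of range. */
unsigned char *scratch_pool_buf(const struct scratch_pool *p, size_t idx)
{
    if (idx >= p->nbufs)
        return NULL;
    return p->chunks[idx / BUFS_PER_CHUNK] +
           (idx % BUFS_PER_CHUNK) * BUF_SIZE;
}

/* Release every chunk and the pointer table. */
void scratch_pool_free(struct scratch_pool *p)
{
    size_t i;

    if (!p->chunks)
        return;
    for (i = 0; i < p->nchunks; i++)
        free(p->chunks[i]);
    free(p->chunks);
    p->chunks = NULL;
}
```

For 512 buffers this makes 256 separate page-sized allocations plus a small pointer table, instead of one block of 256 contiguous pages. Each buffer is still contiguous within itself, but buffers in different chunks are not adjacent, so any code that indexes the area as one flat array would need to go through a lookup like scratch_pool_buf().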
