lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1673f31-b1b4-2c50-92ff-c6b5e247586f@marvell.com>
Date: Tue, 28 Nov 2023 18:54:28 +0100
From: Igor Russkikh <irusskikh@...vell.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Netdev <netdev@...r.kernel.org>
Subject: Re: [EXT] Aquantia ethernet driver suspend/resume issues


On 11/27/2023 7:02 PM, Linus Torvalds wrote:

> 
> So I suspect that one reason I triggered the problem was simply
> because the suspend/resume happened while I walked away from the
> computer when it was copying a few hundred gig of data from the old
> SSD (over USB, so not hugely fast).

...

> Also, make sure you don't have ridiculous amounts of memory in your
> machine.  I've got "only" 64GB in mine, which is small for a big
> machine, and presumably a lot of it was used for buffer cache, and I'm
> not sure what the device suspend/resume ordering was (ie disk might be
> resumed after ethernet).

With these details in mind I was able to repro this within seconds on my 16Gb machine,
basically by doing a stress in parallel:

    stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000

    while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done

in 5-10 seconds I get

[  859.536856] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6](0x30000)
[  859.563153] warn_alloc: 1 callbacks suppressed
[  859.563156] ifconfig: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[  859.563163] CPU: 13 PID: 48544 Comm: ifconfig Tainted: G           OE      6.7.0-rc2igor1+ #1
[  859.563165] Hardware name: ASUS System Product Name/PRIME Z590-P, BIOS 1017 07/12/2021
[  859.563166] Call Trace:
[  859.563168]  <TASK>
[  859.563170]  dump_stack_lvl+0x48/0x70
[  859.563175]  dump_stack+0x10/0x20
[  859.563177]  warn_alloc+0x119/0x190
[  859.563180]  ? __alloc_pages_direct_compact+0xae/0x1f0
[  859.563183]  __alloc_pages_slowpath.constprop.0+0xd1a/0xdd0
[  859.563188]  __alloc_pages+0x304/0x350
[  859.563192]  ? aq_ring_alloc+0x29/0xe0 [atlantic]
[  859.563207]  __kmalloc_large_node+0x7f/0x140
[  859.563210]  __kmalloc+0xc9/0x140
[  859.563212]  aq_ring_alloc+0x29/0xe0 [atlantic]
[  859.563221]  aq_ring_rx_alloc+0x7d/0x90 [atlantic]
[  859.563230]  aq_vec_ring_alloc+0xab/0x170 [atlantic]
[  859.563241]  aq_nic_init+0x11c/0x1e0 [atlantic]
[  859.563250]  aq_ndev_open+0x20/0x90 [atlantic]
[  859.563259]  __dev_open+0xe9/0x190
[  859.563261]  __dev_change_flags+0x18c/0x1f0
[  859.563263]  dev_change_flags+0x26/0x70
[  859.563265]  devinet_ioctl+0x602/0x760
[  859.563268]  inet_ioctl+0x167/0x190
[  859.563269]  ? sk_ioctl+0x4b/0x110
[  859.563271]  ? inet_ioctl+0x95/0x190
[  859.563273]  sock_do_ioctl+0x44/0xf0
[  859.563274]  ? __check_object_size+0x51/0x2d0
[  859.563277]  ? _copy_to_user+0x25/0x40
[  859.563279]  sock_ioctl+0xf7/0x300
[  859.563280]  __x64_sys_ioctl+0x95/0xd0
[  859.563283]  do_syscall_64+0x5c/0xe0
[  859.563286]  ? exit_to_user_mode_prepare+0x45/0x1a0
[  859.563289]  ? syscall_exit_to_user_mode+0x34/0x50
[  859.563291]  ? do_syscall_64+0x6b/0xe0
[  859.563293]  ? do_syscall_64+0x6b/0xe0
[  859.563295]  ? syscall_exit_to_user_mode+0x34/0x50
[  859.563296]  ? __x64_sys_openat+0x20/0x30
[  859.563298]  ? do_syscall_64+0x6b/0xe0
[  859.563300]  ? syscall_exit_to_user_mode+0x34/0x50
[  859.563301]  ? __x64_sys_read+0x1a/0x20
[  859.563303]  ? do_syscall_64+0x6b/0xe0
[  859.563305]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  859.563307] RIP: 0033:0x7f98499df3ab
[  859.563309] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
[  859.563310] RSP: 002b:00007fffba955138 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  859.563312] RAX: ffffffffffffffda RBX: 00007fffba955140 RCX: 00007f98499df3ab
[  859.563313] RDX: 00007fffba955140 RSI: 0000000000008914 RDI: 0000000000000004
[  859.563314] RBP: 00007fffba9551f0 R08: 0000000000000008 R09: 0000000000000001
[  859.563315] R10: 0000000000000011 R11: 0000000000000202 R12: 0000000000000041
[  859.563316] R13: 00007fffba9554e8 R14: 0000000000000000 R15: 0000000000000000
[  859.563318]  </TASK>
[  859.563319] Mem-Info:
[  859.563320] active_anon:14091 inactive_anon:3805083 isolated_anon:2336
                active_file:4601 inactive_file:5452 isolated_file:3
                unevictable:2258 dirty:56 writeback:0
                slab_reclaimable:35879 slab_unreclaimable:42730
                mapped:8485 shmem:2635 pagetables:35066
                sec_pagetables:0 bounce:0
                kernel_misc_reclaimable:0
                free:56673 free_pcp:0 free_cma:0
[  859.563323] Node 0 active_anon:56364kB inactive_anon:15220332kB active_file:18404kB inactive_file:21808kB unevictable:9032kB isolated(anon):9344kB isolated(file):12kB mapped:33940kB dirty:224kB writeback:0kB shmem:10540kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:42592kB pagetables:140264kB sec_pagetables:0kB all_unreclaimable? no
[  859.563326] Node 0 DMA free:13308kB boost:0kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  859.563329] lowmem_reserve[]: 0 2305 15744 15744 15744
[  859.563332] Node 0 DMA32 free:63608kB boost:0kB min:9884kB low:12352kB high:14820kB reserved_highatomic:0KB active_anon:16kB inactive_anon:2330560kB active_file:88kB inactive_file:36kB unevictable:0kB writepending:8kB present:2467796kB managed:2401988kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  859.563335] lowmem_reserve[]: 0 0 13439 13439 13439
[  859.563338] Node 0 Normal free:149776kB boost:90112kB min:147740kB low:162144kB high:176548kB reserved_highatomic:2048KB active_anon:56024kB inactive_anon:12889984kB active_file:19608kB inactive_file:22216kB unevictable:9032kB writepending:0kB present:14098432kB managed:13769880kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  859.563342] lowmem_reserve[]: 0 0 0 0 0
[  859.563344] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 2*2048kB (UM) 2*4096kB (M) = 13308kB
[  859.563353] Node 0 DMA32: 2*4kB (UM) 2*8kB (UM) 5*16kB (M) 5*32kB (UM) 6*64kB (UM) 17*128kB (UM) 22*256kB (UM) 4*512kB (UM) 6*1024kB (UM) 11*2048kB (UM) 6*4096kB (UM) = 63752kB
[  859.563362] Node 0 Normal: 7073*4kB (UMEH) 1922*8kB (UMEH) 756*16kB (UMEH) 464*32kB (UMEH) 830*64kB (UMEH) 197*128kB (UMH) 6*256kB (MH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 150484kB
[  859.563371] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[  859.563373] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  859.563374] 23673 total pagecache pages
[  859.563374] 11582 pages in swap cache
[  859.563375] Free swap  = 7137276kB
[  859.563375] Total swap = 8003580kB
[  859.563376] 4145555 pages RAM
[  859.563376] 0 pages HighMem/MovableOnly
[  859.563377] 98748 pages reserved
[  859.563378] 0 pages hwpoisoned
[  859.563379] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6](0x18000)
[  859.563381] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[6] FAILURE =============================
[  859.563388] atlantic 0000:01:00.0 enp1s0: device open failure
[  860.996946] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[0](0x30000)
[  860.996961] atlantic 0000:01:00.0 enp1s0: aq_ring_alloc[0](0x18000)

Thats already with the patch applied, so no panic and next "ifconfig up" recovers the device state.

I will submit a bugfix patch for that solution, but will also continue looking into suspend/resume refactoring.

Thanks,
  Igor

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ