Date: Sun, 21 Jan 2024 21:05:09 +0000
From: Peter Waller <p@...ller.net>
To: Igor Russkikh <irusskikh@...vell.com>, Jakub Kicinski <kuba@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 Netdev <netdev@...r.kernel.org>
Subject: Re: [EXT] Aquantia ethernet driver suspend/resume issues

I see that a fix for a double free [0] landed in 6.7. I've been running 
that kernel for a few days and have hit a resume-from-suspend issue 
twice. The stack trace looks a little different (it goes via 
__iommu_dma_map instead of __iommu_dma_free) and is provided below.

I've had resume issues with the atlantic driver for as long as I've had 
this hardware. They went away for a while, but seem to have come back 
with 6.7. (Timeline: no crashes from the start of my logs on Dec 15 
until Jan 12; upgraded to 6.7; crashes on the 20th and 21st. My usage of 
the system has also varied, though, so maybe the crashes are associated 
with higher memory usage?)

Possibly unrelated, but I also see fairly frequent messages in my logs 
(one to ten per boot, going back to when the logs begin?) of the form 
"atlantic 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT 
domain=0x0014 address=0xffce8000 flags=0x0020]".

[0] https://github.com/torvalds/linux/commit/7bb26ea74aa86fdf894b7dbd8c5712c5b4187da7

- Peter

kworker/u65:2: page allocation failure: order:6, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 18 PID: 166017 Comm: kworker/u65:2 Not tainted 6.7.0
Hardware name: ASUS System Product [...] BIOS 1502 06/08/2023
Workqueue: events_unbound async_run_entry_fn
Call Trace:
  <TASK>
  dump_stack_lvl+0x47/0x60
  warn_alloc+0x165/0x1e0
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __alloc_pages_direct_compact+0xb3/0x290
  __alloc_pages+0x109e/0x1130
  ? iommu_dma_alloc_iova+0xd4/0x120
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? __iommu_dma_map+0x84/0xf0
  ? aq_ring_alloc+0x22/0x80 [atlantic]
  __kmalloc_large_node+0x77/0x130
  __kmalloc+0xc6/0x150
  aq_ring_alloc+0x22/0x80 [atlantic]
  aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
  aq_nic_init+0x118/0x1d0 [atlantic]
  atl_resume_common+0x40/0xd0 [atlantic]
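
For reference, a rough sketch of what "order:6" in the failure above 
amounts to. It assumes 4 KiB base pages, which is the usual x86-64 
configuration but is not stated in the trace; the helper name 
order_to_bytes is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (not stated in the trace)


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of a page-allocator request of the given order.

    An order-N request asks for 2**N physically contiguous pages.
    """
    return (1 << order) * page_size


size = order_to_bytes(6)
print(f"order:6 -> {1 << 6} contiguous pages, {size // 1024} KiB")
```

Under that assumption, the kmalloc in aq_ring_alloc needs 64 physically 
contiguous pages (256 KiB) at resume time. A request that large can fail 
when memory is fragmented, which would be consistent with the guess that 
the crashes correlate with higher memory usage.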


On 30/11/2023 12:59, Igor Russkikh wrote:
>
> On 11/28/2023 10:09 PM, Jakub Kicinski wrote:
>> For Rx under load larger rings are sometimes useful to avoid drops.
>> But your Tx rings are larger than Rx, which is a bit odd.
> Agree. Just looked into the history, and it looks like this size was chosen
> since the very first commit of this driver.
>
>> I was going to say that with BQL enabled you're very unlikely to ever
>> use much of the 4k Tx ring, anyway. But you don't have BQL support :S
>>
>> My free advice is to recheck you really need these sizes and implement
>> BQL :)
> Thanks for the hint, will consider this.
>
> Regards
>    Igor


