Message-ID: <e98f7617-b0fe-4d2a-be68-f41fb371ba36@pwaller.net>
Date: Sun, 21 Jan 2024 21:05:09 +0000
From: Peter Waller <p@...ller.net>
To: Igor Russkikh <irusskikh@...vell.com>, Jakub Kicinski <kuba@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Netdev <netdev@...r.kernel.org>
Subject: Re: [EXT] Aquantia ethernet driver suspend/resume issues
I see a fix for double free [0] landed in 6.7; I've been running that
for a few days and have hit a resume from suspend issue twice. Stack
trace looks a little different (via __iommu_dma_map instead of
__iommu_dma_free), provided below.
I've had resume issues with the atlantic driver for as long as I've had
this hardware, but they went away for a while and seem to have come back
with 6.7. (Timeline: no crashes from the start of my logs on Dec 15
through Jan 12; upgraded to 6.7; crashes on Jan 20 and 21. My usage of
the system has also varied, though, so maybe the crashes are associated
with higher memory usage?)
Possibly unrelated, but I also see fairly frequent messages in my logs
(one to ten times per boot, since logs begin?) of the form "atlantic
0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014
address=0xffce8000 flags=0x0020]".
[0] https://github.com/torvalds/linux/commit/7bb26ea74aa86fdf894b7dbd8c5712c5b4187da7
- Peter
kworker/u65:2: page allocation failure: order:6, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 18 PID: 166017 Comm: kworker/u65:2 Not tainted 6.7.0
Hardware name: ASUS System Product [...] BIOS 1502 06/08/2023
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
dump_stack_lvl+0x47/0x60
warn_alloc+0x165/0x1e0
? srso_alias_return_thunk+0x5/0xfbef5
? __alloc_pages_direct_compact+0xb3/0x290
__alloc_pages+0x109e/0x1130
? iommu_dma_alloc_iova+0xd4/0x120
? srso_alias_return_thunk+0x5/0xfbef5
? __iommu_dma_map+0x84/0xf0
? aq_ring_alloc+0x22/0x80 [atlantic]
__kmalloc_large_node+0x77/0x130
__kmalloc+0xc6/0x150
aq_ring_alloc+0x22/0x80 [atlantic]
aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
aq_nic_init+0x118/0x1d0 [atlantic]
atl_resume_common+0x40/0xd0 [atlantic]
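
For anyone reading the trace: "order:6" means the page allocator was asked for 2^6 = 64 physically contiguous pages in one block. Below is a minimal sketch of that arithmetic, assuming 4 KiB pages (the page size is an assumption, it is not stated in the log). The helper names are hypothetical and are not kernel APIs; bytes_to_order only mirrors the idea of rounding a request up to a power-of-two number of pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, as is typical on x86-64. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Bytes of contiguous memory covered by a block of the given order
 * (hypothetical helper, not a kernel function). */
static unsigned long order_to_bytes(unsigned int order)
{
    /* Keep the shift within range even where unsigned long is 32 bits. */
    assert(order < 20);
    return ASSUMED_PAGE_SIZE << order;
}

/* Smallest order whose block is at least `bytes` long
 * (hypothetical helper, not a kernel function). */
static unsigned int bytes_to_order(unsigned long bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}
```

Under that assumption, order_to_bytes(6) is 262144 bytes (256 KiB), and any request larger than 128 KiB and up to 256 KiB rounds up to order 6. A contiguous block that large can be hard to find on a long-running, fragmented system, which would fit with the failure showing up on resume rather than at boot.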
On 30/11/2023 12:59, Igor Russkikh wrote:
>
> On 11/28/2023 10:09 PM, Jakub Kicinski wrote:
>> For Rx under load larger rings are sometimes useful to avoid drops.
>> But your Tx rings are larger than Rx, which is a bit odd.
> Agree. Just looked into the history, and it looks like this size was chosen
> since the very first commit of this driver.
>
>> I was going to say that with BQL enabled you're very unlikely to ever
>> use much of the 4k Tx ring, anyway. But you don't have BQL support :S
>>
>> My free advice is to recheck you really need these sizes and implement
>> BQL :)
> Thanks for the hint, will consider this.
>
> Regards
> Igor