Message-ID: <CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com>
Date: Sat, 25 Nov 2023 19:04:36 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Igor Russkikh <irusskikh@...vell.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Netdev <netdev@...r.kernel.org>
Subject: Aquantia ethernet driver suspend/resume issues
Ok, so this is pretty random, but I ended up replacing my main SSD
today, and decided that I'll just do a clean re-install and copy my
user data over from my old SSD. As a result of all that, my ethernet
cable ended up in a random ethernet port when I reconnected
everything, and because of the system reinstall I ended up with
suspend-at-idle on by default (which I very much don't want, but I
only noticed after it happened).
And it turns out that suspend/resume *really* doesn't work on the
Aquantia ethernet driver, which is where the cable happened to be.
First you get an allocation failure at resume:
kworker/u256:41: page allocation failure: order:6,
mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 58 PID: 11654 Comm: kworker/u256:41 Not tainted
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
dump_stack_lvl+0x47/0x60
warn_alloc+0x165/0x1e0
__alloc_pages_slowpath.constprop.0+0xcd4/0xd90
__alloc_pages+0x32d/0x350
__kmalloc_large_node+0x73/0x130
__kmalloc+0xc3/0x150
aq_ring_alloc+0x22/0xb0 [atlantic]
aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
aq_nic_init+0x118/0x1d0 [atlantic]
atl_resume_common+0x40/0xd0 [atlantic]
...
and immediately after that we get
trying to free invalid coherent area: 000000006fb35228
WARNING: CPU: 58 PID: 11654 at kernel/dma/remap.c:65
dma_common_free_remap+0x2d/0x40
CPU: 58 PID: 11654 Comm: kworker/u256:41 Not tainted 6.5.6-300.fc39.x86_64 #1
Workqueue: events_unbound async_run_entry_fn
Call Trace:
<TASK>
__iommu_dma_free+0xe8/0x100
aq_ring_alloc+0xa4/0xb0 [atlantic]
aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
aq_nic_init+0x118/0x1d0 [atlantic]
atl_resume_common+0x40/0xd0 [atlantic]
...
atlantic 0000:44:00.0: PM: dpm_run_callback():
pci_pm_resume+0x0/0xf0 returns -12
atlantic 0000:44:00.0: PM: failed to resume async: error -12
and now the slab cache is corrupt and the system is dead.
My *guess* is that what is going on is that when the kcalloc() failed
(because it tries to allocate a large area, and it has only been
tested at boot-time when it succeeds), we end up doing that
    err_exit:
        if (err < 0) {
                aq_ring_free(self);
                self = NULL;
        }
but aq_ring_free() does
        kfree(self->buff_ring);
        if (self->dx_ring)
                dma_free_coherent(aq_nic_get_dev(self->aq_nic),
                                  self->size * self->dx_size, self->dx_ring,
                                  self->dx_ring_pa);
and notice how it will free the dx_ring even though it was never
allocated! I suspect dx_ring is non-zero because it was allocated
earlier, but the suspend free'd it - but never cleared the pointer.
That "never cleared the pointer on free" is true for buff_ring too,
but the aq_ring_alloc() did
        self->buff_ring =
                kcalloc(self->size, sizeof(struct aq_ring_buff_s), GFP_KERNEL);
so when that failed, at least it re-initialized that part to NULL, so
we just had a kfree(NULL) which is fine.
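To make the stale-pointer problem concrete, here is a minimal user-space sketch of the "clear the pointer on free" pattern. Everything in it is an assumption for illustration: the names demo_ring, demo_ring_alloc and demo_ring_free are hypothetical, and calloc()/free() stand in for kcalloc() and dma_alloc_coherent()/dma_free_coherent(). It is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's ring, not the real struct. */
struct demo_ring {
	size_t size;      /* number of descriptors */
	void *buff_ring;  /* stands in for the kcalloc()'d buffer array */
	void *dx_ring;    /* stands in for the coherent DMA area */
};

/* Free both areas and clear the pointers, so a second call is a no-op. */
static void demo_ring_free(struct demo_ring *ring)
{
	assert(ring != NULL);

	free(ring->buff_ring);   /* free(NULL) is fine, like kfree(NULL) */
	ring->buff_ring = NULL;

	free(ring->dx_ring);
	ring->dx_ring = NULL;    /* without this, a later error path frees it again */
}

/*
 * Allocate both areas. 'fail_first' simulates the first allocation
 * failing, as it did at resume. Returns 0 on success, -1 on failure.
 */
static int demo_ring_alloc(struct demo_ring *ring, size_t size, int fail_first)
{
	ring->size = size;

	ring->buff_ring = fail_first ? NULL : calloc(size, 64);
	if (!ring->buff_ring)
		goto err_exit;

	ring->dx_ring = calloc(size, 16);
	if (!ring->dx_ring)
		goto err_exit;

	return 0;

err_exit:
	/* Safe only because demo_ring_free() never leaves stale pointers. */
	demo_ring_free(ring);
	return -1;
}
```

With the two "= NULL" assignments in place, the sequence alloc, free (suspend), failed alloc (resume) ends with both pointers NULL instead of freeing the old dx_ring a second time.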
Anyway, I suspect a fix for the fatal error might be something like
the attached, but I think the *root* of the problem is how the
aquantia driver tried to allocate a humongous buff_ring with kmalloc,
which really doesn't work. You can see that "order:6", ie we're
talking an allocation > 100kB, and in low-memory situations that kind
of kmalloc space simply isn't available. It *will* fail.
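For reference, "order:N" means 2^N physically contiguous pages. The sketch below shows the arithmetic in user space, assuming 4 KiB pages (the usual x86_64 case); demo_get_order and DEMO_PAGE_SIZE are hypothetical names that mimic what the kernel's get_order() computes, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Smallest order such that (2^order) pages cover 'bytes'. */
static unsigned int demo_get_order(size_t bytes)
{
	size_t pages;
	unsigned int order = 0;

	assert(bytes > 0);

	/* Round the request up to whole pages. */
	pages = (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;

	/* Then round the page count up to a power of two. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

Under that assumption, any request larger than 128 KiB and up to 256 KiB lands in order 6, which needs 64 contiguous free pages. That is easy to find right after boot and much harder on a machine that has been running for a while.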
Again, during boot you'll probably never see any issues. During
suspend/resume it very much does not work.
In general, suspend/resume should *not* do big memory management
things. It should probably have never free'd the old data structure,
and it most definitely cannot try to allocate a big new data structure
in resume.
To make matters worse, it looks like there's not just *one* of those
big allocations, there's multiple ones, both for RX and TX. But I
didn't look much more closely.
I don't know what the right fix is, but *one* fix would certainly be
to not tear everything down at suspend time, only to build it up again
at resume.
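As a rough illustration of that idea, here is a user-space sketch in which the big buffer is allocated once at open time, suspend only quiesces, and resume reuses the same memory so it has no allocation that can fail. All names (demo_nic, demo_nic_open, and so on) are hypothetical, and a real driver would also have to reprogram the hardware at resume; this only shows the memory lifetime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state, not the driver's real structure. */
struct demo_nic {
	void *ring_mem;     /* big buffer, allocated once */
	size_t ring_bytes;
	int running;
};

/* The only place that allocates; runs when memory is plentiful. */
static int demo_nic_open(struct demo_nic *nic, size_t bytes)
{
	assert(nic != NULL && bytes > 0);

	nic->ring_mem = calloc(1, bytes);
	if (!nic->ring_mem)
		return -1;
	nic->ring_bytes = bytes;
	nic->running = 1;
	return 0;
}

/* Suspend: stop using the ring, but keep the memory. */
static void demo_nic_suspend(struct demo_nic *nic)
{
	nic->running = 0;
}

/* Resume: reset the existing ring; no allocation, so no -ENOMEM here. */
static int demo_nic_resume(struct demo_nic *nic)
{
	memset(nic->ring_mem, 0, nic->ring_bytes);
	nic->running = 1;
	return 0;
}

/* Close: the only place that frees, and it clears the pointer. */
static void demo_nic_close(struct demo_nic *nic)
{
	nic->running = 0;
	free(nic->ring_mem);
	nic->ring_mem = NULL;
	nic->ring_bytes = 0;
}
```

The trade-off is that the memory stays pinned while the machine sleeps, which is usually a much smaller problem than a resume path that can fail.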
And please please please don't double-free things randomly (if that is
what was going on, but it does look like it was).
Linus
View attachment "patch.diff" of type "text/x-patch" (781 bytes)