netdev - Aquantia ethernet driver suspend/resume issues

Message-ID: <CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com>
Date: Sat, 25 Nov 2023 19:04:36 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Igor Russkikh <irusskikh@...vell.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Netdev <netdev@...r.kernel.org>
Subject: Aquantia ethernet driver suspend/resume issues

Ok, so this is pretty random, but I ended up replacing my main SSD
today, and decided that I'll just do a clean re-install and copy my
user data over from my old SSD. As a result of all that, my ethernet
cable ended up in a random ethernet port when I reconnected
everything, and because of the system reinstall I ended up with
suspend-at-idle on by default (which I very much don't want, but I
only noticed after it happened).

And it turns out that suspend/resume *really* doesn't work on the
Aquantia ethernet driver, which is where the cable happened to be.

First you get an allocation failure at resume:

  kworker/u256:41: page allocation failure: order:6, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 58 PID: 11654 Comm: kworker/u256:41 Not tainted
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   <TASK>
   dump_stack_lvl+0x47/0x60
   warn_alloc+0x165/0x1e0
   __alloc_pages_slowpath.constprop.0+0xcd4/0xd90
   __alloc_pages+0x32d/0x350
   __kmalloc_large_node+0x73/0x130
   __kmalloc+0xc3/0x150
   aq_ring_alloc+0x22/0xb0 [atlantic]
   aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
   aq_nic_init+0x118/0x1d0 [atlantic]
   atl_resume_common+0x40/0xd0 [atlantic]
   ...

and immediately after that we get

  trying to free invalid coherent area: 000000006fb35228
  WARNING: CPU: 58 PID: 11654 at kernel/dma/remap.c:65 dma_common_free_remap+0x2d/0x40
  CPU: 58 PID: 11654 Comm: kworker/u256:41 Not tainted 6.5.6-300.fc39.x86_64 #1
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   <TASK>
   __iommu_dma_free+0xe8/0x100
   aq_ring_alloc+0xa4/0xb0 [atlantic]
   aq_vec_ring_alloc+0xee/0x1a0 [atlantic]
   aq_nic_init+0x118/0x1d0 [atlantic]
   atl_resume_common+0x40/0xd0 [atlantic]
   ...
  atlantic 0000:44:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -12
  atlantic 0000:44:00.0: PM: failed to resume async: error -12

and now the slab cache is corrupt and the system is dead.

My *guess* is that what is going on is that when the kcalloc() failed
(because it tries to allocate a large area, and it has only been
tested at boot-time when it succeeds), we end up doing that

  err_exit:
        if (err < 0) {
                aq_ring_free(self);
                self = NULL;
        }

but aq_ring_free() does

        kfree(self->buff_ring);

        if (self->dx_ring)
                dma_free_coherent(aq_nic_get_dev(self->aq_nic),
                                  self->size * self->dx_size, self->dx_ring,
                                  self->dx_ring_pa);

and notice how it will free the dx_ring even though it was never
allocated! I suspect dx_ring is non-zero because it was allocated
earlier, but the suspend free'd it - but never cleared the pointer.

That "never cleared the pointer on free" is true for buff_ring too,
but the aq_ring_alloc() did

        self->buff_ring =
                kcalloc(self->size, sizeof(struct aq_ring_buff_s), GFP_KERNEL);

so when that failed, at least it re-initialized that part to NULL, so
we just had a kfree(NULL) which is fine.
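
To make the stale-pointer pattern concrete, here is a minimal user-space
sketch, not the driver code. The struct and function names
(ring_model, ring_model_free) are hypothetical, and malloc()/free()
stand in for kcalloc()/kfree() and the DMA-coherent calls. The point is
only that a free routine which clears its pointers can safely run a
second time:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's ring structure. */
struct ring_model {
        void *buff_ring;   /* models the kcalloc()'d buffer array */
        void *dx_ring;     /* models the DMA-coherent descriptor ring */
};

/* Allocate both areas; returns 0 on success, -1 on failure. */
static int ring_model_alloc(struct ring_model *r, size_t len)
{
        r->buff_ring = calloc(1, len);
        r->dx_ring = calloc(1, len);
        if (!r->buff_ring || !r->dx_ring)
                return -1;
        return 0;
}

/* Free both areas and clear the pointers, so a repeated call is a no-op. */
static void ring_model_free(struct ring_model *r)
{
        free(r->buff_ring);        /* free(NULL) is fine, like kfree(NULL) */
        r->buff_ring = NULL;

        if (r->dx_ring) {
                free(r->dx_ring);
                r->dx_ring = NULL; /* without this, the next call frees it again */
        }
}
```

With the pointers cleared, calling ring_model_free() once for the
"suspend" and once more from a failed "resume" path touches each area
only once.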

Anyway, I suspect a fix for the fatal error might be something like
the attached, but I think the *root* of the problem is how the
aquantia driver tried to allocate a humongous buff_ring with kmalloc,
which really doesn't work.  You can see that "order:6", ie we're
talking an allocation > 100kB, and in low-memory situations that kind
of kmalloc space simply isn't available. It *will* fail.
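
For reference, the arithmetic behind "order:6" can be sketched as
below. This is a stand-alone illustration, assuming 4 KiB pages (as on
x86_64); the helper names are made up for the example and are not
kernel APIs. An order-N allocation needs 2^N physically contiguous
pages, so order 6 means 64 pages, i.e. 256 KiB, and any request larger
than 128 KiB ends up there:

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int model_alloc_order(size_t size)
{
        unsigned int order = 0;

        while ((MODEL_PAGE_SIZE << order) < size)
                order++;
        return order;
}

/* Bytes of contiguous memory that an allocation of this order needs. */
static size_t model_order_bytes(unsigned int order)
{
        return MODEL_PAGE_SIZE << order;
}
```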

Again, during boot you'll probably never see any issues. During
suspend/resume it very much does not work.

In general, suspend/resume should *not* do big memory management
things. It should probably have never free'd the old data structure,
and it most definitely cannot try to allocate a big new data structure
in resume.

To make matters worse, it looks like there's not just *one* of those
big allocations, there's multiple ones, both for RX and TX. But I
didn't look much more closely.

I don't know what the right fix is, but *one* fix would certainly be
to not tear everything down at suspend time, only to build it up again
at resume.
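
As a rough illustration of that idea, and only as a user-space model
under stated assumptions, the sketch below allocates once, keeps the
memory across "suspend", and merely resets it on "resume". All names
here (ring_keep_model and its helpers) are hypothetical and do not
correspond to functions in the atlantic driver; a real driver would
also have to quiesce and reprogram the hardware:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a ring whose memory outlives suspend/resume. */
struct ring_keep_model {
        unsigned char *buf;
        size_t len;
        bool running;
};

/* One-time allocation, done when memory is plentiful. */
static int ring_keep_setup(struct ring_keep_model *r, size_t len)
{
        r->buf = calloc(1, len);
        if (!r->buf)
                return -1;
        r->len = len;
        r->running = true;
        return 0;
}

/* Suspend: stop using the ring, but keep the memory. */
static void ring_keep_suspend(struct ring_keep_model *r)
{
        r->running = false;
}

/* Resume: reset the existing memory; no allocation, so no allocation failure. */
static void ring_keep_resume(struct ring_keep_model *r)
{
        memset(r->buf, 0, r->len);
        r->running = true;
}

/* Final teardown: free once and clear the pointer. */
static void ring_keep_teardown(struct ring_keep_model *r)
{
        free(r->buf);
        r->buf = NULL;
        r->len = 0;
        r->running = false;
}
```

In this model the resume path has no way to return -ENOMEM, because it
never asks the allocator for anything.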

And please please please don't double-free things randomly (if that is
what was going on, but it does look like it was).

           Linus

View attachment "patch.diff" of type "text/x-patch" (781 bytes)