netdev - Re: [EXT] Aquantia ethernet driver suspend/resume issues

Message-ID: <3b607ba8-ef5a-56b3-c907-694c0bde437c@marvell.com>
Date: Tue, 23 Jan 2024 15:58:59 +0100
From: Igor Russkikh <irusskikh@...vell.com>
To: Peter Waller <p@...ller.net>, Jakub Kicinski <kuba@...nel.org>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
        Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: [EXT] Aquantia ethernet driver suspend/resume issues


On 1/21/2024 10:05 PM, Peter Waller wrote:
> I see a fix for double free [0] landed in 6.7; I've been running that
> for a few days and have hit a resume from suspend issue twice. Stack
> trace looks a little different (via __iommu_dma_map instead of
> __iommu_dma_free), provided below.
> 
> I've had resume issues with the atlantic driver since I've had this
> hardware, but it went away for a while and seems as though it may have
> come back with 6.7. (No crashes since logs begin on Dec 15 till Jan 12,
> Upgrade to 6.7; crashes 20th and 21st, though my usage style of the
> system has also varied, maybe crashes are associated with higher memory
> usage?).

Hi Peter,

Are these hard crashes, or just warnings you see in dmesg?
From the log you provided it looks like a warning, meaning the system is usable
and the driver can be restored with an interface down/up sequence.

If so, then this is somewhat expected, because I'm still looking into
how to refactor this suspend/resume cycle to reduce memory usage.
A more permanent workaround would be to reduce the rx/tx ring sizes with
something like

    ethtool -G <iface> rx 1024 tx 1024

If it's a hard panic, we should look deeper into it.
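
A minimal sketch of the two recovery steps mentioned above (shrinking the
rings and bouncing the interface), not a tested procedure for this hardware.
The helper names `run`, `shrink_rings` and `bounce_link` and the `DRY_RUN`
variable are made up for illustration, and `enp12s0` is a placeholder
interface name. By default the helpers only print the commands; set
`DRY_RUN=0` and run as root to execute them. Ring sizes set with ethtool do
not persist across reboots.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from the driver or ethtool.

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current/maximum ring sizes, then set rx and tx to the given size.
# $1 = interface name, $2 = ring size (e.g. 1024)
shrink_rings() {
    run ethtool -g "$1"
    run ethtool -G "$1" rx "$2" tx "$2"
}

# Take the interface down and bring it back up.
# $1 = interface name
bounce_link() {
    run ip link set dev "$1" down
    run ip link set dev "$1" up
}

# Example (placeholder interface name):
#   shrink_rings enp12s0 1024
#   bounce_link enp12s0
```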

> Possibly unrelated but I also see fairly frequent (1 to ten times per
> boot, since logs begin?) messages in my logs of the form "atlantic
> 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0014
> address=0xffce8000 flags=0x0020]".

This seems to be unrelated, but it basically indicates that HW or FW is trying
to access unmapped memory addresses, and the IOMMU catches that.
A full dmesg may help analyze this.
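
One possible way to gather that information, as a sketch: save the full
kernel log to a file and count how many atlantic IO_PAGE_FAULT events it
contains. The function name `count_page_faults` and the output file name are
made up for illustration; reading dmesg may require root on some systems.

```shell
#!/bin/sh
# Count log lines on stdin where the atlantic device reports an IO_PAGE_FAULT.
# Hypothetical helper, not part of any existing tool.
count_page_faults() {
    grep -c 'atlantic.*IO_PAGE_FAULT'
}

# Example:
#   dmesg > dmesg-full.txt
#   count_page_faults < dmesg-full.txt
```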

Regards
  Igor