Message-ID: <20200514120659.6f64f6e7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Thu, 14 May 2020 12:06:59 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Igor Russkikh <irusskikh@...vell.com>
Cc: <netdev@...r.kernel.org>, "David S . Miller" <davem@...emloft.net>,
Ariel Elior <aelior@...vell.com>,
Michal Kalderon <mkalderon@...vell.com>,
Denis Bolotin <dbolotin@...vell.com>
Subject: Re: [PATCH v2 net-next 00/11] net: qed/qede: critical hw error
handling

On Thu, 14 May 2020 12:57:16 +0300 Igor Russkikh wrote:
> FastLinQ devices, as complex systems, may observe various hardware-level
> error conditions, both severe and recoverable.
>
> The driver is able to detect and report these, but so far it has only
> done trace/dmesg-based reporting.
>
> Here we implement extended hw error detection; the service task
> handler captures a dump for later analysis.
>
> I am also resubmitting a patch from Denis Bolotin on the tx timeout
> handler, addressing David's comment about adding a recovery procedure
> as an extra reaction to this event.
>
> v2:
>
> Removed the patch with ethtool dump and udev magic. It's quite isolated;
> I'm working on devlink-based logic for it separately.
>
> v1:
>
> https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irusskikh@marvell.com/

I'm not 100% happy that the debug data gets reported to the management
FW before the devlink health code is in place. For the Linux community,
I think having the standard Linux interfaces implemented first should be
the priority.
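For readers unfamiliar with the flow under discussion, the following is a minimal userspace sketch of the pattern the cover letter describes: the error is only recorded where it is detected, a deferred service task later captures a dump for analysis, and a fatal error additionally triggers recovery. It loosely mirrors the dump/recover callbacks that devlink health reporters provide, but every name here (`health_reporter`, `hw_error_detected`, `service_task`, `demo_dump`, `demo_recover`) is hypothetical. This is neither the kernel devlink API nor the qed/qede implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model; NOT the kernel devlink API. */
enum hw_err_severity { HW_ERR_RECOVERABLE, HW_ERR_FATAL };

struct health_reporter;

struct health_reporter_ops {
	const char *name;
	/* Fill buf with debug data; return the number of bytes written. */
	size_t (*dump)(struct health_reporter *r, char *buf, size_t len);
	/* Attempt recovery; return 0 on success. */
	int (*recover)(struct health_reporter *r);
};

struct health_reporter {
	const struct health_reporter_ops *ops;
	char last_dump[128];          /* most recent dump, kept for later analysis */
	unsigned int error_count;
	unsigned int recover_count;
	bool pending;                 /* set at detection, consumed by the service task */
	enum hw_err_severity pending_sev;
};

/* Detection point (e.g. an interrupt handler): record the error, defer heavy work. */
static void hw_error_detected(struct health_reporter *r, enum hw_err_severity sev)
{
	r->pending = true;
	r->pending_sev = sev;
}

/* Deferred service task: capture a dump, then recover if the error was fatal. */
static void service_task(struct health_reporter *r)
{
	assert(r->ops != NULL);
	if (!r->pending)
		return;
	r->pending = false;
	r->error_count++;
	r->ops->dump(r, r->last_dump, sizeof(r->last_dump));
	if (r->pending_sev == HW_ERR_FATAL && r->ops->recover(r) == 0)
		r->recover_count++;
}

/* Example callbacks: the "dump" is just a label with the error number. */
static size_t demo_dump(struct health_reporter *r, char *buf, size_t len)
{
	snprintf(buf, len, "error #%u", r->error_count);
	return strlen(buf);
}

static int demo_recover(struct health_reporter *r)
{
	(void)r; /* a real driver would reset the device/queues here */
	return 0;
}
```

A caller would bind `demo_dump`/`demo_recover` into a `struct health_reporter_ops`, call `hw_error_detected()` where the fault is noticed, and run `service_task()` from its periodic worker. Keeping the detection path trivial and doing dump and recovery in the deferred task is the design choice the cover letter implies; the review point above is about *where* that dump is then reported.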