[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181231174727.6280fc3c@cakuba.hsd1.ca.comcast.net>
Date: Mon, 31 Dec 2018 17:47:27 -0800
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Eran Ben Elisha <eranbe@...lanox.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Jiri Pirko <jiri@...lanox.com>,
Moshe Shemesh <moshe@...lanox.com>,
Aya Levin <ayal@...lanox.com>, Tal Alon <talal@...lanox.com>,
Ariel Almog <ariela@...lanox.com>
Subject: Re: [PATCH RFC net-next 00/19] Devlink health reporting and
recovery system
On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote:
> The health mechanism is targeted for Real Time Alerting, in order to know when
> something bad had happened to a PCI device
> - Provide alert debug information
> - Self healing
> - If problem needs vendor support, provide a way to gather all needed debugging
> information.
>
> The main idea is to unify and centralize driver health reports in the
> generic devlink instance and allow the user to set different
> attributes of the health reporting and recovery procedures.
>
> The devlink health reporter:
> Device driver creates a "health reporter" per each error/health type.
> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
> or unknown (driver specific).
> For each registered health reporter a driver can issue error/health reports
> asynchronously. All health reports handling is done by devlink.
> Device driver can provide specific callbacks for each "health reporter", e.g.
> - Recovery procedures
> - Diagnostics and object dump procedures
> - OOB initial attributes
> Different parts of the driver can register different types of health reporters
> with different handlers.
>
> Once an error is reported, devlink health will do the following actions:
> * A log is being send to the kernel trace events buffer
> * Health status and statistics are being updated for the reporter instance
> * Object dump is being taken and saved at the reporter instance (as long as
> there is no other Objdump which is already stored)
> * Auto recovery attempt is being done. Depends on:
> - Auto-recovery configuration
> - Grace period vs. time passed since last recover
>
> The user interface:
> User can access/change each reporter attributes and driver specific callbacks
> via devlink, e.g per error type (per health reporter)
> - Configure reporter's generic attributes (like: Disable/enable auto recovery)
> - Invoke recovery procedure
> - Run diagnostics
> - Object dump
Thanks for continuing this work!
> The devlink health interface (via netlink):
> DEVLINK_CMD_HEALTH_REPORTER_GET
> Retrieves status and configuration info per DEV and reporter.
> DEVLINK_CMD_HEALTH_REPORTER_SET
> Allows reporter-related configuration setting.
> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
> Triggers a reporter's recovery procedure.
> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
> Retrieves diagnostics data from a reporter on a device.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
The addition of "objdump" and its marshalling is a bit disappointing.
It seemed to me when region snapshots were added that they would serve
this exact purpose. Taking a region snapshot or "core dump", if you
will, after something went south. Now it's confusing to have two
mechanisms serving the same purpose.
It's unclear from quick reading of the code how if the buffers get
timestamped. Can you report more than one?
About the marshalling, I'm not sure it belongs in the kernel. There is
precedent for adding interpretation of FW blobs in user space (ethtool).
IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
drivers. Amount of code you add to print the simple example from last
patch is not inspiring confidence.
And on the bike shedding side :) -> I think you should steer clear of
calling this objdump, as it has very little to do with the
functionality of well-known objdump tool. Its not even clear what the
object the name is referring to is.
Long story short the overlap with region snapshots makes it unclear
what purpose either serves, and IMHO we should avoid the marshalling by
teaching user space how to interpret snapshots. Preferably we only
have one dump mechanism, and user space can be taught the interpretation
once.
> Retrieves the last stored objdump. Devlink health
> saves a single objdump. If an objdump is not already stored by the devlink
> for this reporter, devlink generates a new objdump.
> Objdump output is defined by the reporter.
> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
> Clears the last saved objdump file for the specified reporter.
>
>
> netlink
> +--------------------------+
> | |
> | + |
> | | |
> +--------------------------+
> |request for ops
> |(diagnose,
> mlx5_core devlink |recover,
> |dump)
> +--------+ +--------------------------+
> | | | reporter| |
> | | | +---------v----------+ |
> | | ops execution | | | |
> | <----------------------------------+ | |
> | | | | | |
> | | | + ^------------------+ |
> | | | | request for ops |
> | | | | (recover, dump) |
> | | | | |
> | | | +-+------------------+ |
> | | health report | | health handler | |
> | +-------------------------------> | |
> | | | +--------------------+ |
> | | health reporter create | |
> | +----------------------------> |
> +--------+ +--------------------------+
>
> Available reporters:
> In this patchset, three reporters of mlx5e driver are included. The FW
> reporters implement devlink_health_reporter diagnostic, object dump and
> recovery procedures. The TX reporter implements devlink_health_reporter
> diagnostic and recovery procedures.
>
> This RFC is based on the same concepts as previous RFC I sent earlier this
> year: "[RFC PATCH iproute2-next] System specification health API". The API was
> changed, also devlink code and mlx5e reporters were not available at the
> previous RFC.
Powered by blists - more mailing lists