[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190104085757.GA21274@nanopsycho.orion>
Date: Fri, 4 Jan 2019 09:57:57 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: Eran Ben Elisha <eranbe@...lanox.com>, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>,
Jiri Pirko <jiri@...lanox.com>,
Moshe Shemesh <moshe@...lanox.com>,
Aya Levin <ayal@...lanox.com>, Tal Alon <talal@...lanox.com>,
Ariel Almog <ariela@...lanox.com>
Subject: Re: [PATCH RFC net-next 00/19] Devlink health reporting and recovery
system
Tue, Jan 01, 2019 at 02:47:27AM CET, jakub.kicinski@...ronome.com wrote:
>On Mon, 31 Dec 2018 16:31:54 +0200, Eran Ben Elisha wrote:
>> The health mechanism is targeted for Real Time Alerting, in order to know when
>> something bad had happened to a PCI device
>> - Provide alert debug information
>> - Self healing
>> - If problem needs vendor support, provide a way to gather all needed debugging
>> information.
>>
>> The main idea is to unify and centralize driver health reports in the
>> generic devlink instance and allow the user to set different
>> attributes of the health reporting and recovery procedures.
>>
>> The devlink health reporter:
>> Device driver creates a "health reporter" per each error/health type.
>> Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
>> or unknown (driver specific).
>> For each registered health reporter a driver can issue error/health reports
>> asynchronously. All health reports handling is done by devlink.
>> Device driver can provide specific callbacks for each "health reporter", e.g.
>> - Recovery procedures
>> - Diagnostics and object dump procedures
>> - OOB initial attributes
>> Different parts of the driver can register different types of health reporters
>> with different handlers.
>>
>> Once an error is reported, devlink health will do the following actions:
>> * A log is being send to the kernel trace events buffer
>> * Health status and statistics are being updated for the reporter instance
>> * Object dump is being taken and saved at the reporter instance (as long as
>> there is no other Objdump which is already stored)
>> * Auto recovery attempt is being done. Depends on:
>> - Auto-recovery configuration
>> - Grace period vs. time passed since last recover
>>
>> The user interface:
>> User can access/change each reporter attributes and driver specific callbacks
>> via devlink, e.g per error type (per health reporter)
>> - Configure reporter's generic attributes (like: Disable/enable auto recovery)
>> - Invoke recovery procedure
>> - Run diagnostics
>> - Object dump
>
>Thanks for continuing this work!
>
>> The devlink health interface (via netlink):
>> DEVLINK_CMD_HEALTH_REPORTER_GET
>> Retrieves status and configuration info per DEV and reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_SET
>> Allows reporter-related configuration setting.
>> DEVLINK_CMD_HEALTH_REPORTER_RECOVER
>> Triggers a reporter's recovery procedure.
>> DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
>> Retrieves diagnostics data from a reporter on a device.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_GET
>
>The addition of "objdump" and its marshalling is a bit disappointing.
>It seemed to me when region snapshots were added that they would serve
>this exact purpose. Taking a region snapshot or "core dump", if you
>will, after something went south. Now it's confusing to have two
>mechanisms serving the same purpose.
>
>It's unclear from quick reading of the code how if the buffers get
>timestamped. Can you report more than one?
>
>About the marshalling, I'm not sure it belongs in the kernel. There is
>precedent for adding interpretation of FW blobs in user space (ethtool).
>IMHO it's a more scalable approach, if we want to avoid having 100 kLoC
>drivers. Amount of code you add to print the simple example from last
>patch is not inspiring confidence.
>
>And on the bike shedding side :) -> I think you should steer clear of
>calling this objdump, as it has very little to do with the
>functionality of well-known objdump tool. Its not even clear what the
>object the name is referring to is.
>
>Long story short the overlap with region snapshots makes it unclear
>what purpose either serves, and IMHO we should avoid the marshalling by
>teaching user space how to interpret snapshots. Preferably we only
>have one dump mechanism, and user space can be taught the interpretation
>once.
Unlike regions, which are binary blobs, the "objdump" we have here is a
tree of values (json like). It is not taken from FW/HW. It is generated
in driver and passed down to user to be shown. Originally, Eran just
pushed that info into a string buffer and the user had to parse it
again. I asked to format this in Netlink attributes JSON-style so the
user gets all hierarchy without need of parsing and can easily do
Netlink->human_stdout or Netlink->JSON.
>
>> Retrieves the last stored objdump. Devlink health
>> saves a single objdump. If an objdump is not already stored by the devlink
>> for this reporter, devlink generates a new objdump.
>> Objdump output is defined by the reporter.
>> DEVLINK_CMD_HEALTH_REPORTER_OBJDUMP_CLEAR
>> Clears the last saved objdump file for the specified reporter.
>>
>>
>> netlink
>> +--------------------------+
>> | |
>> | + |
>> | | |
>> +--------------------------+
>> |request for ops
>> |(diagnose,
>> mlx5_core devlink |recover,
>> |dump)
>> +--------+ +--------------------------+
>> | | | reporter| |
>> | | | +---------v----------+ |
>> | | ops execution | | | |
>> | <----------------------------------+ | |
>> | | | | | |
>> | | | + ^------------------+ |
>> | | | | request for ops |
>> | | | | (recover, dump) |
>> | | | | |
>> | | | +-+------------------+ |
>> | | health report | | health handler | |
>> | +-------------------------------> | |
>> | | | +--------------------+ |
>> | | health reporter create | |
>> | +----------------------------> |
>> +--------+ +--------------------------+
>>
>> Available reporters:
>> In this patchset, three reporters of mlx5e driver are included. The FW
>> reporters implement devlink_health_reporter diagnostic, object dump and
>> recovery procedures. The TX reporter implements devlink_health_reporter
>> diagnostic and recovery procedures.
>>
>> This RFC is based on the same concepts as previous RFC I sent earlier this
>> year: "[RFC PATCH iproute2-next] System specification health API". The API was
>> changed, also devlink code and mlx5e reporters were not available at the
>> previous RFC.
Powered by blists - more mailing lists