Message-ID: <8a064f7f-298e-9cbe-d58a-fdf3d79eef24@mellanox.com>
Date: Thu, 27 Sep 2018 17:02:48 +0300
From: Eran Ben Elisha <eranbe@...lanox.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: netdev@...r.kernel.org,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jiri Pirko <jiri@...lanox.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Andrew Lunn <andrew@...n.ch>, "Tobin C. Harding" <me@...in.cc>,
Ariel Almog <ariela@...lanox.com>,
Tal Alon <talal@...lanox.com>
Subject: Re: [RFC PATCH iproute2-next V2] System specification exception API
On 9/27/2018 3:47 PM, Jiri Pirko wrote:
> Wed, Sep 26, 2018 at 01:52:58PM CEST, eranbe@...lanox.com wrote:
>> The exception spec is targeted at real-time alerting, in order to know when
>> something bad has happened to a PCI device:
>> - Provide alert debug information
>> - Self healing
>> - If problem needs vendor support, provide a way to gather all needed debugging
>> information.
>>
>> The exception mechanism contains condition checkers which detect malfunctions. Upon a condition hit,
>> actions such as logging and correction can be taken.
>>
>> The condition checkers are divided into the following groups
>> - Hardware - a checker which is triggered by the device due to
>> malfunction.
>> - Software - a checker which is triggered by the software due to
>> malfunction.
>
> What do you mean by a "software malfunction", a "FW malfunction"?
> Also, I don't see these 2 groups in the man page.
A software malfunction can be a transmit error (caused by a bad send
request). A FW/HW malfunction can be any catastrophic error report (the
ones that should be exposed to the driver).
The comment here was meant to highlight that we can support different
kinds of condition groups.
If we need to indicate that a specific condition is SW or HW, we can
append that to its name.
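
As a rough illustration only, the idea could be modelled as below. This
is a sketch, not the proposed devlink API: the class names, the
"sw_"/"hw_" prefix format and the condition names "tx_error" and
"fatal_error" are all hypothetical and do not appear in the patch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Group(Enum):
    """Who detects the malfunction (hypothetical labels)."""
    SW = "sw"
    HW = "hw"


class Action(Enum):
    """The two action groups named in the cover letter."""
    DUMP = "dump"
    RESET = "reset"


@dataclass
class Condition:
    base_name: str                      # hypothetical condition name
    group: Group
    enabled: bool = True                # user may enable/disable a checker
    actions: set = field(default_factory=set)

    @property
    def name(self) -> str:
        # Assumption: the group is attached as a prefix; the mail only
        # says it can be concatenated to the name.
        return f"{self.group.value}_{self.base_name}"

    def hit(self) -> list:
        """Return the mapped actions to take when the condition triggers."""
        if not self.enabled:
            return []
        return sorted(self.actions, key=lambda a: a.value)


# A SW condition (bad send request) and a FW/HW condition (fatal report).
tx_err = Condition("tx_error", Group.SW, actions={Action.DUMP, Action.RESET})
fw_fatal = Condition("fatal_error", Group.HW, actions={Action.DUMP})
```

Here tx_err.name carries the SW marker and fw_fatal.name the HW marker,
and a disabled condition maps to no actions at all.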
Eran
>
>
>> Both groups of condition checkers can be triggered either by an error event or by a periodic check.
>>
>> Actions are the way to handle those events. An action can be in one of the
>> following groups:
>> - Dump - SW trace, SW dump, HW trace, HW dump
>> - Reset - Surgical correction (e.g. modify Q, flush Q, reset of device, etc)
>> Actions can be performed by SW or HW.
>>
>> The user is allowed to enable or disable condition checkers and their action mapping.
>>
>> This RFC man page patch describes the suggested API of devlink-exception in order
>> to control conditions and actions.
>>
>> V2:
>> * Renaming terms:
>> health -> exception
>> sensor -> condition
>> * Remove reinit command and merge with action command.
>> * Cosmetics in grammar.
>>
>> Eran Ben Elisha (1):
>> man: Add devlink exception man page
>>
>> man/man8/devlink-exception.8 | 158 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 158 insertions(+)
>> create mode 100644 man/man8/devlink-exception.8
>>
>> --
>> 1.8.3.1
>>