[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f99551f-5037-4670-9c2d-a4ce4d5c017e@trager.us>
Date: Mon, 15 Sep 2025 18:35:40 -0700
From: Lee Trager <lee@...ger.us>
To: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc: netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
andrew+netdev@...n.ch, horms@...nel.org, alexanderduyck@...com,
jacob.e.keller@...el.com
Subject: Re: [PATCH net-next v2 8/9] eth: fbnic: report FW uptime in health
diagnose
On 9/15/25 8:53 AM, Jakub Kicinski wrote:
> FW crashes are detected based on uptime going back, expose the uptime
> via devlink health diagnose.
>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> .../device_drivers/ethernet/meta/fbnic.rst | 4 +++-
> drivers/net/ethernet/meta/fbnic/fbnic_devlink.c | 13 +++++++++++++
> 2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst
> index 62693566ff1f..3c81b58d8292 100644
> --- a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst
> +++ b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst
> @@ -77,7 +77,9 @@ fw reporter
>
> The ``fw`` health reporter tracks FW crashes. Dumping the reporter will
> show the core dump of the most recent FW crash, and if no FW crash has
> -happened since power cycle - a snapshot of the FW memory.
> +happened since power cycle - a snapshot of the FW memory. Diagnose callback
> +shows current FW uptime (the crashes are detected by checking if uptime
> +goes down).
I was a little surprised to see this since it didn't go through our
normal internal review and validation process.
>
> Statistics
> ----------
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
> index 0e8920685da6..f3f3585c0aac 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_devlink.c
> @@ -487,6 +487,18 @@ static int fbnic_fw_reporter_dump(struct devlink_health_reporter *reporter,
> return err;
> }
>
> +static int
> +fbnic_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
> + struct devlink_fmsg *fmsg,
> + struct netlink_ext_ack *extack)
> +{
> + struct fbnic_dev *fbd = devlink_health_reporter_priv(reporter);
> +
> + devlink_fmsg_u32_pair_put(fmsg, "FW uptime", fbd->firmware_time);
I originally added fbd->firmware_time as part of the implementation for
logging support in D51521853. The original idea was to correlate
firmware logs to host time. This proved to be difficult. Instead I used
firmware time to detect firmware crashes in D52065019. Time is to set 0
when fbnic_fw_log_write() is called in fbnic_devlink_fw_report() because
we don't know the actual time firmware crashed. fbd->firmware_time is
only updated with the heartbeat is received. When a crash occurs
fbd->firmware_time is reset once firmware comes back up. Ignoring the
crash case this should be something like fbd->firmware_time + (jiffies -
fbd->last_heartbeat_req) * 1000.
Another issue is your using a u32 for fbd->firmware_time which is u64.
Firmware returns its time by calling k_uptime_get()[1] which returns an
s64 as its the firmware uptime in milliseconds.
We also don't use firmware time in its raw integer form anywhere in the
driver or firmware. Its very hard to read FBNIC_FW_LOG_FMT has the
format used in the driver which is based on what Zephyr uses[2].
IMO this doesn't really have a use case and I would just drop it.
[1]
https://elixir.bootlin.com/zephyr/v3.7.1/source/include/zephyr/kernel.h#L1751
[2]
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/logging/log_output.c#L263
Powered by blists - more mailing lists