[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAACQVJrBLsdnQKcOzWD5UNydFGoBHus1V_2Xxm=yL1zMb_KBQA@mail.gmail.com>
Date: Wed, 9 Oct 2019 10:25:45 +0530
From: Vasundhara Volam <vasundhara-v.volam@...adcom.com>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Michael Chan <michael.chan@...adcom.com>,
David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>, Ray Jui <ray.jui@...adcom.com>,
Jiri Pirko <jiri@...lanox.com>, ayal@...lanox.com,
Moshe Shemesh <moshe@...lanox.com>
Subject: Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter
On Mon, Oct 7, 2019 at 3:26 PM Jiri Pirko <jiri@...nulli.us> wrote:
>
> Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@...adcom.com wrote:
> >From: Vasundhara Volam <vasundhara-v.volam@...adcom.com>
> >
> >Create new FW devlink_health_reporter, to know the current health
> >status of FW.
> >
> >Command example and output:
> >$ devlink health show pci/0000:af:00.0 reporter fw
> >
> >pci/0000:af:00.0:
> > name fw
> > state healthy error 0 recover 0
> >
> > FW status: Healthy; Reset count: 1
>
> I'm puzzled how did you get this output, since you put "FW status" into
> "diagnose" callback fmsg and that is called upon "devlink health diagnose".
>
> [...]
Jiri, you are right last line is output of diagnose command. Command
is missing here.
$ devlink health diagnose pci/0000:af:00.0 reporter fw
FW status: Healthy; Reset count: 0
>
> >+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
> >+ struct devlink_fmsg *fmsg)
> >+{
> >+ struct bnxt *bp = devlink_health_reporter_priv(reporter);
> >+ struct bnxt_fw_health *health = bp->fw_health;
> >+ u32 val, health_status;
> >+ int rc;
> >+
> >+ if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
> >+ return 0;
> >+
> >+ val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
> >+ health_status = val & 0xffff;
> >+
> >+ if (health_status == BNXT_FW_STATUS_HEALTHY) {
> >+ rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+ "Healthy;");
>
> First of all, the ";" is just wrong. You should put plain string if
> anything. You are trying to format user output here. Don't do that
> please.
>
> Please see json output:
> $ devlink health show pci/0000:af:00.0 reporter fw -j -p
>
> Please remove ";" from the strings.
Okay, I will send a patch for removing ";"
>
>
> Second, I do not understand why you need this "FW status" at all. The
> reporter itself has state healthy/error:
> pci/0000:af:00.0:
> name fw
> state healthy error 0 recover 0
> ^^^^^^^
>
> "FW" is redundant of course as the reporter name is "fw".
>
> Please remove "FW status" and replace with some pair indicating the
> actual error state.
Okay, I can rename to "Status description" so that "FW" name will not
be repeated.
>
> In mlx5 they call it "Description".
>
>
> >+ if (rc)
> >+ return rc;
> >+ } else if (health_status < BNXT_FW_STATUS_HEALTHY) {
> >+ rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+ "Not yet completed initialization;");
> >+ if (rc)
> >+ return rc;
> >+ } else if (health_status > BNXT_FW_STATUS_HEALTHY) {
> >+ rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+ "Encountered fatal error and cannot recover;");
> >+ if (rc)
> >+ return rc;
> >+ }
> >+
> >+ if (val >> 16) {
> >+ rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
>
> Perhaps rather call this "Error code"?
Okay.
>
>
> >+ if (rc)
> >+ return rc;
> >+ }
> >+
> >+ val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
> >+ rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
>
> What is this counter counting? Number of recoveries?
> If so, that is also already counted internally by devlink.
"Reset count" is the counter that displays the number of times
firmware has gone for
reset through different mechanisms and devlink is one of it. Firmware
could have gone
for a reset through other tools as well.
Driver gets the information from firmware health register, when
diagnose command is invoked.
>
> [...]
Powered by blists - more mailing lists