lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAACQVJrBLsdnQKcOzWD5UNydFGoBHus1V_2Xxm=yL1zMb_KBQA@mail.gmail.com>
Date:   Wed, 9 Oct 2019 10:25:45 +0530
From:   Vasundhara Volam <vasundhara-v.volam@...adcom.com>
To:     Jiri Pirko <jiri@...nulli.us>
Cc:     Michael Chan <michael.chan@...adcom.com>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>, Ray Jui <ray.jui@...adcom.com>,
        Jiri Pirko <jiri@...lanox.com>, ayal@...lanox.com,
        Moshe Shemesh <moshe@...lanox.com>
Subject: Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter

On Mon, Oct 7, 2019 at 3:26 PM Jiri Pirko <jiri@...nulli.us> wrote:
>
> Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@...adcom.com wrote:
> >From: Vasundhara Volam <vasundhara-v.volam@...adcom.com>
> >
> >Create new FW devlink_health_reporter, to know the current health
> >status of FW.
> >
> >Command example and output:
> >$ devlink health show pci/0000:af:00.0 reporter fw
> >
> >pci/0000:af:00.0:
> >  name fw
> >    state healthy error 0 recover 0
> >
> > FW status: Healthy; Reset count: 1
>
> I'm puzzled how did you get this output, since you put "FW status" into
> "diagnose" callback fmsg and that is called upon "devlink health diagnose".
>
> [...]
Jiri, you are right last line is output of diagnose command. Command
is missing here.

$ devlink health diagnose pci/0000:af:00.0 reporter fw
 FW status: Healthy; Reset count: 0

>
> >+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
> >+                                   struct devlink_fmsg *fmsg)
> >+{
> >+      struct bnxt *bp = devlink_health_reporter_priv(reporter);
> >+      struct bnxt_fw_health *health = bp->fw_health;
> >+      u32 val, health_status;
> >+      int rc;
> >+
> >+      if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
> >+              return 0;
> >+
> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
> >+      health_status = val & 0xffff;
> >+
> >+      if (health_status == BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Healthy;");
>
> First of all, the ";" is just wrong. You should put plain string if
> anything. You are trying to format user output here. Don't do that
> please.
>
> Please see json output:
> $ devlink health show pci/0000:af:00.0 reporter fw -j -p
>
> Please remove ";" from the strings.
Okay, I will send a patch for removing ";"

>
>
> Second, I do not understand why you need this "FW status" at all. The
> reporter itself has state healthy/error:
> pci/0000:af:00.0:
>   name fw
>     state healthy error 0 recover 0
>           ^^^^^^^
>
> "FW" is redundant of course as the reporter name is "fw".
>
> Please remove "FW status" and replace with some pair indicating the
> actual error state.
Okay, I can rename to "Status description" so that "FW" name will not
be repeated.

>
> In mlx5 they call it "Description".
>
>
> >+              if (rc)
> >+                      return rc;
> >+      } else if (health_status < BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Not yet completed initialization;");
> >+              if (rc)
> >+                      return rc;
> >+      } else if (health_status > BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Encountered fatal error and cannot recover;");
> >+              if (rc)
> >+                      return rc;
> >+      }
> >+
> >+      if (val >> 16) {
> >+              rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
>
> Perhaps rather call this "Error code"?
Okay.

>
>
> >+              if (rc)
> >+                      return rc;
> >+      }
> >+
> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
> >+      rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
>
> What is this counter counting? Number of recoveries?
> If so, that is also already counted internally by devlink.
"Reset count" is the counter that displays the number of times
firmware has gone for
reset through different mechanisms and devlink is one of it. Firmware
could have gone
for a reset through other tools as well.

Driver gets the information from firmware health register, when
diagnose command is invoked.

>
> [...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ