lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20191009070442.GF2326@nanopsycho>
Date:   Wed, 9 Oct 2019 09:04:42 +0200
From:   Jiri Pirko <jiri@...nulli.us>
To:     Vasundhara Volam <vasundhara-v.volam@...adcom.com>
Cc:     Michael Chan <michael.chan@...adcom.com>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>, Ray Jui <ray.jui@...adcom.com>,
        Jiri Pirko <jiri@...lanox.com>, ayal@...lanox.com,
        Moshe Shemesh <moshe@...lanox.com>
Subject: Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW
 devlink_health_reporter

Wed, Oct 09, 2019 at 06:55:45AM CEST, vasundhara-v.volam@...adcom.com wrote:
>On Mon, Oct 7, 2019 at 3:26 PM Jiri Pirko <jiri@...nulli.us> wrote:
>>
>> Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@...adcom.com wrote:
>> >From: Vasundhara Volam <vasundhara-v.volam@...adcom.com>
>> >
>> >Create new FW devlink_health_reporter, to know the current health
>> >status of FW.
>> >
>> >Command example and output:
>> >$ devlink health show pci/0000:af:00.0 reporter fw
>> >
>> >pci/0000:af:00.0:
>> >  name fw
>> >    state healthy error 0 recover 0
>> >
>> > FW status: Healthy; Reset count: 1
>>
>> I'm puzzled how did you get this output, since you put "FW status" into
>> "diagnose" callback fmsg and that is called upon "devlink health diagnose".
>>
>> [...]
>Jiri, you are right last line is output of diagnose command. Command
>is missing here.
>
>$ devlink health diagnose pci/0000:af:00.0 reporter fw
> FW status: Healthy; Reset count: 0
>
>>
>> >+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
>> >+                                   struct devlink_fmsg *fmsg)
>> >+{
>> >+      struct bnxt *bp = devlink_health_reporter_priv(reporter);
>> >+      struct bnxt_fw_health *health = bp->fw_health;
>> >+      u32 val, health_status;
>> >+      int rc;
>> >+
>> >+      if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
>> >+              return 0;
>> >+
>> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
>> >+      health_status = val & 0xffff;
>> >+
>> >+      if (health_status == BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Healthy;");
>>
>> First of all, the ";" is just wrong. You should put plain string if
>> anything. You are trying to format user output here. Don't do that
>> please.
>>
>> Please see json output:
>> $ devlink health show pci/0000:af:00.0 reporter fw -j -p
>>
>> Please remove ";" from the strings.
>Okay, I will send a patch for removing ";"
>
>>
>>
>> Second, I do not understand why you need this "FW status" at all. The
>> reporter itself has state healthy/error:
>> pci/0000:af:00.0:
>>   name fw
>>     state healthy error 0 recover 0
>>           ^^^^^^^
>>
>> "FW" is redundant of course as the reporter name is "fw".
>>
>> Please remove "FW status" and replace with some pair indicating the
>> actual error state.
>Okay, I can rename to "Status description" so that "FW" name will not
>be repeated.

Also please remove the redundant "Healthy" information. That is an
attribute of the reporter itself.


>
>>
>> In mlx5 they call it "Description".
>>
>>
>> >+              if (rc)
>> >+                      return rc;
>> >+      } else if (health_status < BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Not yet completed initialization;");
>> >+              if (rc)
>> >+                      return rc;
>> >+      } else if (health_status > BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Encountered fatal error and cannot recover;");
>> >+              if (rc)
>> >+                      return rc;
>> >+      }
>> >+
>> >+      if (val >> 16) {
>> >+              rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
>>
>> Perhaps rather call this "Error code"?
>Okay.
>
>>
>>
>> >+              if (rc)
>> >+                      return rc;
>> >+      }
>> >+
>> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
>> >+      rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
>>
>> What is this counter counting? Number of recoveries?
>> If so, that is also already counted internally by devlink.
>"Reset count" is the counter that displays the number of times
>firmware has gone for
>reset through different mechanisms and devlink is one of it. Firmware
>could have gone
>for a reset through other tools as well.
>
>Driver gets the information from firmware health register, when
>diagnose command is invoked.

okay, sounds sane.
Thanks!

>
>>
>> [...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ