Date:   Thu, 10 Oct 2019 10:05:40 +0200
From:   Jiri Pirko <jiri@...nulli.us>
To:     Yunsheng Lin <linyunsheng@...wei.com>
Cc:     jiri@...lanox.com, yisen.zhuang@...wei.com, salil.mehta@...wei.com,
        davem@...emloft.net, netdev@...r.kernel.org, linuxarm@...wei.com
Subject: Re: [RFC PATCH] net: hns3: add devlink health dump support for hw mac tbl

Thu, Oct 10, 2019 at 03:32:54AM CEST, linyunsheng@...wei.com wrote:
>On 2019/10/5 10:06, Yunsheng Lin wrote:
>> On 2019/9/30 17:27, Jiri Pirko wrote:
>>> Sun, Sep 29, 2019 at 02:13:43PM CEST, linyunsheng@...wei.com wrote:
>>>> This patch adds devlink health dump support for the hw mac tbl,
>>>> which helps to debug hardware packet switching problems or
>>>> misconfiguration of the hardware mac table.
>>>
>>> So you basically make an internal hw table available for the user to
>>> see, if I understand that correctly. There is a "devlink dpipe" api
>>> that serves the purpose of describing the hw pipeline and also allows
>>> the user to see the content of individual tables in the pipeline.
>>> Perhaps that would be more suitable for you?
>>>
>> 
>> Thanks.
>> I will look into the "devlink dpipe" api.
>> The dpipe api seems a little more complicated than the health api,
>> and there seems to be no mention of it in the man page below:
>> http://man7.org/linux/man-pages/man8/devlink.8.html
>> 
>> Any more detailed info about the "devlink dpipe" api would be very helpful.
>
>After looking into the "devlink dpipe" api, it seems dpipe is used to
>debug the match and action operations of a hardware table.
>
>We might be able to use the "devlink dpipe" api for dumping the mac tbl
>in our hardware if we extend the action type.
>But I am not sure how a few fields in our hw table below map to match
>or action operations. For example, the type field (0 for unicast and 2
>for multicast) is used to tell the hardware whether the specific entry
>is unicast or multicast, so that the hardware does different processing
>when a packet is matched; it is more like a property of the specific
>entry.

I don't see a problem with having this in dpipe.
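
For reference, once a driver describes its pipeline through dpipe, the
tables can be inspected from userspace with the standard devlink-dpipe
commands. The invocations below are the stock iproute2 ones; only the
table name "hns3_mac_tbl" is hypothetical, to show what the hns3 flow
could look like:

  # list the headers/fields the driver has described
  devlink dpipe header show pci/0000:7d:00.2
  # list the registered tables and their match/action descriptions
  devlink dpipe table show pci/0000:7d:00.2
  # dump the entries of one table
  devlink dpipe table dump pci/0000:7d:00.2 name hns3_mac_tbl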


>
>Since the health dump is flexible enough for us to dump the hw table
>in our hardware, we plan to use the health dump api instead of the
>dpipe api.

Please don't. I know that "health dump" can carry everything. That does
not mean it should. Tables belong to dpipe.
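
For context, hooking a table into dpipe on the kernel side goes through
devlink_dpipe_table_register() and a set of table ops. A minimal sketch
against the 5.4-era API; all hclge_dpipe_* names and HCLGE_MAC_TBL_SIZE
are hypothetical:

  #include <net/devlink.h>

  /* sketch only: expose the hw mac table as a dpipe table */
  static u64 hclge_dpipe_mac_tbl_size_get(void *priv)
  {
          return HCLGE_MAC_TBL_SIZE;      /* total number of hw entries */
  }

  static int hclge_dpipe_mac_tbl_matches_dump(void *priv, struct sk_buff *skb)
  {
          /* describe the match fields: index, mac, vlan, port, type, ... */
          return 0;
  }

  static int hclge_dpipe_mac_tbl_actions_dump(void *priv, struct sk_buff *skb)
  {
          /* describe the action fields: egress_port, egress_queue, vsi, ... */
          return 0;
  }

  static int hclge_dpipe_mac_tbl_entries_dump(void *priv, bool counters_enabled,
                                              struct devlink_dpipe_dump_ctx *ctx)
  {
          /* walk the hw mac table and emit one dpipe entry per valid index */
          return 0;
  }

  static struct devlink_dpipe_table_ops hclge_dpipe_mac_tbl_ops = {
          .matches_dump = hclge_dpipe_mac_tbl_matches_dump,
          .actions_dump = hclge_dpipe_mac_tbl_actions_dump,
          .entries_dump = hclge_dpipe_mac_tbl_entries_dump,
          .size_get = hclge_dpipe_mac_tbl_size_get,
  };

  /* at probe time, after devlink_register(): */
  err = devlink_dpipe_table_register(devlink, "hns3_mac_tbl",
                                     &hclge_dpipe_mac_tbl_ops, hdev, false);

With the type field from the discussion above modeled either as a match
field or as part of the entry's action set, the same "devlink dpipe
table dump" flow would cover the unicast/multicast split.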


>
>If we implement the recover ops, we can also reuse the dump ops to
>dump the hw table the moment the error happens.
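
For reference, the "recover ops" and "dump ops" here are the callbacks
of struct devlink_health_reporter_ops. A minimal sketch with both wired
up, using hypothetical hclge_* names and the 5.4-era
devlink_health_reporter_create() signature:

  static int hclge_mac_tbl_recover(struct devlink_health_reporter *reporter,
                                   void *priv_ctx)
  {
          /* e.g. kick the existing PF/global reset path */
          return 0;
  }

  static int hclge_mac_tbl_dump(struct devlink_health_reporter *reporter,
                                struct devlink_fmsg *fmsg, void *priv_ctx)
  {
          /* walk the hw mac table and emit the entries into fmsg */
          return 0;
  }

  static const struct devlink_health_reporter_ops hclge_mac_tbl_reporter_ops = {
          .name = "hw_mac_tbl",
          .recover = hclge_mac_tbl_recover,
          .dump = hclge_mac_tbl_dump,
  };

  reporter = devlink_health_reporter_create(devlink,
                                            &hclge_mac_tbl_reporter_ops,
                                            0, false, hdev);

With auto_recover set to false, recovery only runs when the user
requests it via "devlink health recover".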

Which error? How is the error related to the "hw table"?


>
>> 
>>>
>>>>
>>>> Diagnose and recover support has not been added because the
>>>> RAS and misc error handling already does the recovery. We
>>>> may also support recovery through the devlink health
>>>> api in the future.
>>>
>>> What would be the point for the reporter and what would you recover
>>> from? Is that related to the mac table? How?
>> 
>> There are some errors that can be detected by the hw when it is
>> accessing internal memory, such as the error types below, defined in
>> ethernet/hisilicon/hns3/hns3pf/hclge_err.c:
>> 
>> static const struct hclge_hw_error hclge_ppp_mpf_abnormal_int_st1[] = {
>> 	{ .int_msk = BIT(0), .msg = "vf_vlan_ad_mem_ecc_mbit_err",
>> 	  .reset_level = HNAE3_GLOBAL_RESET },
>> 	{ .int_msk = BIT(1), .msg = "umv_mcast_group_mem_ecc_mbit_err",
>> 	  .reset_level = HNAE3_GLOBAL_RESET },
>> 	{ .int_msk = BIT(2), .msg = "umv_key_mem0_ecc_mbit_err",
>> 	  .reset_level = HNAE3_GLOBAL_RESET },
>> 	{ .int_msk = BIT(3), .msg = "umv_key_mem1_ecc_mbit_err",
>> 	  .reset_level = HNAE3_GLOBAL_RESET },
>> .....
>> 
>> The above errors can be reported to the driver through the struct
>> pci_error_handlers api, and the driver can record the error and recover
>> from it by asserting a PF or global reset, according to the error type.
>> 
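
For reference, the struct pci_error_handlers hook-up mentioned above
looks roughly like this; the handler names are illustrative, and the
signatures match the 2019-era PCI AER callbacks:

  static pci_ers_result_t hns3_error_detected(struct pci_dev *pdev,
                                              enum pci_channel_state state)
  {
          /* record the hw error source and pick a reset level */
          return PCI_ERS_RESULT_NEED_RESET;
  }

  static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
  {
          /* assert a PF or global reset according to the recorded error */
          return PCI_ERS_RESULT_RECOVERED;
  }

  static const struct pci_error_handlers hns3_err_handler = {
          .error_detected = hns3_error_detected,
          .slot_reset = hns3_slot_reset,
  };
  /* referenced from the driver's struct pci_driver via .err_handler */
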
>> As for the relation to the mac table, we may need to dump everything including
>> the internal memory content to further analyze the root cause of the problem.
>> 
>> Does the above make sense?
>> 
>> Thanks for taking a look at this.
>> 
>>>
>>>
>>>>
>>>> Command example and output:
>>>>
>>>> root@(none):~# devlink health dump show pci/0000:7d:00.2 reporter hw_mac_tbl
>>>> index: 556 mac: 33:33:00:00:00:01 vlan: 0 port: 0 type: 2 mc_mac_en: 1 egress_port: 255 egress_queue: 0 vsi: 0 mc bitmap:
>>>>   1 0 0 0 0 0 0 0
>>>> index: 648 mac: 00:18:2d:02:00:10 vlan: 0 port: 1 type: 0 mc_mac_en: 0 egress_port: 2 egress_queue: 0 vsi: 0
>>>> index: 728 mac: 00:18:2d:00:00:10 vlan: 0 port: 0 type: 0 mc_mac_en: 0 egress_port: 0 egress_queue: 0 vsi: 0
>>>> index: 1028 mac: 00:18:2d:01:00:10 vlan: 0 port: 2 type: 0 mc_mac_en: 0 egress_port: 1 egress_queue: 0 vsi: 0
>>>> index: 1108 mac: 00:18:2d:03:00:10 vlan: 0 port: 3 type: 0 mc_mac_en: 0 egress_port: 3 egress_queue: 0 vsi: 0
>>>> index: 1204 mac: 33:33:00:00:00:01 vlan: 0 port: 1 type: 2 mc_mac_en: 1 egress_port: 253 egress_queue: 0 vsi: 0 mc bitmap:
>>>>   4 0 0 0 0 0 0 0
>>>> index: 2844 mac: 01:00:5e:00:00:01 vlan: 0 port: 0 type: 2 mc_mac_en: 1 egress_port: 254 egress_queue: 0 vsi: 0 mc bitmap:
>>>>   1 0 0 0 0 0 0 0
>>>> index: 3460 mac: 01:00:5e:00:00:01 vlan: 0 port: 1 type: 2 mc_mac_en: 1 egress_port: 252 egress_queue: 0 vsi: 0 mc bitmap:
>>>>   4 0 0 0 0 0 0 0
>>>>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>>>> ---
>
>
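
For completeness, output like the dump shown in the patch is produced
by the reporter's dump op through the devlink_fmsg API. A sketch of
emitting one entry; the helper name is hypothetical and only a few of
the fields from the output above are shown:

  static int hclge_fmsg_put_mac_entry(struct devlink_fmsg *fmsg, u32 index,
                                      const char *mac, u8 type)
  {
          int err;

          err = devlink_fmsg_obj_nest_start(fmsg);
          if (err)
                  return err;
          err = devlink_fmsg_u32_pair_put(fmsg, "index", index);
          if (err)
                  return err;
          err = devlink_fmsg_string_pair_put(fmsg, "mac", mac);
          if (err)
                  return err;
          err = devlink_fmsg_u8_pair_put(fmsg, "type", type);
          if (err)
                  return err;
          /* vlan, port, egress_* and the mc bitmap follow the same pattern */
          return devlink_fmsg_obj_nest_end(fmsg);
  }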
