lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 15 Aug 2020 00:05:58 +0800
From:   Coiby Xu <coiby.xu@...il.com>
To:     netdev@...r.kernel.org
Cc:     Benjamin Poirier <benjamin.poirier@...il.com>,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        Manish Chopra <manishc@...vell.com>,
        GR-Linux-NIC-Dev@...vell.com,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Michal Kubecek <mkubecek@...e.cz>
Subject: [RFC 0/3] staging: qlge: Re-writing the debugging features

This patch set aims to avoid dumping registers, data structures and
coredump to dmesg and also to reduce the code size of the qlge driver.

As pointed out by Benjamin [1],

> At 2000 lines, qlge_dbg.c alone is larger than some entire ethernet
> drivers. Most of what it does is dump kernel data structures or pci
> memory mapped registers to dmesg. There are better facilities for that.
> My thinking is not simply to delete qlge_dbg.c but to replace it, making
> sure that most of the same information is still available. For data
> structures, crash or drgn can be used; possibly with a script for the
> latter which formats the data. For pci registers, they should be
> included in the ethtool register dump and a patch added to ethtool to
> pretty print them. That's what other drivers like e1000e do. For the
> "coredump", devlink health can be used.

So the debugging features are re-written following Benjamin's advice,
   - use ethtool to dump registers
   - dump kernel data structures in drgn
   - use devlink health to do coredump

The get_regs ethtool_ops has already implemented. What lacks is a patch
for the userland ethtool to do the pretty-printing. I haven't yet provided
a patch to the userland ethtool because I'm aware ethtool is moving towards
the netlink interface [2]. I'm curious if a generalized mechanism of
pretty-printing will be implemented thus making pretty-printing for a
specific driver unnecessary. As of this writing, `-d|--register-dump`
hasn't been implemented for the netlink interface.


To dump kernel data structures, the following Python script can be used
in drgn,


    ```python
    def align(x, a):
        """the alignment a should be a power of 2
        """
        mask = a - 1
        return (x+ mask) & ~mask

    def struct_size(struct_type):
        struct_str = "struct {}".format(struct_type)
        return sizeof(Object(prog, struct_str, address=0x0))

    def netdev_priv(netdevice):
        NETDEV_ALIGN = 32
        return netdevice.value_() + align(struct_size("net_device"), NETDEV_ALIGN)

    name = 'xxx'
    qlge_device = None
    netdevices = prog['init_net'].dev_base_head.address_of_()
    for netdevice in list_for_each_entry("struct net_device", netdevices, "dev_list"):
        if netdevice.name.string_().decode('ascii') == name:
            print(netdevice.name)

    ql_adapter = Object(prog, "struct ql_adapter", address=netdev_priv(qlge_device))
    ```

The struct ql_adapter will be printed in drgn as follows,
    >>> ql_adapter
    (struct ql_adapter){
            .ricb = (struct ricb){
                    .base_cq = (u8)0,
                    .flags = (u8)120,
                    .mask = (__le16)26637,
                    .hash_cq_id = (u8 [1024]){ 172, 142, 255, 255 },
                    .ipv6_hash_key = (__le32 [10]){},
                    .ipv4_hash_key = (__le32 [4]){},
            },
            .flags = (unsigned long)0,
            .wol = (u32)0,
            .nic_stats = (struct nic_stats){
                    .tx_pkts = (u64)0,
                    .tx_bytes = (u64)0,
                    .tx_mcast_pkts = (u64)0,
                    .tx_bcast_pkts = (u64)0,
                    .tx_ucast_pkts = (u64)0,
                    .tx_ctl_pkts = (u64)0,
                    .tx_pause_pkts = (u64)0,
                    ...
            },
            .active_vlans = (unsigned long [64]){
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52780853100545, 18446744073709551615,
                    18446619461681283072, 0, 42949673024, 2147483647,
            },
            .rx_ring = (struct rx_ring [17]){
                    {
                            .cqicb = (struct cqicb){
                                    .msix_vect = (u8)0,
                                    .reserved1 = (u8)0,
                                    .reserved2 = (u8)0,
                                    .flags = (u8)0,
                                    .len = (__le16)0,
                                    .rid = (__le16)0,
                                    ...
                            },
                            .cq_base = (void *)0x0,
                            .cq_base_dma = (dma_addr_t)0,
                    }
                    ...
            }
    }


And the coredump obtained via devlink in json format looks like,

    $ devlink health dump show DEVICE reporter coredump -p -j
    {
        "Core Registers": {
            "segment": 1,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        "Test Logic Regs": {
            "segment": 2,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        "RMII Registers": {
            "segment": 3,
            "values": [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
        },
        ...
        "Sem Registers": {
            "segment": 50,
            "values": [ 0,0,0,0 ]
        }
    }

Since I don't have a QLGE device and neither could I find a software
simulator, I put some functions into e1000 to get the above result.

I notice with the qlge_force_coredump module parameter set, ethtool
can also get the coredump. I'm not sure which tool is more suitable for
the coredump feature.

[1] https://lkml.org/lkml/2020/6/30/19
[2] https://www.kernel.org/doc/html/latest/networking/ethtool-netlink.html

Coiby Xu (3):
  Initialize devlink health dump framework for the dlge driver
  coredump via devlink health reporter
  clean up code that dump info to dmesg

 drivers/staging/qlge/Makefile       |   2 +-
 drivers/staging/qlge/qlge.h         |  91 +---
 drivers/staging/qlge/qlge_dbg.c     | 672 ----------------------------
 drivers/staging/qlge/qlge_ethtool.c |   1 -
 drivers/staging/qlge/qlge_health.c  | 156 +++++++
 drivers/staging/qlge/qlge_health.h  |   2 +
 drivers/staging/qlge/qlge_main.c    |  27 +-
 7 files changed, 189 insertions(+), 762 deletions(-)
 create mode 100644 drivers/staging/qlge/qlge_health.c
 create mode 100644 drivers/staging/qlge/qlge_health.h

--
2.27.0

Powered by blists - more mailing lists