Date:   Mon, 24 Apr 2017 17:38:57 +0000
From:   "Elior, Ariel" <Ariel.Elior@...ium.com>
To:     David Miller <davem@...emloft.net>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "Mintz, Yuval" <Yuval.Mintz@...ium.com>,
        "Tayar, Tomer" <Tomer.Tayar@...ium.com>,
        "Dupuis, Chad" <Chad.Dupuis@...ium.com>
Subject: qed*: debug infrastructures

Hi Dave,

Following the recent messages on the list indicating that debugfs is not the
way to go, I am looking for some guidance on what is. The dpipe approach was
mentioned as favorable, but I wanted to make sure that molding our debug
features into this infrastructure will result in something acceptable. A few
points:

1.
One of our HW debug features is a signal recording feature. The HW is
configured to output specific signals, which are then continuously dumped into
a cyclic buffer on the host. There are ~8000 signals, which can be logically
divided into ~100 groups. I believe this can be modeled in dpipe (or a similar
tool) as a set of ~100 tables with ~80 entries each, and the user would be able
to see them all and choose what they like. The output data itself is binary,
and meaningless to "the user". The amount of data is basically as large a
contiguous buffer as the driver can allocate, i.e. usually 4MB. When the user
selects the signals and sets metadata regarding the mode of operation, some
device configuration will have to take place. Does that sound reasonable?
This debug feature already exists out of tree for the bnx2x and qed* drivers
and is *very* effective in field deployments. I would very much like to see it
as an in-tree feature via one infrastructure or another; a rough sketch of the
data model we have in mind appears below.
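
To make this concrete, here is a minimal sketch of how the driver side could
model the signal groups and the cyclic buffer. All names here are hypothetical
illustrations rather than actual qed code, and the real buffer would likely
need to be DMA-able rather than vmalloc'ed:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/vmalloc.h>

struct dbg_signal {
	u16 id;			/* HW signal selector */
	const char *name;
};

struct dbg_signal_group {	/* would map to one dpipe-style table */
	const char *name;
	const struct dbg_signal *signals;	/* ~80 signals per group */
	unsigned int num_signals;
};

struct dbg_recorder {
	void *buf;		/* contiguous cyclic buffer on host */
	size_t buf_size;	/* as large as we can get, usually 4MB */
	u32 mode;		/* user-supplied mode-of-operation metadata */
};

static int dbg_start_recording(struct dbg_recorder *rec,
			       const struct dbg_signal_group *grp)
{
	rec->buf_size = 4 * 1024 * 1024;
	rec->buf = vzalloc(rec->buf_size);	/* placeholder allocation */
	if (!rec->buf)
		return -ENOMEM;

	/* ...configure the device to mux grp->signals out, after which HW
	 * fills the buffer cyclically until recording is stopped...
	 */
	return 0;
}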

2.
Sometimes we want to debug the probe or removal flow of the driver. ethtool has
the disadvantage of only being available once the network device is available.
If a bug stops the load flow before the ethtool debug paths are available,
there is no way to collect a dump. Similarly, removal flows which hit a bug but
do remove the network device can't be debugged from ethtool. Does dpipe suffer
from the same problem? qed* (like mlx*) has a common-functionality module. This
allows creating debugfs nodes even before the network drivers are probed,
providing a solution for this (the debug nodes are also available after network
driver removal); a simplified sketch of that pattern follows below. If dpipe
does hold an answer here (e.g. providing a preconfiguration which would kick in
when the network device registers), then we might want to port all of our
register dump logic over there for this advantage. Does that sound reasonable?
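
For reference, the common-module pattern looks roughly like the following
(heavily simplified, with hypothetical names; not the actual qed code). The key
property is that the debugfs root is created at module load of the common
module, before any network or storage driver probes, and survives their
removal:

#include <linux/module.h>
#include <linux/debugfs.h>

static struct dentry *dbg_root;

static int __init common_init(void)
{
	/* Created before any network/storage driver is probed */
	dbg_root = debugfs_create_dir("qed_dbg", NULL);
	return 0;
}

static void __exit common_exit(void)
{
	/* Nodes stay available until the common module itself unloads */
	debugfs_remove_recursive(dbg_root);
}

module_init(common_init);
module_exit(common_exit);
MODULE_LICENSE("GPL");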

3.
Yuval mentioned this, but I wanted to reiterate that the same is necessary for
our storage drivers (qedi/qedf). debugfs does have the advantage of not being
subsystem-specific. Is there perhaps another non-subsystem-specific debug
infrastructure which *is* acceptable to the networking subsystem? My guess is
that the storage drivers will turn to debugfs (in their own subsystem).

Thanks,
Ariel
