Date:   Tue, 25 Apr 2017 09:51:17 +0200
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jakub Kicinski <kubakici@...pl>
Cc:     "Elior, Ariel" <Ariel.Elior@...ium.com>,
        David Miller <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "Mintz, Yuval" <Yuval.Mintz@...ium.com>,
        "Tayar, Tomer" <Tomer.Tayar@...ium.com>,
        "Dupuis, Chad" <Chad.Dupuis@...ium.com>
Subject: Re: qed*: debug infrastructures

Tue, Apr 25, 2017 at 05:44:10AM CEST, kubakici@...pl wrote:
>On Mon, 24 Apr 2017 17:38:57 +0000, Elior, Ariel wrote:
>> Hi Dave,
>
>Hi Ariel!
>
>I'm not Dave but let me share my perspective :)
>
>> Following the recent messages on the list indicating debugfs is not the way
>> to go, I am looking for some guidance on what is. The dpipe approach was
>> mentioned as favorable, but I wanted to make sure that molding our debug
>> features into this infrastructure will result in something acceptable. A few
>> points:
>> 
>> 1.
>> One of our HW debug features is a signal recording feature. HW is configured to
>> output specific signals, which are then continuously dumped into a cyclic
>> buffer on the host. There are ~8000 signals, which can be logically divided
>> into ~100 groups. I believe this can be modeled in dpipe (or a similar tool)
>> as a set of ~100 tables with ~80 entries each, and the user would be able to
>> see them all and choose what they like. The output data itself is binary and
>> meaningless to "the user". The amount of data is basically as large a
>> contiguous buffer as the driver can allocate, usually 4MB. When the user
>> selects the signals and sets metadata regarding the mode of operation, some
>> device configuration will have to take place. Does that sound reasonable?
>> This debug feature already exists out of tree for the bnx2x and qed* drivers
>> and is *very* effective in field deployments. I would very much like to see
>> this as an in-tree feature via one infrastructure or another.
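
Just to make sure I read point 1 correctly, here is a rough sketch of how I
picture the grouping on the driver side. All names and constants below are
made up for illustration, they are not the actual qed code:

#include <linux/types.h>

#define DBG_NUM_GROUPS		100	/* ~100 logical signal groups */
#define DBG_SIGS_PER_GROUP	80	/* ~80 signals per group */
#define DBG_REC_BUF_SIZE	(4 * 1024 * 1024)	/* contiguous host buffer */

/* One group would map to one dpipe-style table the user can inspect/select. */
struct dbg_signal_group {
	const char	*name;
	u16		first_signal;	/* index into the ~8000-signal space */
	u16		num_signals;
	bool		selected;	/* chosen by the user for recording */
};

/* HW continuously streams the selected signals into a cyclic host buffer. */
struct dbg_recorder {
	struct dbg_signal_group	groups[DBG_NUM_GROUPS];
	void			*cyclic_buf;	/* as large as we can allocate */
	u32			buf_size;
	u32			wr_offset;	/* HW wraps around at buf_size */
};

Selecting signals would then be flipping ->selected per group and
reconfiguring the device, as described above.
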
>
>Sorry for even mentioning it; new debug interfaces would be cool, but
>for FW/HW state dumps which are meaningless to the user, why not just
>use ethtool get-dump/set-dump?  Do you really need the ability to
>toggle those 8k signals one by one, or are there reasonable sets you
>could provide from the driver that you could encode in the available
>32 bits of flags?
>
>What could be useful would be some form of start/stop commands for
>debugging, to tell the driver/FW when to record the events selected by
>set-dump, and maybe a way for the user to discover what dumps the driver
>can provide (a la ethtool private flags).
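
From the driver side, the get-dump/set-dump hooks mentioned here are the
.set_dump/.get_dump_flag/.get_dump_data ethtool_ops. A rough, untested
sketch of how the 32 flag bits could select a recording preset; the
dbg_dump_state struct and its fields are made up, only the three callbacks
are the existing API:

#include <linux/ethtool.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/string.h>

struct dbg_dump_state {
	u32	active_preset;	/* signal-group preset, encoded in the flag bits */
	void	*rec_buf;	/* opaque binary recording */
	u32	rec_len;	/* bytes currently recorded */
};

static int dbg_set_dump(struct net_device *dev, struct ethtool_dump *val)
{
	struct dbg_dump_state *st = netdev_priv(dev);

	/* Map the flag bits to a predefined set of signal groups and
	 * reconfigure the device accordingly. */
	st->active_preset = val->flag;
	return 0;
}

static int dbg_get_dump_flag(struct net_device *dev, struct ethtool_dump *dump)
{
	struct dbg_dump_state *st = netdev_priv(dev);

	dump->flag = st->active_preset;
	dump->len = st->rec_len;
	return 0;
}

static int dbg_get_dump_data(struct net_device *dev,
			     struct ethtool_dump *dump, void *buffer)
{
	struct dbg_dump_state *st = netdev_priv(dev);

	/* Hand the binary recording to user space as-is. */
	memcpy(buffer, st->rec_buf, min(dump->len, st->rec_len));
	return 0;
}

static const struct ethtool_ops dbg_ethtool_ops = {
	.set_dump	= dbg_set_dump,
	.get_dump_flag	= dbg_get_dump_flag,
	.get_dump_data	= dbg_get_dump_data,
};

On the user side that maps to ethtool -W (set-dump) and ethtool -w (get-dump).
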
>
>> 2.
>> Sometimes we want to debug the probe or removal flow of the driver. ethtool has
>> the disadvantage of only being usable once the network device is available. If
>> a bug stops the load flow before the ethtool debug paths are available, there
>> is no way to collect a dump. Similarly, removal flows which hit a bug but still
>> remove the network device can't be debugged from ethtool. Does dpipe suffer
>> from the same problem? qed* (like mlx*) has a common-functionality module. This
>> allows creating debugfs nodes even before the network drivers are probed,
>> providing a solution for this (debug nodes are also available after network
>> driver removal). If dpipe does hold an answer here (e.g. providing
>> preconfiguration which would kick in when the network device registers), then
>> we might want to port all of our register-dump logic over there for this
>> advantage. Does that sound reasonable?
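
To make sure I follow the common-module trick: the debugfs root is created at
module load and removed only at module unload, so it is there before any
netdev is probed and after the netdev is gone. Roughly (made-up names):

#include <linux/debugfs.h>
#include <linux/module.h>

static struct dentry *dbg_root;

static int __init common_dbg_init(void)
{
	/* Created at module load -- before any netdev exists. */
	dbg_root = debugfs_create_dir("qed_dbg_sketch", NULL);
	return 0;
}

static void __exit common_dbg_exit(void)
{
	/* Only removed at module unload -- long after netdev removal. */
	debugfs_remove_recursive(dbg_root);
}

module_init(common_dbg_init);
module_exit(common_dbg_exit);
MODULE_LICENSE("GPL");
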
>
>Porting the debug/dump infrastructure to devlink would be very much
>appreciated.  I'm not sure whether it would fit into dpipe or should be
>a separate command.

Yeah. dpipe was designed to provide a HW pipeline abstraction, based on
a match/action model. I think that for stats and debugging, it would make
sense to introduce another devlink object.


>
>> 3.
>> Yuval mentioned this, but I wanted to reiterate that the same is necessary for
>> our storage drivers (qedi/qedf). debugfs does have the advantage of not being
>> subsystem-specific. Is there perhaps another non-subsystem-specific debug
>> infrastructure which *is* acceptable to the networking subsystem? My guess is
>> that the storage drivers will turn to debugfs (in their own subsystem).
>
>devlink is not Ethernet-specific; it should be a good fit for iSCSI and
>FCoE drivers, too.

Yes. During the devlink design, I had non-network use cases in mind.
Devlink is the interface to be used by all who need it.
