lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 Nov 2021 17:46:31 +0100
From:   Stefan Assmann <sassmann@...hat.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Tony Nguyen <anthony.l.nguyen@...el.com>, davem@...emloft.net,
        Jedrzej Jagielski <jedrzej.jagielski@...el.com>,
        netdev@...r.kernel.org,
        Arkadiusz Kubalewski <arkadiusz.kubalewski@...el.com>,
        Konrad Jankowski <konrad0.jankowski@...el.com>
Subject: Re: [PATCH net-next 06/12] iavf: Add trace while removing device

On 2021-11-25 07:57, Jakub Kicinski wrote:
> On Thu, 25 Nov 2021 16:43:49 +0100 Stefan Assmann wrote:
> > On 2021-11-25 07:13, Jakub Kicinski wrote:
> > > On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:  
> > > > From personal experience I'd say this piece of information has value,
> > > > especially when debugging it can be interesting to know exactly when
> > > > the driver was removed.  
> > > 
> > > But there isn't anything specific to iavf here, right? If it really 
> > > is important then core should be doing the printing for all drivers.
> > > 
> > > Actually, I can't come up with any uses for this print on the spot.
> > > What debugging scenarios do you have in mind?  
> > 
> > There was a lot of trouble with iavf in terms of device reset, device
> > unbinding (DPDK), stress testing of driver load/unload issues. When
> > looking through the crash logs it was not always easy to determine if
> > the driver was still loaded.
> > Especially on problems that weren't easy to reproduce.
> 
> That's a slippery slope, historically we were always pushing for
> avoiding informational prints. Otherwise every driver reconfig would
> result in a line in the logs.
> 
> > So for iavf having that information would have been valuable. Not sure
> > if that justifies a PCI core message or if others might find that too
> > verbose.
> 
> So what you're saying is from your experience iavf is of significantly
> poorer quality than other vendors' drivers?

No, I can't really comment on how iavf compares to other vendor drivers
since I've mostly worked with Intel. There's a lot of async tasks and
communication going on between PF and VF and while that seems great it
has a lot of potential for things to go wrong as well. If you look at
the git history of iavf (previously i40evf) you'll see that the
especially the reset, shutdown, error handling code has seen a lot of
fixes and refactoring. Just recently in net-next
898ef1cb1cb2 iavf: Combine init and watchdog state machines
45eebd62999d iavf: Refactor iavf state machine tracking

Finding and fixing all those corner cases is hard, especially when we
have situation where we rely on firmware to perform a certain task in
defined time.
Example:
In iavf_reset_task() we poll a register for a certain amount of time as
the firmware is supposed to complete the reset in that amount of time.
Now that works well 99,999% of the time but sometime it doesn't so the
driver logs a message and continues operation.
That might work, it might not and things go wrong and we had to figure
out why.
Or some async message came in at a bad time, while the driver was
already being removed, and rescheduled the reset task which of course
caused a crash as the structures were being free'd underneath the reset
task.

So from the perspective of somebody working on the driver I'd like to
see it when the driver gets removed.

  Stefan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ