Message-ID: <CA+sbYW3VdewdCrU+PtvAksXXyi=zgGm6Yk=BHNNfbp1DDjRKcQ@mail.gmail.com>
Date: Mon, 24 Feb 2025 14:30:04 +0530
From: Selvin Xavier <selvin.xavier@...adcom.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: jgg@...pe.ca, linux-rdma@...r.kernel.org, andrew.gospodarek@...adcom.com, 
	kalesh-anakkur.purayil@...adcom.com, netdev@...r.kernel.org, 
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, abeni@...hat.com, 
	horms@...nel.org, michael.chan@...adcom.com
Subject: Re: [PATCH rdma-next 0/9] RDMA/bnxt_re: Driver Debug Enhancements

On Sun, Feb 23, 2025 at 7:05 PM Leon Romanovsky <leon@...nel.org> wrote:
>
> On Thu, Feb 20, 2025 at 10:34:47AM -0800, Selvin Xavier wrote:
> > For debugging issues in the field, we need to track some of
> > the resources destroyed in the past. This is primarily required
> > for tracking certain QPs that encountered errors, leading to
> > application exits. A framework has been implemented to
> > save this information and retrieve it during coredump collection.
> >
> > The Broadcom bnxt L2 driver supports collecting driver dumps
> > using the ethtool -w option. This feature now also supports
> > collecting coredump information from the bnxt_re auxiliary driver.
> > Two new callbacks have been implemented to exchange dump
> > information supported by the auxbus bnxt_re driver.
> >
> > The bnxt_re driver caches certain hardware information before
> > resources are destroyed in the HW.
>
> Unfortunately, no. The idea that you will cache kernel objects and they
> live beyond their HW counterpart doesn't fit RDMA object model.
Since the resources usually number in the thousands, we cannot dump
the debug information to the system logs. As a result, we do not have much
context about the failure, and this is the reason for the new mechanism.
>
> I'm aware that you are not keeping objects itself, but their shadow
> copy. So if you want, your FW can store these failed objects and you
> will retrieve them through existing netdev side (ethtool -w ...).
FW doesn't have enough memory to back up this info. It needs to
be backed up in host memory, and FW has to write it to host memory
when an error happens. This is possible in some newer FW versions.
But it is not just the HW context that we are caching here. We also need to
back up some host-side driver/lib info to correlate with the HW context.
We have been debugging issues like this using our out-of-box driver,
and we find it useful to get the context of the failure. Some of the
internal tools can decode this information, and we want to have the same
behavior between the inbox and out-of-box drivers.

>
> Thanks
