Message-ID: <CANEJEGus1+M2ORKB-XaXnL_iyFRRR_bXmShsAg79Hb31+aAWMA@mail.gmail.com>
Date:   Wed, 17 May 2023 22:58:20 -0700
From:   Grant Grundler <grundler@...omium.org>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Grant Grundler <grundler@...omium.org>,
        Rajat Jain <rajatja@...omium.org>,
        Rajat Khandelwal <rajat.khandelwal@...ux.intel.com>,
        linux-pci@...r.kernel.org,
        Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
        linux-kernel@...r.kernel.org,
        "Oliver O 'Halloran" <oohall@...il.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCHv2 pci-next 2/2] PCI/AER: Rate limit the reporting of the
 correctable errors

On Wed, May 17, 2023 at 9:03 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Fri, Apr 07, 2023 at 04:46:03PM -0700, Grant Grundler wrote:
> > On Fri, Apr 7, 2023 at 12:46 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > On Fri, Apr 07, 2023 at 11:53:27AM -0700, Grant Grundler wrote:
> > > > On Thu, Apr 6, 2023 at 12:50 PM Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > > > On Fri, Mar 17, 2023 at 10:51:09AM -0700, Grant Grundler wrote:
> > > > > > From: Rajat Khandelwal <rajat.khandelwal@...ux.intel.com>
> > > > > >
> > > > > > There are many instances where correctable errors tend to inundate
> > > > > > the message buffer. We observe such instances during thunderbolt PCIe
> > > > > > tunneling.
> > > > ...
> > >
> > > > > >               if (info->severity == AER_CORRECTABLE)
> > > > > > -                     pci_info(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > > > > -                             info->first_error == i ? " (First)" : "");
> > > > > > +                     pci_info_ratelimited(dev, "   [%2d] %-22s%s\n", i, errmsg,
> > > > > > +                                          info->first_error == i ? " (First)" : "");
> > > > >
> > > > > I don't think this is going to reliably work the way we want.  We have
> > > > > a bunch of pci_info_ratelimited() calls, and each caller has its own
> > > > > ratelimit_state data.  Unless we call pci_info_ratelimited() exactly
> > > > > the same number of times for each error, the ratelimit counters will
> > > > > get out of sync and we'll end up printing fragments from error A mixed
> > > > > with fragments from error B.

Despite consolidating the error output, my impression is this is still
possible. :(
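
As a rough illustration of the concern quoted above, here is a toy
userspace model (not kernel code) of two rate-limited call sites that
each keep their own counter. The struct, the helper names, and the
simplification that every call falls within one rate-limit interval are
all assumptions made for this sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-call-site ratelimit state (hypothetical). */
struct toy_ratelimit {
	int burst;    /* lines allowed per interval */
	int printed;  /* lines allowed so far in this interval */
};

/* Returns true if the caller may print; all calls share one interval. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rs)
{
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/*
 * Two call sites: one header line per error, then one detail line per
 * status bit. Returns how many errors were only partially printed.
 */
static int partial_with_two_sites(const int *details, int n_errors, int burst)
{
	struct toy_ratelimit header_site = { burst, 0 };
	struct toy_ratelimit detail_site = { burst, 0 };
	int partial = 0;

	assert(burst > 0);
	for (int e = 0; e < n_errors; e++) {
		int wanted = 1 + details[e];
		int shown = toy_ratelimit_ok(&header_site) ? 1 : 0;

		for (int i = 0; i < details[e]; i++)
			shown += toy_ratelimit_ok(&detail_site) ? 1 : 0;
		if (shown > 0 && shown < wanted)
			partial++;
	}
	return partial;
}

/*
 * One call site: the whole record is formatted first, then printed or
 * dropped as a unit, so a record can never be partially shown.
 */
static int partial_with_one_site(const int *details, int n_errors, int burst)
{
	struct toy_ratelimit site = { burst, 0 };
	int partial = 0;

	assert(burst > 0);
	for (int e = 0; e < n_errors; e++) {
		int wanted = 1 + details[e];
		int shown = toy_ratelimit_ok(&site) ? wanted : 0;

		if (shown > 0 && shown < wanted)
			partial++;
	}
	return partial;
}
```

With three errors carrying 2, 2 and 1 detail lines and a burst of 3, the
two-site model leaves two errors partially printed, because the detail
counter runs out before the header counter does. The one-site model
leaves none.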

...
> > > Rate-limiting is per call location, so yes, if we only have one call
> > > location, that would solve it.  It would also have the nice property
> > > that all the output would be atomic so it wouldn't get mixed with
> > > other stuff, and it might encourage us to be a little less wordy in
> > > the output.

Unfortunately, I think this needs further surgery.
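
For reference, one possible shape of the "single call location" idea is
to format every per-bit line into a buffer first and then hand the whole
buffer to one rate-limited print. The following is only a userspace
sketch under that assumption; toy_format_status() and its parameters are
hypothetical and are not taken from the attached patches:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append one "[bit] name" line per set status bit
 * to buf so the caller can emit the whole record with one print call.
 * Returns the number of bytes written (truncated if buf is too small).
 */
static size_t toy_format_status(char *buf, size_t len, unsigned int status,
				int first_error, const char *const names[32])
{
	size_t off = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (int i = 0; i < 32 && off < len - 1; i++) {
		const char *name;
		int n;

		if (!(status & (1u << i)))
			continue;
		name = names[i] ? names[i] : "Unknown Error Bit";
		n = snprintf(buf + off, len - off, "   [%2d] %-22s%s\n", i, name,
			     first_error == i ? " (First)" : "");
		if (n < 0)
			break;
		/* snprintf reports the untruncated length; clamp to what fit. */
		off += ((size_t)n < len - off) ? (size_t)n : len - off - 1;
	}
	return off;
}
```

The caller would then pass buf to a single rate-limited print, so each
error record is either emitted whole or dropped whole.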

> > +1 to all of those reasons. Especially reducing the number of lines output.
> >
> > I'm going to be out for the next week. If someone else (Rajat Khandelwal
> > maybe?) wants to rework this to use one call location it should be fairly
> > straightforward. If not, I'll tackle this when I'm back (in 2 weeks
> > essentially).
>
> Ping?  Really hoping to merge this for v6.5.

I've appended what I have now... but there are still two issues:
1) we still end up with two "pci_info_ratelimited" call locations: one
in aer_print_err() and another in __aer_print_err().
2) I just noticed both functions output info->status and info->mask
(so this ends up getting printed twice in different formats).

And that is without looking carefully at the other call site,
cper_print_aer().

If this is "good enough" for now, I can repost as v3.

cheers,
grant

View attachment "0001-PCI-AER-correctable-error-message-as-KERN_INFO.patch" of type "text/x-patch" (2935 bytes)

View attachment "0002-PCI-AER-Rate-limit-the-reporting-of-the-correctable-.patch" of type "text/x-patch" (6811 bytes)
