Message-ID: <8286502.jJDZkT8p0M@fdefranc-mobl3>
Date: Wed, 23 Oct 2024 15:35:18 +0200
From: "Fabio M. De Francesco" <fabio.m.de.francesco@...ux.intel.com>
To: linux-kernel@...r.kernel.org, Dan Williams <dan.j.williams@...el.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
 Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
 Oliver O'Halloran <oohall@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
 linux-acpi@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
 linux-pci@...r.kernel.org, Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section

On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> Fabio M. De Francesco wrote:
> > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> 
> I think the critical detail is that print_extlog_rcd() is only
> triggered when ras_userspace_consumers() returns true. The observation
> is that ras_userspace_consumers() hides information from the trace path
> when the intended purpose of it was to hide duplicate emissions to the
> kernel log when userspace is watching the tracepoints.
>
> Setting aside whether ras_userspace_consumers() is still a good idea or
> not, it is obvious that this patch as is may surprise environments that
> start seeing kernel error logs where the kernel was silent before.
>
> I think the path of least surprise would be to make sure that
> pci_print_aer() optionally skips emitting to the kernel log when not
> needed or wanted.

Sorry for replying so late...

I'm not entirely sure that users wouldn't prefer to be surprised by
_finally_ seeing kernel error logs for failing PCIe components. I suspect
that users may have been confused by not seeing any output at all.
 
> So perhaps first do a lead-in patch to optionally quiet the print
> messages in pci_print_aer() and then pass in KERN_DEBUG from the
> extlog_print() path. Then we can decide later what to do about
> ras_userspace_consumers().

Anyway, I'll do it.

> > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > pci_print_aer() to report via the ftrace infrastructure.
> > 
> > Add support to report the CPER PCIe Error section also via the ftrace
> > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > with GHES.
> 
> You might also want to explain a bit about the motivation for this which
> is that I/O Machine Check Architecture events may signal failing PCIe
> components or links. The AER event contains details on what was
> happening on the wire when the error was signaled.

Yes, I agree.

> > 
> > Cc: Dan Williams <dan.j.williams@...el.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@...ux.intel.com>
> > ---
> >  drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> >  drivers/pci/pcie/aer.c     |  2 +-
> >  include/linux/aer.h        | 13 +++++++++++--
> >  3 files changed, 42 insertions(+), 3 deletions(-)
> > 
> > [...]
> >
> > +		pci_print_aer(pdev, aer_severity, aer);
> 
> ...per above this would become:
> 
>     pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
> 
> [..]
> 
> Rest of the changes look good to me.

I need to be sure that I understood...

void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
                   struct aer_capability_regs *aer)
{
        [...]

        /* printk_get_level() returns the level as an ASCII digit ('0'..'7') */
        if (printk_get_level(level) - '0' < console_loglevel) {
                pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
                        status, mask);
                __aer_print_error(dev, &info);
                pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
                        aer_error_layer[layer], aer_agent_string[agent]);

                if (aer_severity != AER_CORRECTABLE)
                        pci_err(dev, "aer_uncor_severity: 0x%08x\n",
                                aer->uncor_severity);

                if (tlp_header_valid)
                        __print_tlp_header(dev, &aer->header_log);
        }

        [...]
}
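As a sanity check on the gating condition, here is a minimal userspace model (not kernel code). It assumes the kernel conventions that a level string is KERN_SOH ("\001") followed by one ASCII digit, that printk_get_level() hands back that digit as a char, and that a message reaches the console only when its numeric level is lower than console_loglevel. Every name below (get_level_char, level_enabled, model_print_aer and the local console_loglevel) is a hypothetical stand-in.

    #include <assert.h>
    #include <stdio.h>

    /* Userspace copies of the kernel's level prefixes (assumed layout). */
    #define KERN_SOH   "\001"
    #define KERN_ERR   KERN_SOH "3"
    #define KERN_DEBUG KERN_SOH "7"

    /* Hypothetical stand-in for the kernel's console_loglevel. */
    static int console_loglevel = 7;

    /* Modeled on printk_get_level(): returns the level digit as a char
     * (e.g. '3'), or 0 when the string carries no level prefix. */
    static char get_level_char(const char *buf)
    {
            assert(buf != NULL);
            if (buf[0] == KERN_SOH[0] && buf[1] >= '0' && buf[1] <= '7')
                    return buf[1];
            return 0;
    }

    /* Convert the digit to an int before comparing: the raw char '3'
     * is 51 and would never be below any sane console_loglevel. */
    static int level_enabled(const char *level)
    {
            char c = get_level_char(level);

            if (!c)
                    return 1;       /* no prefix: always print (assumption) */
            return (c - '0') < console_loglevel;
    }

    /* Hypothetical model of the gated block in pci_print_aer(): emits
     * the register dump only if the caller's level is enabled and
     * returns whether anything was printed. */
    static int model_print_aer(const char *level, unsigned int status,
                               unsigned int mask)
    {
            if (!level_enabled(level))
                    return 0;
            printf("aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
            return 1;
    }

With the default console_loglevel of 7, model_print_aer(KERN_ERR, ...) prints and model_print_aer(KERN_DEBUG, ...) stays quiet, which is the behavior the extlog_print() path would rely on; raising console_loglevel to 8 lets the KERN_DEBUG dump through.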

It would also require updating a couple of existing call sites, such as
the one in aer_recover_work_func():

pci_print_aer(KERN_ERR, pdev, entry.severity, entry.regs);
 
Would you please confirm that the code shown above is what
you asked for?

Thanks,

Fabio


