Message-ID: <6758f013b5459_10a0832941e@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 10 Dec 2024 17:51:15 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: "Fabio M. De Francesco" <fabio.m.de.francesco@...ux.intel.com>,
<linux-kernel@...r.kernel.org>, Dan Williams <dan.j.williams@...el.com>
CC: "Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
Mahesh J Salgaonkar <mahesh@...ux.ibm.com>, Oliver O'Halloran
<oohall@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
<linux-acpi@...r.kernel.org>, <linuxppc-dev@...ts.ozlabs.org>,
<linux-pci@...r.kernel.org>, Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
Fabio M. De Francesco wrote:
> On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> > Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> >
> > I think the critical detail is that print_extlog_rcd() is only
> > triggered when ras_userspace_consumers() returns true. The observation
> > is that ras_userspace_consumers() hides information from the trace path
> > when the intended purpose of it was to hide duplicate emissions to the
> > kernel log when userspace is watching the tracepoints.
> >
> > Setting aside whether ras_userspace_consumers() is still a good idea or
> > not, it is obvious that this patch as is may surprise environments that
> > start seeing kernel error logs where the kernel was silent before.
> >
> > I think the path of least surprise would be to make sure that
> > pci_print_aer() optionally skips emitting to the kernel log when not
> > needed or wanted.
>
> Sorry for replying so late...
>
> I'm not entirely sure that users would not prefer to be surprised by
> _finally_ seeing kernel error logs for failing PCIe components. I suspect
> that users might have been confused by not seeing any output.

2 notes:

* New KERN_ERR prints are often found to be unwelcome. When the kernel starts
  printing new error messages it causes sysadmins to scramble.

* The future of RAS is trace-events. Any new RAS messages to the kernel
  log need to ask the question, "is userspace better served by
  registering for a RAS trace event, rather than parsing kernel log
  messages?"

[..]
> I need to be sure that I understood...
>
> void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
>                    struct aer_capability_regs *aer)
> {
>         [...]
>
>         if (printk_get_level(level) <= console_loglevel) {
>                 pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
>                         status, mask);

No, the code would be:

    pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
               status, mask);

...i.e. just pass @level rather than open code "if
(printk_get_level(level) <= console_loglevel)".
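
To illustrate the shape of that outside the kernel tree, here is a
minimal userspace sketch. Everything named demo_* or DEMO_* is a
hypothetical stand-in, not a kernel API: demo_printk() plays the role
of pci_printk() by prefixing the message with whatever level string it
is handed, and the "<3>" / "<6>" prefixes are assumed placeholders for
KERN_ERR / KERN_INFO. The point is only that the caller picks the
level once and the helper neither hardcodes a severity nor compares
against a console loglevel itself.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed placeholders for the kernel's KERN_ERR / KERN_INFO prefixes. */
#define DEMO_ERR  "<3>"
#define DEMO_INFO "<6>"

/*
 * Hypothetical userspace stand-in for pci_printk(): write @level
 * followed by the formatted message into @out.
 * Returns the number of characters the full message needs, or a
 * negative value on a formatting error.
 */
int demo_printk(char *out, size_t len, const char *level,
		const char *fmt, ...)
{
	va_list args;
	int n, m;

	n = snprintf(out, len, "%s", level);
	if (n < 0 || (size_t)n >= len)
		return n;

	va_start(args, fmt);
	m = vsnprintf(out + n, len - n, fmt, args);
	va_end(args);

	return m < 0 ? m : n + m;
}

/*
 * Hypothetical analogue of the pci_print_aer() fragment: @level is
 * passed straight through, so there is no open-coded loglevel check
 * and no fixed pci_err()-style severity in the helper.
 */
int demo_print_aer(char *out, size_t len, const char *level,
		   unsigned int status, unsigned int mask)
{
	return demo_printk(out, len, level,
			   "aer_status: 0x%08x, aer_mask: 0x%08x\n",
			   status, mask);
}
```

A caller that wants the message at error severity passes DEMO_ERR, and
one that wants it quieter passes DEMO_INFO; the helper itself does not
change either way.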