Message-ID: <c771e3de-b945-49cd-b078-762164d6ac5d@linux.intel.com>
Date: Tue, 20 May 2025 11:42:10 -0700
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: linux-pci@...r.kernel.org, Jon Pan-Doh <pandoh@...gle.com>,
 Karolina Stolarek <karolina.stolarek@...cle.com>,
 Martin Petersen <martin.petersen@...cle.com>,
 Ben Fuller <ben.fuller@...cle.com>, Drew Walton <drewwalton@...rosoft.com>,
 Anil Agrawal <anilagrawal@...a.com>, Tony Luck <tony.luck@...el.com>,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
 Lukas Wunner <lukas@...ner.de>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>,
 Sargun Dhillon <sargun@...a.com>, "Paul E . McKenney" <paulmck@...nel.org>,
 Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
 Oliver O'Halloran <oohall@...il.com>, Kai-Heng Feng <kaihengf@...dia.com>,
 Keith Busch <kbusch@...nel.org>, Robert Richter <rrichter@....com>,
 Terry Bowman <terry.bowman@....com>, Shiju Jose <shiju.jose@...wei.com>,
 Dave Jiang <dave.jiang@...el.com>, linux-kernel@...r.kernel.org,
 linuxppc-dev@...ts.ozlabs.org, Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v6 14/16] PCI/AER: Introduce ratelimit for error logs


On 5/20/25 11:31 AM, Bjorn Helgaas wrote:
> On Mon, May 19, 2025 at 09:59:29PM -0700, Sathyanarayanan Kuppuswamy wrote:
>> On 5/19/25 2:35 PM, Bjorn Helgaas wrote:
>>> From: Jon Pan-Doh <pandoh@...gle.com>
>>>
>>> Spammy devices can flood kernel logs with AER errors and slow/stall
>>> execution. Add per-device ratelimits for AER correctable and uncorrectable
>>> errors that use the kernel defaults (10 per 5s).
>>>
>>> There are two AER logging entry points:
>>>
>>>     - aer_print_error() is used by DPC and native AER
>>>
>>>     - pci_print_aer() is used by GHES and CXL
>>>
>>> The native AER aer_print_error() case includes a loop that may log details
>>> from multiple devices.  This is ratelimited by the union of ratelimits for
>>> these devices, set by add_error_device(), which collects the devices.  If
>>> no such device is found, the Error Source message is ratelimited by the
>>> Root Port or RCEC that received the ERR_* message.
>>>
>>> The DPC aer_print_error() case is currently not ratelimited.
>> Can we also not rate limit fatal errors in AER driver?
> In other words, only rate limit AER_CORRECTABLE and AER_NONFATAL for
> AER?  Seems plausible to me.
Yes, we might lose important information by rate-limiting FATAL errors. I
believe FATAL errors should be infrequent, so it's reasonable to allow them
through without rate limiting. Once you make this change, please also
update the related sysfs documentation and code accordingly.
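
To make the proposal concrete, here is a rough userspace sketch of the
policy discussed above: correctable and non-fatal errors go through a
per-device limiter (10 messages per 5 s, matching the defaults mentioned in
the commit log), while fatal errors always get logged. Every name below
(sketch_ratelimit, sketch_device, sketch_should_log, the severity enum) is
hypothetical and is not taken from the patch; the kernel code would
presumably use the existing ratelimit helpers rather than anything like
this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical severity names for this sketch, not the kernel's constants. */
enum sketch_severity { SKETCH_CORRECTABLE, SKETCH_NONFATAL, SKETCH_FATAL };

/* Simple fixed-window limiter: at most 'burst' messages per 'interval_ms'. */
struct sketch_ratelimit {
	uint64_t interval_ms;
	unsigned int burst;
	uint64_t window_start_ms;
	unsigned int printed;
	bool started;
};

/* One limiter per class, mirroring the "correctable and uncorrectable"
 * split described in the commit log. */
struct sketch_device {
	struct sketch_ratelimit correctable_rl;
	struct sketch_ratelimit uncorrectable_rl;
};

static void sketch_ratelimit_init(struct sketch_ratelimit *rl)
{
	rl->interval_ms = 5000;	/* 5 s window (assumed default) */
	rl->burst = 10;		/* 10 messages per window (assumed default) */
	rl->window_start_ms = 0;
	rl->printed = 0;
	rl->started = false;
}

static void sketch_device_init(struct sketch_device *dev)
{
	sketch_ratelimit_init(&dev->correctable_rl);
	sketch_ratelimit_init(&dev->uncorrectable_rl);
}

/* Returns true if a message may be printed at time now_ms. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rl, uint64_t now_ms)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rl->started || now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->printed = 0;
		rl->started = true;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Fatal errors bypass the limiter; the other classes are rate limited. */
static bool sketch_should_log(struct sketch_device *dev,
			      enum sketch_severity sev, uint64_t now_ms)
{
	switch (sev) {
	case SKETCH_FATAL:
		return true;
	case SKETCH_CORRECTABLE:
		return sketch_ratelimit_ok(&dev->correctable_rl, now_ms);
	case SKETCH_NONFATAL:
		return sketch_ratelimit_ok(&dev->uncorrectable_rl, now_ms);
	}
	return true;
}
```

With this shape, a device that reports 15 correctable errors in the same
window would have 10 of them logged and 5 suppressed, while any fatal error
arriving in that window would still be printed.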

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

