[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240521183913.GGZkzqUSN1hdE-_OEY@fat_crate.local>
Date: Tue, 21 May 2024 20:39:13 +0200
From: Borislav Petkov <bp@...en8.de>
To: Dan Williams <dan.j.williams@...el.com>
Cc: "Fabio M. De Francesco" <fabio.m.de.francesco@...ux.intel.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Len Brown <lenb@...nel.org>, Tony Luck <tony.luck@...el.com>,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-edac@...r.kernel.org, rostedt@...dmis.org
Subject: Re: [RFC PATCH v2 3/3] ACPI: extlog: Make print_extlog_rcd() log
unconditionally
On Thu, May 16, 2024 at 11:56:15AM -0700, Dan Williams wrote:
> Something like a new annotation for tracepoints marking them as "emit to
> kernel log if no-one is watching this high-priority trace event"?
I'm not sure anymore what we want here now... :)
We want for the kernel to not pay attention to whether userspace is
consuming error info from the tracepoints and to issue errors into the
kernel log too.
So you have them in the kernel log *and* in the tracepoints.
Right?
Or only through the tracepoints?
> Well no, there is little to complain about in that path because that
> path does not play "is anybody watching" games. It always emits to the
> kernel log (subject to rate limiting) and then follows up with always
> emitting a tracepoint (subject to standard trace filtering).
>
> For CXL I asked that its events deprecate the kernel log path with the
> hope of not growing new userspace dependencies on kernel log parsing for
Yeah, kernel log string format is not an ABI so no problem.
> newfangled CXL errors.
So shuffling hw error info solely through the tracepoints is probably ok
but I am unable to rule out *all* possible cases where it would still
make sense to dump to the kernel log, be it there's no userspace, be it
it is a critical error and you want to dump it immediately or whatnot.
It should probably be configurable.
As in: by default all hw error info goes only through tracepoints with
the exception of fatal errors - they go both to the kernel log and
tracepoints.
And then perhaps add "ras=dump_all_error_info_to_kernel_log_too" or so,
cmdline param.
> At least when the firmware gets out of the way Linux has a chance to
> solve user issues.
Yeap.
> Yes, tracepoints feel simple and generic to me while kernel log messages
> with rate-limiting is already a lossy proposition.
Right. Btw, do bear in mind that ratelimiting of hw errors can be bad
sometimes: imagine it ratelimits exactly that one fatal error which you
absolutely should've seen... :-\
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists