Message-ID: <CAMuHMdVuEnfOkbw2zYXBS+WSZbrkajAPFoYVGFAZBuXK+ac8oA@mail.gmail.com>
Date: Sun, 2 Oct 2022 10:53:07 +0200
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: "Artem S. Tashkinov" <aros@....com>
Cc: Takashi Iwai <tiwai@...e.de>,
Thorsten Leemhuis <linux@...mhuis.info>,
Konstantin Ryabitsev <konstantin@...uxfoundation.org>,
workflows@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Greg KH <gregkh@...uxfoundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
ksummit@...ts.linux.dev
Subject: Re: Planned changes for bugzilla.kernel.org to reduce the "Bugzilla blues"
Hi Artem,

On Sun, Oct 2, 2022 at 10:23 AM Artem S. Tashkinov <aros@....com> wrote:
> On 10/2/22 07:37, Takashi Iwai wrote:
> > On Sat, 01 Oct 2022 12:30:22 +0200,
> > Artem S. Tashkinov wrote:
> >> Here's another one which is outright puzzling:
> >>
> >> You run: dmesg -t --level=emerg,crit,err
> >>
> >> And you see some non-descript errors of some kernel subsystems seemingly
> >> failing or being unhappy about your hardware. Errors are as cryptic as
> >> humanly possible, you don't even know which part of the kernel has produced them.
> >>
> >> OK, as a "power" user I download the kernel source, run `grep -R message
> >> /tmp/linux-5.19` and there are _multiple_ different modules and places
> >> which contain this message.
> >>
> >> I'm lost. Send this to LKML? Did that in the long past, no one cared, I
> >> stopped.
> >>
> >> Here's what I'm getting with Linux 5.19.12:
> >>
> >> platform wdat_wdt: failed to claim resource 5: [mem
> >> 0x00000000-0xffffffff7fffffff]
> >> ACPI: watchdog: Device creation failed: -16
> >> ACPI BIOS Error (bug): Could not resolve symbol
> >> [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)
> >> ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error
> >> (AE_NOT_FOUND) (20220331/psparse-529)
> >> platform MSFT0101:00: failed to claim resource 1: [mem
> >> 0xfed40000-0xfed40fff]
> >> acpi MSFT0101:00: platform device creation failed: -16
> >> lis3lv02d: unknown sensor type 0x0
> >>
> >> Are they serious? Should they be reported or not? Is my laptop properly
> >> working? I have no clue at all.
> >
> > That's a dilemma. The kernel can't know whether it's "properly"
> > working, either -- that is, whether the lack of some functions matters
> > for you or not. In your case above, it's about a watchdog, something
> > related to USB, TPM, and an acceleration sensor, all of which likely
> > come from a buggy BIOS. Would you mind if those features are missing?
> > Or even whether your device has a correct hardware implementation?
> > The kernel doesn't know, hence it reports them as errors.
> >
> > In many drivers, there are mechanisms to shut off superfluous error
> > messages for known devices. So it's case-by-case solutions.
> >
> > Or you can completely hide those errors at boot by a boot option
> > (e.g. loglevel=2).
>
> The problem is that some of these messages are indeed indicative of certain
> real issues which result in HW not working properly, including:
>
> 1) missing/incorrect firmware
> 2) most importantly: not enabled power saving modes
> 3) not enabled high performance modes
> 4) not enabled devices
> 5) not enabled devices' functions
> 6) driver conflicts (i.e. the wrong module gets loaded for the device)
> 7) physically failing hardware
>
> I'm quite sure you don't really know what half of those messages
> actually mean.
>
> Speaking of 7. Various kernel subsystems/drivers deal with e.g. mass
> storage which is known to fail quite often. There's not a single driver
> in the kernel which is actually brave enough to spew something like this:
>
> "/dev/xxxx might be failing, please RMA or seek help online"
>
> instead you get a dmesg chock-full of "unable to read sector XXX" or
> something like that.
>
> To return to the previous errors: it's impossible for the user to assess
> their severity and that sucks. What is "platform device creation
> failed"? What is "unknown sensor type"? What am I missing? Who's
> responsible? The kernel? My HW vendor? Are those errors actionable? In
> my understanding a properly working computer must not produce
> "emerg,crit,err" errors. I'm not even talking about "warn,info" and such.

I am afraid that for most of the above, the kernel cannot know the
answer. Hence more investigation/debugging is needed.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds