Date:   Sun, 2 Oct 2022 08:23:07 +0000
From:   "Artem S. Tashkinov" <aros@....com>
To:     Takashi Iwai <tiwai@...e.de>
Cc:     Thorsten Leemhuis <linux@...mhuis.info>,
        Konstantin Ryabitsev <konstantin@...uxfoundation.org>,
        workflows@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        ksummit@...ts.linux.dev
Subject: Re: Planned changes for bugzilla.kernel.org to reduce the "Bugzilla
 blues"



On 10/2/22 07:37, Takashi Iwai wrote:
> On Sat, 01 Oct 2022 12:30:22 +0200,
> Artem S. Tashkinov wrote:
>> - 2 -
>>
>> Here's another one which is outright puzzling:
>>
>> You run: dmesg -t --level=emerg,crit,err
>>
>> And you see some non-descript errors of some kernel subsystems seemingly
>> failing or being unhappy about your hardware. Errors are as cryptic as
>> humanly possible, you don't even know what part of kernel has produced them.
>>
>> OK, as a "power" user I download the kernel source, run `grep -R message
>> /tmp/linux-5.19` and there are _multiple_ different modules and places
>> which contain this message.
>>
>> I'm lost. Send this to LKML? Did that in the long past, no one cared, I
>> stopped.
>>
>> Here's what I'm getting with Linux 5.19.12:
>>
>> platform wdat_wdt: failed to claim resource 5: [mem
>> 0x00000000-0xffffffff7fffffff]
>> ACPI: watchdog: Device creation failed: -16
>> ACPI BIOS Error (bug): Could not resolve symbol
>> [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)
>> ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error
>> (AE_NOT_FOUND) (20220331/psparse-529)
>> platform MSFT0101:00: failed to claim resource 1: [mem
>> 0xfed40000-0xfed40fff]
>> acpi MSFT0101:00: platform device creation failed: -16
>> lis3lv02d: unknown sensor type 0x0
>>
>> Are they serious? Should they be reported or not? Is my laptop properly
>> working? I have no clue at all.
>
> That's a dilemma.  The kernel can't know whether it's "properly"
> working, either -- that is, whether the lack of some functions matters
> for you or not.  In your case above, it's about a watchdog, something
> related with USB, TPM, and acceleration sensor, all of which likely
> come from a buggy BIOS.  Would you mind if those features are missing?
> Or even whether your device has a correct hardware implementation?
> Kernel doesn't know, hence it complains as an error.
>
> In many drivers, there are mechanisms to shut off superfluous error
> messages for known devices.  So it's case-by-case solutions.
>
> Or you can completely hide those errors at boot by a boot option
> (e.g. loglevel=2).

The problem is that some of these messages do indicate real issues that
keep hardware from working properly, including:

1) missing/incorrect firmware
2) most importantly: power-saving modes that are not enabled
3) high-performance modes that are not enabled
4) devices that are not enabled
5) device functions that are not enabled
6) driver conflicts (i.e. the wrong module gets loaded for the device)
7) physically failing hardware

I'm quite sure you don't really know what half of those messages
actually mean.

Speaking of 7: various kernel subsystems/drivers deal with e.g. mass
storage, which is known to fail quite often. There's not a single driver
in the kernel which is actually brave enough to print something like this:

"/dev/xxxx might be failing, please RMA or seek help online"

instead you get a dmesg chock-full of "unable to read sector XXX" or
something like that.
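
As a rough illustration of the kind of summary meant here, a minimal
userspace sketch could collapse repeated read errors into one line per
device. The log line format below is hypothetical and only modelled on
the paraphrase above (real kernel I/O error messages are worded
differently and vary by driver), and the function name and threshold are
likewise made up for the example:

```python
import re

# HYPOTHETICAL line format, modelled on the paraphrase above; real kernel
# I/O error messages are worded differently and vary by driver.
READ_ERROR = re.compile(r"^(?P<dev>\S+): unable to read sector (?P<sector>\d+)$")


def summarize_read_errors(lines, threshold=3):
    """Return one summary string per device that has at least
    `threshold` distinct unreadable sectors in `lines`."""
    sectors = {}
    for line in lines:
        match = READ_ERROR.match(line.strip())
        if match:
            # Track distinct sectors so repeated retries are not double-counted.
            sectors.setdefault(match.group("dev"), set()).add(int(match.group("sector")))
    return [
        f"/dev/{dev} might be failing ({len(bad)} unreadable sectors), "
        "please RMA or seek help online"
        for dev, bad in sorted(sectors.items())
        if len(bad) >= threshold
    ]
```

The threshold of 3 is arbitrary; what count of failing sectors actually
justifies such a warning is exactly the kind of judgement a driver would
have to make.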

To return to the previous errors: it's impossible for the user to assess
their severity and that sucks. What is "platform device creation
failed"? What is "unknown sensor type"? What am I missing? Who's
responsible? The kernel? My HW vendor? Are those errors actionable? In
my understanding a properly working computer must not produce
"emerg,crit,err" errors. I'm not even talking about "warn,info" and such.
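
To show how little the messages themselves reveal about who is
responsible, here is a minimal sketch that groups the lines quoted
earlier by whatever precedes the first ": ". Treating that prefix as the
"source" of a message is only an assumption made for this example, not a
documented rule, and the function name is made up:

```python
from collections import defaultdict

# Lines copied from the 5.19.12 output quoted earlier in this message,
# with the mail client's line wrapping undone.
SAMPLE_LINES = [
    r"platform wdat_wdt: failed to claim resource 5: [mem 0x00000000-0xffffffff7fffffff]",
    r"ACPI: watchdog: Device creation failed: -16",
    r"ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)",
    r"ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error (AE_NOT_FOUND) (20220331/psparse-529)",
    r"platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]",
    r"acpi MSFT0101:00: platform device creation failed: -16",
    r"lis3lv02d: unknown sensor type 0x0",
]


def group_by_prefix(lines):
    """Group log lines by the text before the first ': ' separator.

    ASSUMPTION: the prefix roughly identifies the reporting component.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        prefix, sep, rest = line.partition(": ")
        if sep:
            groups[prefix].append(rest)
        else:
            groups["(no prefix)"].append(line)
    return dict(groups)


if __name__ == "__main__":
    for prefix, messages in group_by_prefix(SAMPLE_LINES).items():
        print(f"{prefix}: {len(messages)} message(s)")
```

Every one of the seven lines ends up under its own prefix, and none of
the prefixes says whether the kernel, the firmware or the hardware is at
fault.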

Best regards,
Artem
