linux-kernel - Re: Linux 6.8-rc2

Message-ID: <39472f63-bb70-4ae6-b9cc-a95eee4c781d@roeck-us.net>
Date: Mon, 29 Jan 2024 11:39:09 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 6.8-rc2

On Sun, Jan 28, 2024 at 05:13:03PM -0800, Linus Torvalds wrote:
> So we had a number of small annoying issues in rc1, including an
> amdgpu scheduling bug that could cause a hung desktop (that would
> *eventually* recover, but after a long enough timeout that most people
> probably ended up rebooting instead). That one seems to have hit a fair
> number of people.
> 
> There was also a btrfs bug wrt zstd-compressed inline extents,
> although (somewhat) happily that wasn't in rc1 and got noticed and
> reverted fairly quickly, so hopefully it didn't hit very many people.
> It did me.
> 
> Anyway, I hope that with rc2, we're now in the more stable part of the
> release cycle, with those kinds of problems that might affect a lot of
> testers sorted out. So hopefully the fixes will be more subtle and not
> affect common core setups.
> 
> So go out and test. It's safe now. You trust me, right?
> 

Build results:
	total: 155 pass: 155 fail: 0
Qemu test results:
	total: 549 pass: 548 fail: 1
Failed tests:
	arm:mps2-an385:mps2_defconfig:mps2-an385:initrd

Caveats:
- I disabled CONFIG_WERROR for alpha, openrisc, sh, and sparc64 builds.
  This is because commit 0fcb70851fbf ("Makefile.extrawarn: turn on
  missing-prototypes globally") causes test builds on those architectures
  to fail if CONFIG_WERROR is enabled, and I really don't want to act as
  missing-prototypes police.

- I disabled CONFIG_FRAME_WARN entirely.
  The warning was just getting annoying, in large part because people
  just keep adding functions with large stack frames. On top of that,
  the warning depends very much on the compiler and compiler
  version. Finally, most of the "fixes" I have seen over the years don't
  really solve the problem but just split affected functions into multiple
  sub-functions, with the overall stack frame being just as large or
  even larger than before. In my opinion that defeats the purpose of the
  warning, making it useless.
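
To illustrate the stack frame point, here is a minimal sketch in Python. It is
a toy model, not kernel code: the function names, the frame sizes, and the
1024-byte limit are all made-up assumptions. It compares what a per-function
frame warning sees with the peak stack use along the call chain, for the case
where the split-out helpers call each other.

```python
# Toy model with hypothetical numbers: frame sizes in bytes and a call graph.
# None of these names or sizes come from real kernel code.
FRAME_WARN_LIMIT = 1024  # assumed per-function warning threshold


def warned_functions(frames, limit=FRAME_WARN_LIMIT):
    """Functions whose own frame exceeds the limit (all a per-function check sees)."""
    return sorted(name for name, size in frames.items() if size > limit)


def peak_stack(frames, calls, entry):
    """Largest total stack use along any call chain from entry (acyclic graph)."""
    return frames[entry] + max(
        (peak_stack(frames, calls, callee) for callee in calls.get(entry, [])),
        default=0,
    )


# Before the "fix": one function with a large frame.
before_frames = {"do_work": 1800}
before_calls = {}

# After the "fix": the same locals moved into two nested helpers.
after_frames = {"do_work": 32, "do_work_part1": 900, "do_work_part2": 900}
after_calls = {"do_work": ["do_work_part1"], "do_work_part1": ["do_work_part2"]}

print(warned_functions(before_frames), peak_stack(before_frames, before_calls, "do_work"))
print(warned_functions(after_frames), peak_stack(after_frames, after_calls, "do_work"))
```

In this model the warning disappears after the split, while the peak stack use
goes from 1800 to 1832 bytes. If the helpers were called one after the other
instead of nested, the peak would shrink, so the outcome depends on how the
split is done.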

The mps2-an385 boot failure is due to commit 6f4c45cbcb00 ("kunit: Add
tests for csum_ipv6_magic and ip_fast_csum") which is buggy. Oddly enough,
I have only seen it with my mps2-an385 (arm nommu) boot test. A fix is
available at
https://lore.kernel.org/lkml/20240124-fix_sparse_errors_checksum_tests-v4-0-bc2b8d23a35c@rivosinc.com/

There is a new warning seen in various boot tests:

BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:3749

This is exposed by commit 5d5dfc50e5689 ("gpiolib: remove extra_checks"),
which unconditionally enables the check. The underlying problem is that
sdhci_check_ro() disables interrupts but then (directly or indirectly)
calls mmc_gpio_get_ro() which calls gpiod_get_value_cansleep(). I am not
aware of a pending fix or what a fix should look like. Obviously, commit
5d5dfc50e5689 should not be reverted since it only exposes the problem
and did not cause it. Related discussion is at
https://lore.kernel.org/lkml/19dca2a9-36e1-4a6b-9b65-db4c0a163d56@roeck-us.net/
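
The call pattern described above can be sketched as a small Python model. This
is only an illustration and not the kernel implementation: the class and the
model_* functions are hypothetical stand-ins for sdhci_check_ro(),
mmc_gpio_get_ro() and gpiod_get_value_cansleep(), and the check is reduced to a
single "interrupts disabled" flag.

```python
from contextlib import contextmanager


class AtomicContextModel:
    """Hypothetical model: tracks an 'interrupts disabled' flag and any reports."""

    def __init__(self):
        self.irqs_disabled = False
        self.reports = []

    @contextmanager
    def irqs_off(self):
        # Models a region that runs with interrupts disabled.
        previous = self.irqs_disabled
        self.irqs_disabled = True
        try:
            yield
        finally:
            self.irqs_disabled = previous

    def might_sleep(self, where):
        # Models the check that a may-sleep function performs on entry.
        if self.irqs_disabled:
            self.reports.append(
                f"sleeping function called from invalid context at {where}"
            )


def model_get_value_cansleep(ctx):
    # Stand-in for a GPIO read that is allowed to sleep.
    ctx.might_sleep("model_get_value_cansleep")
    return 0


def model_get_ro(ctx):
    # Stand-in for the read-only query that ends up in the may-sleep GPIO read.
    return model_get_value_cansleep(ctx)


def model_check_ro(ctx):
    # Stand-in for the caller that disables interrupts before the query.
    with ctx.irqs_off():
        return model_get_ro(ctx)


ctx = AtomicContextModel()
model_get_ro(ctx)                 # interrupts enabled: nothing reported
reports_with_irqs_on = len(ctx.reports)
model_check_ro(ctx)               # interrupts disabled: one report
print(reports_with_irqs_on, len(ctx.reports))
```

The model only shows why the check fires once it is enabled unconditionally.
It does not suggest what a fix should look like.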

On top of that, there is at least one selftest failure.

    Expected handshake_req_destroy_test == req, but
        handshake_req_destroy_test == 00000000
        req == c3300da0
    not ok 11 req_destroy works
# Handshake API tests: pass:10 fail:1 skip:0 total:11

My system is not (yet) set up to track such failures (I only happened to
notice when browsing through logs), so I don't know if this is the only
selftest failure. I do see this in v6.6.y and v6.7.y, so it is not a
new problem. I don't know (and didn't check) if anyone is aware of it.

Guenter