linux-kernel - Re: Linux 6.8-rc6

Message-ID: <6bb3f88b-bf57-442a-8b46-cb4784dd4cab@roeck-us.net>
Date: Mon, 26 Feb 2024 09:51:49 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 6.8-rc6

On Sun, Feb 25, 2024 at 03:57:21PM -0800, Linus Torvalds wrote:
> Another week, another rc. Nothing here really stands out.
> 
> Last week I said that I was hoping things would calm down a bit.
> Technically things did calm down a bit, and rc6 is smaller than rc5
> was. But not by a huge amount, and honestly, while there's nothing
> really alarming here, there's more here than I would really like at
> this point in the release.
> 
> So this may end up being one of those releases that get an rc8. We'll
> see. The fact that we have a bit more commits than I would really wish
> for might not be a huge issue when a noticeable portion of said
> commits end up being about self-tests etc.
> 

Good to get those unit test failures to pass, though.

Build results:
	total: 155 pass: 155 fail: 0
Qemu test results:
	total: 549 pass: 548 fail: 1
Unit test results:
	pass: 170476 fail: 620

Details below.

Guenter

============

Runtime crashes
===============

an385:mps2_defconfig:mps2-an385:initrd
--------------------------------------

an385 does not support unaligned accesses, but test_ip_fast_csum
expects it.

Fix:
https://lore.kernel.org/lkml/20240207-fix_sparse_errors_checksum_tests-v6-0-4caa9629705b@rivosinc.com/

See additional information below about checksum unit test failures.

Warning backtraces
==================

WARNING: inconsistent lock state
6.8.0-rc4 #1 Tainted: G                 N
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
kworker/0:2/39 [HC1[1]:SC0[2]:HE0:SE0] takes:
ef792074 (&syncp->seq#2){?...}-{0:0}, at: sun8i_dwmac_dma_interrupt+0x9c/0x28c
{HARDIRQ-ON-W} state was registered at: 
  lock_acquire+0x11c/0x368
  __u64_stats_update_begin+0x104/0x1ac
  stmmac_xmit+0x4d0/0xc58
  dev_hard_start_xmit+0xc4/0x2a0
  sch_direct_xmit+0xf8/0x30c
  __dev_queue_xmit+0x400/0xcc4
  ip6_finish_output2+0x254/0xafc
  mld_sendpack+0x260/0x5b0
  mld_ifc_work+0x274/0x588
  process_one_work+0x230/0x604
  worker_thread+0x1dc/0x494
  kthread+0x100/0x120
  ret_from_fork+0x14/0x28

Caused by commit 38cc3c6dcc09 ("net: stmmac: protect updates of 64-bit
statistics counters.")

Report:
https://lore.kernel.org/lkml/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/

There has been no activity, or even agreement on whether this is a false
positive or a real problem. I added it to the regression tracker (or at
least tried to) since the problem has now spread into stable branches, and
the patch causing the backtrace may even be marked as a CVE according to:
https://git.kernel.org/pub/scm/linux/security/vulns.git/tree/cve/review/proposed/v6.7.6-greg

An interesting question is whether the fix for this presumed CVE is causing
another security issue.
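
For readers not familiar with this class of lockdep report, here is a
minimal userspace sketch of why a sequence counter written both with
hardirqs enabled and from hardirq context can be a problem. It is a
single-threaded model with made-up names (stats_model, model_*), not the
kernel's u64_stats implementation, and it makes no claim about whether
stmmac can actually hit this interleaving. It only assumes that a reader
trusts a snapshot whenever the sequence number is even and unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit statistic kept as two 32-bit halves. */
struct stats_model {
	unsigned int seq;	/* odd while a write is in progress */
	uint32_t lo, hi;
};

static void model_write_begin(struct stats_model *s) { s->seq++; }
static void model_write_end(struct stats_model *s)   { s->seq++; }

/* A reader accepts a snapshot only if seq is even and did not change. */
static bool model_read(const struct stats_model *s, uint64_t *val)
{
	unsigned int start = s->seq;

	if (start & 1)
		return false;
	*val = ((uint64_t)s->hi << 32) | s->lo;
	return s->seq == start;
}

/*
 * Replays one interleaving: an outer writer is interrupted half way
 * through a carry, and the "interrupt handler" starts a nested write.
 * Returns true if a reader accepted a value at that point.
 */
static bool model_nested_writer_tears(uint64_t *seen)
{
	struct stats_model s = { .seq = 0, .lo = UINT32_MAX, .hi = 0 };
	bool accepted;

	model_write_begin(&s);		/* outer writer: seq = 1 (odd) */
	s.lo = 0;			/* low half wrapped, carry not stored yet */

	model_write_begin(&s);		/* nested writer: seq = 2 (even again) */
	accepted = model_read(&s, seen);	/* reader sees an even seq */
	model_write_end(&s);		/* seq = 3 */

	s.hi = 1;			/* outer writer finishes the carry */
	model_write_end(&s);		/* seq = 4 */
	return accepted;
}
```

In this model the reader accepts the value 0, which is neither the old
counter value (0xffffffff) nor the new one (0x100000000). Whether the real
driver can get into that state is exactly the open question.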


Unit test failures
==================

checksum
--------

Various checksum tests fail on several machines, for different reasons.
Too many to list in detail.

Reports:

https://lore.kernel.org/lkml/ec44bf32-8b66-40c4-bc62-4deed3702f99@roeck-us.net/
https://lore.kernel.org/lkml/9b004c45-45f8-4abb-a24e-bb47b369b1a5@roeck-us.net/
https://lore.kernel.org/lkml/65ed7c95-712c-410b-84f3-58496b0c9649@roeck-us.net/

Suggested fixes:

https://lore.kernel.org/lkml/20240207-fix_sparse_errors_checksum_tests-v6-0-4caa9629705b@rivosinc.com/
https://lore.kernel.org/lkml/20240210175526.3710522-1-linux@roeck-us.net/
https://lore.kernel.org/lkml/20240211160837.2436375-1-linux@roeck-us.net/
https://lore.kernel.org/lkml/20240210191556.3761064-1-linux@roeck-us.net/

Most fixes are queued in -next. It is still an open question whether the
tested functions are supposed to work on unaligned addresses.
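
To make the alignment question concrete, below is a small userspace sketch
of a 16-bit one's complement checksum (RFC 1071 style) that is safe for any
buffer alignment because it never does a direct 16-bit load from the
buffer. The function name csum_any_alignment is made up for illustration;
this is not ip_fast_csum or any other kernel helper, and it says nothing
about what the kernel functions are required to support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel API: one's complement checksum over
 * a buffer of arbitrary alignment and length. Words are summed in the
 * order they are stored in memory.
 */
static uint16_t csum_any_alignment(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;

	assert(buf != NULL || len == 0);

	while (len >= 2) {
		uint16_t word;

		/* memcpy avoids an unaligned 16-bit access. */
		memcpy(&word, p, sizeof(word));
		sum += word;
		p += 2;
		len -= 2;
	}
	if (len) {
		/* Odd trailing byte is padded with a zero byte. */
		uint16_t word = 0;

		memcpy(&word, p, 1);
		sum += word;
	}
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A header that already carries a valid checksum sums to 0 with this
function, whether it is passed at an even or an odd address. A test written
against a helper like this can use unaligned buffers freely; a test written
against an optimized per-architecture routine may not be able to.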

stackinit
---------

Seen with m68k:q800 emulation.

    # test_char_array_zero: ASSERTION FAILED at lib/stackinit_kunit.c:333
    Expected stackinit_range_contains(fill_start, fill_size, target_start, target_size) to be true, but is false
stack fill missed target!? (fill 16 wide, target offset by -12)

    # test_char_array_none: ASSERTION FAILED at lib/stackinit_kunit.c:343
    Expected stackinit_range_contains(fill_start, fill_size, target_start, target_size) to be true, but is false
stack fill missed target!? (fill 16 wide, target offset by -12)

Report:
https://lore.kernel.org/lkml/a0d10d50-2720-4ecd-a2c6-c2c5e5aeee65@roeck-us.net/

I suspect this may be caused by the test assuming that stack growth is
downward, but I don't really understand the test well enough to be sure.
I'll disable this set of tests for m68k going forward, so I am not going
to report the problem again in the future.
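
For what it's worth, the numbers in the failure message can be checked
against a plain interval-containment test. The sketch below is my reading
of what the assertion name suggests, with a made-up helper name
(range_contains) and integer addresses instead of pointers; it is not
copied from lib/stackinit_kunit.c. The 16-byte target size is an
assumption, since the log only reports the fill width and the offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [target_start, target_start + target_size)
 * lies entirely inside [fill_start, fill_start + fill_size).
 */
static bool range_contains(uintptr_t fill_start, size_t fill_size,
			   uintptr_t target_start, size_t target_size)
{
	return target_start >= fill_start &&
	       target_start + target_size <= fill_start + fill_size;
}
```

With a 16-byte fill and an assumed 16-byte target, the check fails for a
12-byte displacement in either direction, so the sign convention of the
reported offset does not matter for the outcome:
range_contains(0x1000, 16, 0x1000 - 12, 16) and
range_contains(0x1000, 16, 0x1000 + 12, 16) are both false, while an
offset of 0 passes.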

mean and variance tests
-----------------------

This is bcachefs related.

Test cases 2 and 4 fail on all architectures/branches, and I don't see how
those tests can pass. The functionality was promoted into lib/math/
in -next, and the test now always fails there even if bcachefs is not
enabled.

Not added to the regression tracker since I am not sure whether unit test
problems caused by bad test cases should count as regressions.

Report:
https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/

Suggested fix:
https://lore.kernel.org/lkml/20240225162925.1708462-1-linux@roeck-us.net/