Message-ID: <CADyTPExpEqaJiMGoV+Z6xVgL50ZoMJg49B10LcZ=8eg19u34BA@mail.gmail.com>
Date:   Sat, 28 Jan 2023 21:17:31 -0500
From:   Nick Bowler <nbowler@...conx.ca>
To:     linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
        regressions@...ts.linux.dev
Cc:     Peter Xu <peterx@...hat.com>
Subject: PROBLEM: sparc64 random crashes starting w/ Linux 6.1 (regression)

Hi,

Starting with Linux 6.1.y, my sparc64 (Sun Ultra 60) system is very
unstable, with userspace processes randomly crashing with all kinds of
different weird errors.  The same problem occurs on 6.2-rc5.  Linux
6.0.y is OK.

Usually, it manifests with ssh connections just suddenly dropping out
like this:

  malloc(): unaligned tcache chunk detected
  Connection to alectrona closed.

but other kinds of failures (random segfaults, bus errors, etc.) are
seen too.

I have never seen the kernel itself oops or anything like that.  There
are no abnormal kernel log messages of any kind, except for the normal
ones that get printed when processes segfault, like this one:

  [  563.085851] zsh[2073]: segfault at 10 ip 00000000f7a7c09c (rpc 00000000f7a7c0a0) sp 00000000ff8f5e08 error 1 in libc.so.6[f7960000+1b2000]
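
For anyone comparing several of these messages, the faulting instruction's
offset inside the mapped object can be worked out from the ip value and the
[base+size] part of the line.  Below is a minimal Python sketch, not a
definitive tool: the regular expression and the names parse_segfault and
PATTERN are made up for illustration, and it assumes the log line has
exactly the layout shown above.  The offset is relative to the start of the
mapping, which only equals an offset into the file if that mapping starts at
file offset 0 (an assumption).

```python
import re

# Sample line copied from the report above, joined onto a single line.
LINE = ("[  563.085851] zsh[2073]: segfault at 10 ip 00000000f7a7c09c "
        "(rpc 00000000f7a7c0a0) sp 00000000ff8f5e08 error 1 in "
        "libc.so.6[f7960000+1b2000]")

# Hypothetical pattern matching only the message layout shown above.
PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fields of a segfault log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # Sanity check: the ip should fall inside the reported mapping.
        "ip_in_mapping": base <= ip < base + size,
    }


if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["comm"], info["object"], hex(info["offset"]))
```

Running this over many such lines would show whether the crashes cluster at
one libc offset or are scattered, which is the kind of pattern that might
help tell a single bad code path apart from random memory corruption.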

I was able to reproduce this fairly reliably by using GNU ddrescue to
dump a disk from the dvd drive -- things usually go awry after a minute
or two.  So I was able to bisect to this commit:

  2e3468778dbe3ec389a10c21a703bb8e5be5cfbc is the first bad commit
  commit 2e3468778dbe3ec389a10c21a703bb8e5be5cfbc
  Author: Peter Xu <peterx@...hat.com>
  Date:   Thu Aug 11 12:13:29 2022 -0400

      mm: remember young/dirty bit for page migrations

This does not revert cleanly on master, but I ran my test several extra
times on the immediately preceding commit (0ccf7f168e17: "mm/thp: carry
over dirty bit when thp splits on pmd") and could not get it to crash,
so I am reasonably confident in this bisection result.

Let me know if you need any more info!

Thanks,
  Nick
