lists.openwall.net: Open Source and information security mailing list archives
 
Message-ID: <20220814223743.26ebsbnrvrjien4f@awork3.anarazel.de>
Date:   Sun, 14 Aug 2022 15:37:43 -0700
From:   Andres Freund <andres@...razel.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Guenter Roeck <linux@...ck-us.net>, linux-kernel@...r.kernel.org,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: upstream kernel crashes

Hi,

On 2022-08-14 14:40:25 -0700, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 2:26 PM Guenter Roeck <linux@...ck-us.net> wrote:
> >
> > Hi all,
> >
> > syzkaller reports lots of crashes with the mainline kernel. The latest
> > I have seen, based on the current ToT, is attached. Backtraces are not
> > always the same, but this one is typical.
> >
> > I have not tried to analyze this further.
>
> This smells like the same issue that Andres Freund reported:
>
>    https://lore.kernel.org/all/20220814043906.xkmhmnp23bqjzz4s@awork3.anarazel.de/
>
> he blamed the virtio changes, mainly based on the upstream merges
> between his bad tree and last good tree, ie
>
>     git log --merges --author=torvalds --oneline 7ebfc85e2cd7..69dac8e431af
>
> and assuming those end-points are accurate, that does seem to be the
> most likely area.

That range had different symptoms, I think (networking not working, but not
oopsing). I hit similar issues when I tried to reproduce the problem
interactively to gather more details, and unwisely did a git pull instead of
checking out the precise revision, ending up at aea23e7c464b. That's when the
symptoms started to look similar to the above. So it's 69dac8e431af..aea23e7c464b
that I'd be more suspicious of in the context of this thread.

Which would make me look at the following first:
e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block

But I'm not a kernel developer...


> Andres was going to try to bisect it, but if you have some idea where
> this all started, that would help.

I started, but it's one of those days.

First I got sidetracked by the interactive "serial console" of Google
Cloud apparently not working during grub, making it harder to recover from a
bad kernel.

Then by a bug in git bisect. For some reason, Debian sid's git is stuck at:

clear_distance (list=0x5616f56656a0) at ./bisect.c:71
71	./bisect.c: No such file or directory.
(gdb) bt
#0  clear_distance (list=0x5616f56656a0) at ./bisect.c:71
#1  do_find_bisection (bisect_flags=0, weights=0x561701309e50, nr=1120570, list=<optimized out>) at ./bisect.c:332
#2  find_bisection (commit_list=commit_list@...ry=0x7ffe32540d50, reaches=reaches@...ry=0x7ffe32540c90, all=all@...ry=0x7ffe32540c94,
    bisect_flags=bisect_flags@...ry=0) at ./bisect.c:428
#3  0x00005616b63636ae in bisect_next_all (r=<optimized out>, prefix=prefix@...ry=0x0) at ./bisect.c:1047
#4  0x00005616b628bad7 in bisect_next (terms=terms@...ry=0x7ffe32542720, prefix=prefix@...ry=0x0) at builtin/bisect--helper.c:637
#5  0x00005616b628be42 in bisect_auto_next (terms=terms@...ry=0x7ffe32542720, prefix=0x0) at builtin/bisect--helper.c:656
#6  0x00005616b628c2c0 in bisect_state (terms=terms@...ry=0x7ffe32542720, argv=<optimized out>, argv@...ry=0x7ffe32543be0, argc=<optimized out>, argc@...ry=2)
    at builtin/bisect--helper.c:971
#7  0x00005616b628cc18 in cmd_bisect__helper (argc=2, argv=0x7ffe32543be0, prefix=<optimized out>) at builtin/bisect--helper.c:1356
#8  0x00005616b627f3aa in run_builtin (argv=0x7ffe32543be0, argc=4, p=0x5616b65b6818 <commands+120>) at ./git.c:466
#9  handle_builtin (argc=4, argv=argv@...ry=0x7ffe32543be0) at ./git.c:720
#10 0x00005616b6280662 in run_argv (argv=0x7ffe325438e0, argcp=0x7ffe325438ec) at ./git.c:787
#11 cmd_main (argc=<optimized out>, argc@...ry=5, argv=<optimized out>, argv@...ry=0x7ffe32543bd8) at ./git.c:920
#12 0x00005616b627f085 in main (argc=5, argv=0x7ffe32543bd8) at ./common-main.c:56

I suspect it's somehow related to starting out with a shallow clone and then
fetching the full history. So my next step is going to be to start with a fresh
clone. But it's dinner prep time now, so it'll have to wait a bit.
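A minimal sketch of a lighter-weight check before resorting to a fresh clone, assuming (as suspected above, not confirmed) that leftover shallow-clone state is what trips up bisect: ask git whether the repository is still marked shallow, fetch the missing history if so, then start the bisection. The helper names ensure_full_history and start_bisect are hypothetical, and treating aea23e7c464b as bad and 69dac8e431af as good is an assumption based on the range named earlier in this message.

```shell
#!/bin/sh
# Hypothetical helpers, not from the original message.

# Fetch the full history if the clone is still marked shallow.
# `git rev-parse --is-shallow-repository` (git >= 2.15) prints "true" or "false".
ensure_full_history() {
    if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
        git fetch --unshallow
    fi
}

# Start a bisection: first argument is the bad commit, second the good one.
start_bisect() {
    ensure_full_history || return 1
    git bisect start "$1" "$2"
}
```

Usage, under the assumption above: start_bisect aea23e7c464b 69dac8e431af. If bisect still hangs after this, the fresh clone remains the more conservative option, since it rules out any other leftover state from the original shallow clone.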

Greetings,

Andres Freund
