Message-ID: <8a32b437-4cea-f265-b26e-509466d5290b@suse.cz>
Date:   Wed, 15 Sep 2021 11:28:26 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Mike Rapoport <rppt@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [GIT PULL] tracing: Fixes to bootconfig memory management

On 9/15/21 01:29, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 3:48 PM Vlastimil Babka <vbabka@...e.cz> wrote:
>>
>> Well, looks like I can't. Commit 77e02cf57b6cf does boot fine for me,
>> multiple times. But so now does the parent commit 6a4746ba06191. Looks like
>> the magic is gone. I'm now surprised how deterministic it was during the
>> bisect (most bad cases manifested on first boot, only few at second).
> 
> Well, your report was clearly memory corruption by the invalid
> memblock_free() just ending up causing random problems later on.
>
> So it could easily be 100% deterministic with a certain memory layout
> at a particular commit. And then enough other changes later, and it's
> all gone, because the memory corruption now hits something else that
> didn't even care.
> 
> The code for your oops was
> 
>    0:   48 8b 17        mov    (%rdi),%rdx
>    3:   48 39 d7        cmp    %rdx,%rdi
>    6:   74 43           je     0x4b
>    8:   48 8b 47 08     mov    0x8(%rdi),%rax
>    c:   48 85 c0        test   %rax,%rax
>    f:   74 23           je     0x34
>   11:   49 89 c0        mov    %rax,%r8
>   14:*  48 8b 40 10     mov    0x10(%rax),%rax    <-- trapping instruction
> 
> and that's the start of rb_next(), so what's going on is that
> "rb->rb_right" (the second word of 'struct rb_node') ends up having
> that value in %rax:
> 
>   RAX: 343479726f6d656d
> 
> which is ASCII "44yromem" rather than a valid pointer if I looked that up right.
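
[Editor's sketch, not part of the original message.] The offsets in the disassembly line up with the three-word layout of 'struct rb_node' on a 64-bit kernel. The snippet below mirrors that layout with ctypes purely for illustration. The class name RbNode and the field name rb_parent_color (the kernel spells it __rb_parent_color) are stand-ins, and fixed 64-bit integers are assumed in place of 'unsigned long' and pointers.

```python
import ctypes


class RbNode(ctypes.Structure):
    """Illustrative mirror of the kernel's struct rb_node on a 64-bit build."""
    _fields_ = [
        ("rb_parent_color", ctypes.c_uint64),  # word 0: parent pointer plus color bit
        ("rb_right", ctypes.c_uint64),         # word 1: loaded by mov 0x8(%rdi),%rax
        ("rb_left", ctypes.c_uint64),          # word 2: loaded by mov 0x10(%rax),%rax
    ]


# Print each field's byte offset so it can be matched against the disassembly.
for name, _ctype in RbNode._fields_:
    print(f"{name:16s} offset 0x{getattr(RbNode, name).offset:x}")
```

Read this way, the trapping instruction is the load of rb_left through the rb_right value, which faults because rb_right holds string data instead of an address.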

Yep, I was pretty sure it was related to the
"/sys/bus/memory/devices/memory44" sysfs object and bisection would lead to
kobject/sysfs or some memory hotplug related changes. So the result was a
surprise.
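
[Editor's sketch, not part of the original message.] The link between the RAX value and "memory44" can be checked by decoding the register as eight ASCII bytes. The helper name decode_register_as_ascii is made up for this example. It returns the bytes in the order the hex value is printed and in the little-endian order they occupy in memory on x86-64.

```python
import struct


def decode_register_as_ascii(value: int) -> tuple[str, str]:
    """Decode a 64-bit register value as ASCII in printed and in-memory order."""
    # Most significant byte first: the order in which the hex digits are printed.
    as_printed = value.to_bytes(8, "big").decode("ascii")
    # Little-endian: the order in which the bytes sit in memory on x86-64.
    in_memory = struct.pack("<Q", value).decode("ascii")
    return as_printed, in_memory


printed, memory = decode_register_as_ascii(0x343479726F6D656D)
print(printed)  # 44yromem
print(memory)   # memory44
```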

> And just _slightly_ different allocation patterns, and your 'struct
> rb_node' gets allocated somewhere else, and you don't see the oops at
> all, or you get it later in some different place.
> 
> Most memory corruption doesn't cause oopses, because most memory isn't
> used as pointers etc.
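
[Editor's sketch, not part of the original message.] The layout dependence described above can be shown with a deliberately tiny toy model. It is not kernel code: memory is a list of eight words, exactly one of them is later used as a pointer, and a stray write always hits the same word. The function name run_layout, the slot numbers, and the sizes are all arbitrary assumptions.

```python
def run_layout(pointer_slot: int, stray_offset: int = 5, size: int = 8) -> str:
    """Report whether a stray write at a fixed offset hits the one pointer word."""
    memory = ["data"] * size           # most words are plain data
    memory[pointer_slot] = "ptr"       # a single word is dereferenced later
    memory[stray_offset] = "garbage"   # stray write enabled by the invalid free
    # Only a clobbered pointer produces a visible crash when it is dereferenced.
    return "oops" if memory[pointer_slot] == "garbage" else "silent corruption"


print(run_layout(pointer_slot=5))  # pointer happens to sit where the write lands
print(run_layout(pointer_slot=2))  # same bug, different layout, no visible symptom
```

In this model the bug is identical in both runs. Only the placement of the pointer decides whether it shows up as an oops.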
> 
> What you _could_ try if you care enough is
> 
>  - go back to the thing you bisected to where you can still hopefully
> recreate the problem
> 
>  - apply that patch at that point with no other changes
> 
> and then the test would hopefully be closer to the state you could
> re-create the problem.
> 
> And hopefully it would still not reproduce, just because the bug is
> fixed, of course ;)

Yeah, that worked! Commit 40caa127f3c7 was still broken, and cherry-pick of
77e02cf57b6cf on top fixed it. Thanks!

> The very unlikely alternative is that your bisect was just pure random
> bad luck and hit the wrong commit entirely, and the oops was due to
> some other problem.
> 
> But it does seem unlikely to be something else. Usually when bisects
> go off into the weeds due to not being reproducible, they go very
> obviously off into the weeds rather than point to something that ends
> up having a very similar bug.
> 
>            Linus
> 
