Date:   Mon, 14 Jun 2021 23:30:11 +0300
From:   Pavel Skripkin <paskripkin@...il.com>
To:     Dongliang Mu <mudongliangabcd@...il.com>
Cc:     alex.aring@...il.com, "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-wpan@...r.kernel.org, netdev@...r.kernel.org,
        stefan@...enfreihafen.org,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        syzbot+b80c9959009a9325cdff@...kaller.appspotmail.com,
        Dan Carpenter <dan.carpenter@...cle.com>,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: Suggestions on how to debug kernel crashes where printk and gdb
 both does not work

On Mon, 14 Jun 2021 23:04:03 +0800
Dongliang Mu <mudongliangabcd@...il.com> wrote:

> On Mon, Jun 14, 2021 at 10:47 PM Pavel Skripkin
> <paskripkin@...il.com> wrote:
> >
> > On Mon, 14 Jun 2021 22:40:55 +0800
> > Dongliang Mu <mudongliangabcd@...il.com> wrote:
> >
> > > On Mon, Jun 14, 2021 at 10:25 PM Pavel Skripkin
> > > <paskripkin@...il.com> wrote:
> > > >
> > > > On Mon, 14 Jun 2021 22:19:10 +0800
> > > > Dongliang Mu <mudongliangabcd@...il.com> wrote:
> > > >
> > > > > On Mon, Jun 14, 2021 at 9:34 PM Pavel Skripkin
> > > > > <paskripkin@...il.com> wrote:
> > > > > >
> > > > > > On Mon, 14 Jun 2021 21:22:43 +0800
> > > > > > Dongliang Mu <mudongliangabcd@...il.com> wrote:
> > > > > >
> > > > > > > Dear kernel developers,
> > > > > > >
> > > > > > > > I was recently trying to debug the crash "memory leak in
> > > > > > > > hwsim_add_one" [1]. However, I ran into a frustrating
> > > > > > > > issue: my breakpoints and printk/pr_alert calls in
> > > > > > > > functions that are surely executed do not work. The
> > > > > > > > stack trace follows. I am writing this email to ask for
> > > > > > > > suggestions on how to debug such cases.
> > > > > > >
> > > > > > > Thanks very much. Looking forward to your reply.
> > > > > > >
> > > > > >
> > > > > > Hi, Dongliang!
> > > > > >
> > > > > > > This bug is not similar to others on the dashboard. I spent
> > > > > > > some time debugging it a week ago. The main problem here is
> > > > > > > that the memory allocation happens at boot time:
> > > > > >
> > > > > > > [<ffffffff84359255>] kernel_init+0xc/0x1a7
> > > > > > > init/main.c:1447
> > > > > >
> > > > >
> > > > > Oh, nice catch. No wonder why my debugging does not work. :(
> > > > >
> > > > > > > and the reproducer simply tries to
> > > > > > > free this data. You can use ftrace to look at it. Something
> > > > > > > like this:
> > > > > >
> > > > > > $ echo 'hwsim_*' > $TRACE_DIR/set_ftrace_filter
> > > > >
> > > > > Thanks for your suggestion.
> > > > >
> > > > > > Do you have any conclusions about this case? If you have found
> > > > > > the root cause and started writing patches, I will turn my
> > > > > > focus to other cases.
> > > >
> > > > > No, I had some busy days and I have nothing on this bug for
> > > > > now. I've just traced the reproducer execution and that's all :)
> > > > >
> > > > > I guess some error handling paths are broken, but I'm not sure
> > >
> > > > In the beginning, I agreed with you. However, after I manually
> > > > checked the functions hwsim_probe (initialization) and hwsim_remove
> > > > (cleanup), things may be different. The cleanup looks
> > > > correct to me. I would like to debug it but am stuck with the
> > > > debugging process.
> > >
> > > And there is another issue: the cleanup function also does not
> > > output anything or hit the breakpoint. I don't quite understand
> > > > it since the cleanup does not run at boot time.
> > >
> > > Any idea?
> > >
> >
> > Output from ftrace (syzkaller repro):
> >
> > root@...kaller:~# cat /sys/kernel/tracing/trace
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  1)               |  hwsim_del_radio_nl() {
> >  1)               |    hwsim_del() {
> >  1)               |      hwsim_edge_unsubscribe_me() {
> >  1) ! 310.041 us  |        hwsim_free_edge();
> >  1) ! 665.221 us  |      }
> >  1) * 52999.05 us |    }
> >  1) * 53035.38 us |  }
> >
> > The cleanup function is not the problem, I think :)
> 
> It seems like I picked the wrong cleanup function (I had hwsim_remove
> in mind as the right one). Let me learn how to use ftrace to log
> the executed functions and then discuss this case with you guys.
> 

Hmmm, I think there is a mess with the lists.

I just want to share my debugging results; I have no idea about a fix
for now.

In hwsim_probe() an edge for phy->idx == 1 is allocated, then the
reproducer sends a request to delete the phy with idx == 0, so this
check in hwsim_edge_unsubscribe_me():

	if (e->endpoint->idx == phy->idx) { 
		... clean up code ...
	}

won't pass and the edge won't be freed (because it was allocated for
the phy with idx == 1). The edge allocated for phy 1 is leaked after
hwsim_del(). I can't really see the code where the phy with idx == 1
can be deleted from the list...
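
To make the reasoning above concrete, here is a small user-space sketch
of the same pattern. It is only an illustration under assumptions:
struct edge, add_edge(), unsubscribe() and edges_left_after_delete() are
hypothetical names, not the real mac802154_hwsim code, and the list
handling is heavily simplified (no RCU, no per-phy lists). It only shows
that an edge whose index differs from the deleted one survives the
check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an edge: NOT the real struct hwsim_edge. */
struct edge {
	unsigned int endpoint_idx;	/* idx of the phy this edge belongs to */
	struct edge *next;
};

/* Allocate one edge and push it onto the list. */
static void add_edge(struct edge **head, unsigned int endpoint_idx)
{
	struct edge *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->endpoint_idx = endpoint_idx;
	e->next = *head;
	*head = e;
}

/* Same shape as the quoted check: free only edges matching the deleted phy. */
static void unsubscribe(struct edge **head, unsigned int deleted_idx)
{
	struct edge **pp = head;

	while (*pp) {
		struct edge *e = *pp;

		if (e->endpoint_idx == deleted_idx) {
			*pp = e->next;	/* unlink, then free */
			free(e);
		} else {
			pp = &e->next;	/* index differs: edge is skipped */
		}
	}
}

/*
 * Allocate one edge for edge_idx, delete deleted_idx, and return how many
 * edges are still allocated afterwards (these would be the leaked ones).
 */
int edges_left_after_delete(unsigned int edge_idx, unsigned int deleted_idx)
{
	struct edge *head = NULL;
	int left = 0;

	add_edge(&head, edge_idx);
	unsubscribe(&head, deleted_idx);

	/* Count the survivors and free them so the sketch itself stays clean. */
	while (head) {
		struct edge *e = head;

		head = e->next;
		free(e);
		left++;
	}
	return left;
}
```

In this sketch, edges_left_after_delete(1, 0) leaves one edge behind
(the case described above: edge for idx 1, delete request for idx 0),
while edges_left_after_delete(0, 0) leaves none.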

Maybe it's a kmemleak bug. There was a similarly strange case with this one:
https://syzkaller.appspot.com/bug?id=3a325b8389fc41c1bc94de0f4ac437ed13cce584.
I find it strange that I could still reach the leaked pointers after kmemleak
reported a leak. I'm not familiar with kmemleak internals and I might be wrong.


With regards,
Pavel Skripkin
