linux-kernel - Re: Crash in crc32_le during kmemleak

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d5b53c2263c3bbb3d2070afb24bb0c38.squirrel@www.codeaurora.org>
Date:	Wed, 20 May 2015 13:16:46 -0000
From:	vigneshr@...eaurora.org
To:	vigneshr@...eaurora.org
Cc:	linux-kernel@...r.kernel.org, catalin.marinas@....com,
	bernd.schubert@...m.fraunhofer.de
Subject: Re: Crash in crc32_le during kmemleak_scan()

Hi,

Sharing an update. We got back the test results. As such we did not
observe the bug, but we hit another side effect.

After kmemleak was disabled due to low memory

    96.825883:   <6> kmemleak: Cannot allocate a kmemleak_object structure
    96.825902:   <6> kmemleak: Cannot allocate a kmemleak_object structure
    96.825923:   <6> kmemleak: Kernel memory leak detector disabled


We saw the following warning appearing in the dmesg :

    97.023202:   <6> kmemleak: Trying to color unknown object at
0xffffffc01f0b2c80 as Grey
    97.029735:   <6> CPU: 0 PID: 49 Comm: kworker/u8:3 Tainted: G        W
 O 3.10.49-gde07ba9-00027-g9ce640d #1
    97.029751:   <2> Workqueue: diag_wq diag_read_smd_work_fn
    97.029758:   <2> Call trace:
    97.029769:   <2> [<ffffffc000206a2c>] dump_backtrace+0x0/0x270
    97.029777:   <2> [<ffffffc000206cac>] show_stack+0x10/0x1c
    97.029786:   <2> [<ffffffc000bf0eec>] dump_stack+0x1c/0x28
    97.029794:   <2> [<ffffffc0002f1da8>] paint_ptr+0x64/0xb8
    97.029801:   <2> [<ffffffc000be885c>] kmemleak_not_leak+0x64/0x98
    97.029808:   <2> [<ffffffc00056bd94>] diagmem_alloc+0x148/0x174
    97.029816:   <2> [<ffffffc00056b780>] diag_usb_write+0x80/0x180
    97.029823:   <2> [<ffffffc00056ac38>] diag_mux_write+0x38/0x4c
    97.029830:   <2> [<ffffffc0005682b0>]
diag_process_smd_read_data+0x270/0x2e4
    97.029837:   <2> [<ffffffc00056889c>] diag_smd_send_req+0x578/0x5b0
    97.029844:   <2> [<ffffffc0005688e0>] diag_read_smd_work_fn+0xc/0x18
    97.029852:   <2> [<ffffffc00023589c>] process_one_work+0x258/0x3d0
    97.029858:   <2> [<ffffffc0002368dc>] worker_thread+0x1f0/0x340
    97.029867:   <2> [<ffffffc00023bdc8>] kthread+0xac/0xb8

This should be fixed by the addition of the following to the patch that
was provided earlier :

--------------------- 8< --------------------------------


diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 5ec8b71..7d67c87 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1019,7 +1019,7 @@ void __ref kmemleak_not_leak(const void *ptr)
 {
        pr_debug("%s(0x%p)\n", __func__, ptr);

-       if (kmemleak_enabled && ptr && !IS_ERR(ptr))
+       if (kmemleak_enabled && ptr && !IS_ERR(ptr) && !kmemleak_error)
                make_gray_object((unsigned long)ptr);
        else if (kmemleak_early_log)
                log_early(KMEMLEAK_NOT_LEAK, ptr, 0, 0);






--------------------- 8< --------------------------------



However, the more concerning thing was that we crashed later at around 100
seconds. I have pasted the snippet of the crash log below :



   100.946902:   <6> Unable to handle kernel NULL pointer dereference at
virtual address 00000000
   100.953953:   <6> pgd = ffffffc00007d000
   100.957338:   <2> [00000000] *pgd=000000008c307003,
*pmd=000000008c308003, *pte=00e000000b000407
   100.965587:   <6> Internal error: Oops: 96000006 [#1] PREEMPT SMP
   100.971139:   <2> Modules linked in: wlan(O)
   100.974876:   <6> CPU: 0 PID: 246 Comm: kworker/0:2 Tainted: G        W
 O 3.10.49-gde07ba9-00027-g9ce640d #1
   100.984254:   <2> Workqueue: events kmemleak_do_cleanup
   100.988936:   <6> task: ffffffc02e7d9580 ti: ffffffc02da0c000 task.ti:
ffffffc02da0c000
   100.996404:   <2> PC is at rb_erase+0x224/0x344
   101.000393:   <2> LR is at __delete_object+0x3c/0xc8
   101.004820:   <2> pc : [<ffffffc000444e48>] lr : [<ffffffc0002f1afc>]
pstate: 600001c5
   101.012198:   <2> sp : ffffffc02da0fcb0
   101.015497:   <2> x29: ffffffc02da0fcb0 x28: 0000000000000009
   101.020792:   <2> x27: ffffffc0015f9000 x26: 0000000000000000
   101.026086:   <2> x25: ffffffc034ad2a40 x24: ffffffc02da0c000
   101.031380:   <2> x23: ffffffc034ad7b00 x22: 0000000000000000
   101.036676:   <2> x21: 0000000000000140 x20: ffffffc0014e26b8
   101.041972:   <2> x19: ffffffc0236b03c0 x18: 0000000000000000
   101.047266:   <2> x17: 0000007f973e7f8c x16: ffffffc00030447c
   101.052561:   <2> x15: 0000000000000000 x14: 0000000066666667
   101.057857:   <2> x13: 000000000000000a x12: 0000007f7c95e9f0
   101.063152:   <2> x11: 731a976167c4c606 x10: 0000000000000013
   101.068447:   <2> x9 : 0000000000127500 x8 : 0000001780764fe3
   101.073743:   <2> x7 : ffffffc0236b1bc0 x6 : 0000000000000000
   101.079038:   <2> x5 : ffffffc02eeb6f41 x4 : 0000000000000000
   101.084333:   <2> x3 : 0000000000000000 x2 : ffffffc02eeb6f40
   101.089628:   <2> x1 : ffffffc0016e05b8 x0 : 0000000000000000


	<SNIP .................>

   101.454125:   <6> Process kworker/0:2 (pid: 246, stack limit =
0xffffffc02da0c058)
   101.461155:   <2> Call trace:
   101.463590:   <2> [<ffffffc000444e48>] rb_erase+0x224/0x344
   101.468624:   <2> [<ffffffc0002f1c6c>] delete_object_full+0x1c/0x34
   101.474353:   <2> [<ffffffc0002f1cbc>] __kmemleak_do_cleanup+0x38/0x54
   101.480342:   <2> [<ffffffc0002f1d10>] kmemleak_do_cleanup+0x38/0x6c
   101.486160:   <2> [<ffffffc00023589c>] process_one_work+0x258/0x3d0
   101.491887:   <2> [<ffffffc0002368dc>] worker_thread+0x1f0/0x340
   101.497359:   <2> [<ffffffc00023bdc8>] kthread+0xac/0xb8
   101.502132:   <6> Code: f9000046 14000040 f9400843 aa0303e0 (f9400064)


Analysis so far :

I haven't been able to figure out exactly how the dots are connected and
the relation to the change provided earlier but the following is what i
could put together :

When i checked the post mortem analysis i see the following in the
crashing stack :

-018|rb_erase(
    |    [locdesc] node = ?,
    |    [X1] root = 0xFFFFFFC0016E05B8)
    |  [X1] root = 0xFFFFFFC0016E05B8
    |  [locdesc] node = ?
    |  [X2] parent = 0xFFFFFFC02EEB6F40
    |  [X4] node = 0x0
    |  [X3] sibling = 0x0
-019|__delete_object(
    |    [X19] object = 0xFFFFFFC0236B03C0)
    |  [X21] flags = 0x0140
-020|delete_object_full(
    |  ?)
    |  [X19] object = 0xFFFFFFC0236B03C0
-021|__kmemleak_do_cleanup()
    |  [X19] object = 0xFFFFFFC0236B03C0 -> (
    |    [0xFFFFFFC0236B03C0] lock = ([0xFFFFFFC0236B03C0] rlock =
([0xFFFFFFC0236B03C0] raw_lock = ([0xFFFFFFC0236B03C0] owner
    |    [0xFFFFFFC0236B03D8] flags = 0x0,
    |    [0xFFFFFFC0236B03E0] object_list = ([NSD:0xFFFFFFC0236B03E0] next
= 0xFFFFFFC0236B1920, [NSD:0xFFFFFFC0236B03E8] prev = 0x00200200),
    |    [0xFFFFFFC0236B03F0] gray_list = ([0xFFFFFFC0236B03F0] next =
0x00100100, [0xFFFFFFC0236B03F8] prev = 0x00200200),
	 [0xFFFFFFC0236B0400] rb_node = (
      			[0xFFFFFFC0236B0400] __rb_parent_color = 0xFFFFFFC02EEB6F41,
			[0xFFFFFFC0236B0408] rb_right = 0x0,
			[0xFFFFFFC0236B0410] rb_left = 0x0


The node is becoming 0x0 here and is crashing in the following piece of
code in rb_erase:


  309         } else {
  310             sibling = parent->rb_left;
  311             if (rb_is_red(sibling)) {

and rb_is_red(sibling) resolves into :

96 #define rb_is_red(rb)      __rb_is_red((rb)->__rb_parent_color)



1. Here the parent ptr refers to Object pointer residing at
0xFFFFFFC0236B03C0 and we can see that rb_node->rb_left is becoming 0x0 (
this is from the values observed in the crashed stack)

		 [0xFFFFFFC0236B0400] rb_node = (
      			[0xFFFFFFC0236B0400] __rb_parent_color = 0xFFFFFFC02EEB6F41,
			[0xFFFFFFC0236B0408] rb_right = 0x0,
			[0xFFFFFFC0236B0410] rb_left = 0x0

2. Therefore the sibling will take the value 0x0

3. And then when rb->__rb_parent_color operation takes place, we crash
since its not able to resolve 0x0 (since rb=0x0 here).


So the question is how this variable ended up being this way.

1. I checked the object_list via list walkthrough  and it did not report
any corruptions.

2. However, the object in question from our call stack,
object->object_list pointers looks a bit amiss.

  object_list = ([NSD:0xFFFFFFC0236B03E0] next = 0xFFFFFFC0236B1920,
[NSD:0xFFFFFFC0236B03E8] prev = 0x00200200),

But its not clear to me how it ended up being in this state.


I will continue to debug what went wrong and share my analysis. Please let
me know if anything looks obvious to you that i can try out.

====================
Thanks and regards,
Vignesh Radhakrishnan


QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of the Code Aurora Forum, hosted by The Linux Foundation.





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/