linux-kernel - [REGRESSION] mm: filemap_map_pages NULL pointer dereference

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160205180502.GA5869@hudson.localdomain>
Date:	Fri, 5 Feb 2016 10:05:02 -0800
From:	Jeremiah Mahler <jmmahler@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

all,

On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
experienced several system hangs.  I usually notice it first when
my browser (Chrome) stops responding but then other programs will stop
responding as well.  The only fix is a reboot.  It is sporadic but it
will usually occur once a day.

In the logs there will be a

  unable to handle kernel NULL pointer dereference

message related to filemap_map_pages+0x10d/0x290 (below).

------------------------------------------------------------
...
[51985.993033] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[51985.993087] IP: [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
[51985.993123] PGD 2c772067 PUD 0 
[51985.993144] Oops: 0000 [#1] SMP 
[51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4
[51985.993591]  ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore usb_common thermal
[51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 4.5.0-rc2-next-20160203+ #11
[51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 09/11/2012
[51985.993760] task: ffff88004bb04dc0 ti: ffff88002a2f8000 task.ti: ffff88002a2f8000
[51985.993804] RIP: 0010:[<ffffffff8114a19d>]  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
[51985.993845] RSP: 0000:ffff88002a2fbdf8  EFLAGS: 00010202
[51985.993874] RAX: 00000007fffffff8 RBX: 0000000000000001 RCX: 0000000000000003
[51985.993911] RDX: 0000000000000000 RSI: ffffea00005bdd1c RDI: ffffea00005bdd00
[51985.993948] RBP: ffff8800beff4220 R08: 000000000000007f R09: 0000000000000000
[51985.993985] R10: 0000000000000000 R11: ffff8800a39382b8 R12: ffff8801182b9440
[51985.994023] R13: ffff88002a2fbe90 R14: ffff8800be568d80 R15: 0000000000000008
[51985.994061] FS:  00007f3e20276a40(0000) GS:ffff88011e300000(0000) knlGS:0000000000000000
[51985.994103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51985.994134] CR2: 0000000000000008 CR3: 00000000be6eb000 CR4: 00000000001406e0
[51985.994172] Stack:
[51985.994184]  ffff8800beff4228 00007f3e0ae63000 0000000000000001 0000000000000000
[51985.994229]  0000000000000001 00007f3e0ae63000 ffff8800be568d80 0000000000000054
[51985.994273]  ffff880000000318 ffff88003584c318 ffff8800840076c0 ffffffff8117b073
[51985.994318] Call Trace:
[51985.994336]  [<ffffffff8117b073>] ? handle_mm_fault+0x13b3/0x1790
[51985.994370]  [<ffffffff810aa7f1>] ? up_write+0x21/0x30
[51985.994400]  [<ffffffff81059052>] ? __do_page_fault+0x192/0x410
[51985.994434]  [<ffffffff814ffcc8>] ? page_fault+0x28/0x30
[51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d 
[51985.994656] RIP  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
[51985.994692]  RSP <ffff88002a2fbdf8>
[51985.994711] CR2: 0000000000000008
[51986.002154] ---[ end trace da60309b42c1da53 ]---
[52006.988971] INFO: rcu_sched self-detected stall on CPU
[52006.988978] 	3-...: (5249 ticks this GP) idle=765/140000000000001/0 softirq=931903/931903 fqs=5247 
[52006.988980] 	 (t=5250 jiffies g=821290 c=821289 q=1968)
[52006.988982] Task dump for CPU 3:
[52006.988985] CompositorTileW R  running task        0 22999   1425 0x00000108
[52006.988988]  ffffffff81851580 ffffffff81148a21 ffff88011e397540 ffffffff81851580
[52006.988991]  0000000000000000 ffff880117daa180 ffffffff810c5e19 0000000000983e0c
[52006.988993]  ffffffff810cfd7e 0000000000000092 0000000000000092 003b9aca00000000
[52006.988995] Call Trace:
[52006.988996]  <IRQ>  [<ffffffff81148a21>] ? rcu_dump_cpu_stacks+0x71/0x8a
[52006.989004]  [<ffffffff810c5e19>] ? rcu_check_callbacks+0x6e9/0x790
[52006.989007]  [<ffffffff810cfd7e>] ? timekeeping_update+0xee/0x150
[52006.989009]  [<ffffffff810d8bd0>] ? tick_sched_handle.isra.14+0x50/0x50
[52006.989011]  [<ffffffff810ca712>] ? update_process_times+0x32/0x60
[52006.989013]  [<ffffffff810d8ba0>] ? tick_sched_handle.isra.14+0x20/0x50
[52006.989014]  [<ffffffff810d8c08>] ? tick_sched_timer+0x38/0x70
[52006.989016]  [<ffffffff810caf1c>] ? __hrtimer_run_queues+0xec/0x230
[52006.989017]  [<ffffffff810cb5fa>] ? hrtimer_interrupt+0x9a/0x1a0
[52006.989020]  [<ffffffff81500599>] ? smp_apic_timer_interrupt+0x39/0x50
[52006.989022]  [<ffffffff814fea62>] ? apic_timer_interrupt+0x82/0x90
[52006.989023]  <EOI>  [<ffffffff812dfa05>] ? delay_tsc+0x25/0x50
[52006.989028]  [<ffffffff810ac4d6>] ? do_raw_spin_lock+0x86/0x150
[52006.989031]  [<ffffffff8117a175>] ? handle_mm_fault+0x4b5/0x1790
[52006.989033]  [<ffffffff812e0be4>] ? call_rwsem_down_read_failed+0x14/0x30
[52006.989035]  [<ffffffff81059052>] ? __do_page_fault+0x192/0x410
[52006.989037]  [<ffffffff814ffcc8>] ? page_fault+0x28/0x30
------------------------------------------------------------

Referring again to the RIP line from the trace.

[51985.994656] RIP  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290

jeri@...son:~/linux-next$ gdb vmlinux
(gdb) list *0xffffffff8114a19d
0xffffffff8114a19d is in filemap_map_pages
(include/linux/radix-tree.h:465).
460			unsigned size = radix_tree_chunk_size(iter) - 1;
461	
462			while (size--) {
463				slot++;
464				iter->index++;
465				if (likely(*slot))
466					return slot;
467				if (flags & RADIX_TREE_ITER_CONTIG) {
468					/* forbid switching to the next chunk */
469					iter->next_index = 0;
(gdb)

Assuming I traced the addresses correctly, this indicates that the
fault is triggered when the value in the slot pointer is accessed.
Perhaps slot is being incremented beyond its valid range?

-- 
- Jeremiah Mahler