linux-kernel - Re: [PATCH 0/2] shmem: fix faulting into a hole while it's punched, take 3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53CDD961.1080006@oracle.com>
Date:	Mon, 21 Jul 2014 23:24:17 -0400
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	Vlastimil Babka <vbabka@...e.cz>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michel Lespinasse <walken@...gle.com>,
	Lukas Czerner <lczerner@...hat.com>,
	Dave Jones <davej@...hat.com>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] shmem: fix faulting into a hole while it's punched,
 take 3

On 07/19/2014 07:44 PM, Hugh Dickins wrote:
>> Otherwise, I've been unable to reproduce the shmem_fallocate hang.
> Great.  Andrew, I think we can say that it's now safe to send
> 1/2 shmem: fix faulting into a hole, not taking i_mutex
> 2/2 shmem: fix splicing from a hole while it's punched
> on to Linus whenever suits you.
> 
> (You have some other patches in the mainline-later section of the
> mmotm/series file: they're okay too, but not in doubt as these two were.)

I think we may need to hold off on sending them...

It seems that this code in shmem_fault():

	/*
	 * shmem_falloc_waitq points into the shmem_fallocate()
	 * stack of the hole-punching task: shmem_falloc_waitq
	 * is usually invalid by the time we reach here, but
	 * finish_wait() does not dereference it in that case;
	 * though i_lock needed lest racing with wake_up_all().
	 */
	spin_lock(&inode->i_lock);
	finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
	spin_unlock(&inode->i_lock);

Is problematic. I'm not sure what changed, but it seems to be causing everything
from NULL ptr derefs:

[  169.922536] BUG: unable to handle kernel NULL pointer dereference at 0000000000000631
[  169.925638] IP: __lock_acquire (./arch/x86/include/asm/atomic.h:92 kernel/locking/lockdep.c:3082)
[  169.927845] PGD 1d38af067 PUD 1d38b0067 PMD 0
[  169.929644] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  169.930082] Dumping ftrace buffer:
[  169.930082]    (ftrace buffer empty)
[  169.930082] Modules linked in:
[  169.930082] CPU: 14 PID: 8824 Comm: trinity-c53 Tainted: G        W      3.16.0-rc5-next-20140721-sasha-00051-g258dfea-dirty #925
[  169.930082] task: ffff8801d3893000 ti: ffff8801d38f8000 task.ti: ffff8801d38f8000
[  169.930082] RIP: __lock_acquire (./arch/x86/include/asm/atomic.h:92 kernel/locking/lockdep.c:3082)
[  169.930082] RSP: 0000:ffff8801d38fb6c0  EFLAGS: 00010006
[  169.930082] RAX: 0000000000000000 RBX: ffff8801d3893000 RCX: 0000000000000001
[  169.930082] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801b2b13d98
[  169.930082] RBP: ffff8801d38fb728 R08: 0000000000000001 R09: 0000000000000001
[  169.930082] R10: 0000000000000499 R11: 0000000000000001 R12: 0000000000000000
[  169.930082] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8801b2b13d98
[  169.930082] FS:  00007f9e6374a700(0000) GS:ffff880548e00000(0000) knlGS:0000000000000000
[  169.930082] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  169.930082] CR2: 0000000000000631 CR3: 00000001d38ae000 CR4: 00000000000006a0
[  169.930082] Stack:
[  169.930082]  ffff8801d3893000 ffff8801d3893000 ffffffffa6053bf0 0000000000000290
[  169.930082]  0000000000000000 ffff8801d38fb760 ffffffff9f1d0be2 ffffffff9f1cdbdb
[  169.930082]  ffff8801b2b13d80 0000000000000000 0000000000000000 0000000000000001
[  169.930082] Call Trace:
[  169.930082] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[  169.930082] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[  169.930082] lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  169.930082] ? finish_wait (include/linux/list.h:144 kernel/sched/wait.c:251)
[  169.930082] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[  169.930082] ? finish_wait (include/linux/list.h:144 kernel/sched/wait.c:251)
[  169.930082] finish_wait (include/linux/list.h:144 kernel/sched/wait.c:251)
[  169.930082] shmem_fault (include/linux/spinlock.h:343 mm/shmem.c:1327)
[  169.930082] ? __wait_on_bit_lock (kernel/sched/wait.c:291)
[  169.930082] __do_fault (mm/memory.c:2713)
[  169.930082] do_read_fault.isra.40 (mm/memory.c:2905)
[  169.930082] handle_mm_fault (mm/memory.c:3092 mm/memory.c:3225 mm/memory.c:3345 mm/memory.c:3374)
[  169.930082] ? __lock_is_held (kernel/locking/lockdep.c:3516)
[  170.003723] __do_page_fault (arch/x86/mm/fault.c:1231)
[  170.003723] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[  170.003723] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  170.003723] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2639 (discriminator 8))
[  170.003723] trace_do_page_fault (arch/x86/mm/fault.c:1314 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1315)
[  170.003723] do_async_page_fault (arch/x86/kernel/kvm.c:279)
[  170.003723] async_page_fault (arch/x86/kernel/entry_64.S:1321)
[  170.003723] ? copy_user_generic_unrolled (arch/x86/lib/copy_user_64.S:137)
[  170.003723] ? copy_page_from_iter_iovec (mm/iov_iter.c:141)
[  170.003723] copy_page_from_iter (mm/iov_iter.c:668)
[  170.003723] process_vm_rw_core.isra.2 (mm/process_vm_access.c:50 mm/process_vm_access.c:114 mm/process_vm_access.c:213)
[  170.003723] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3769)
[  170.003723] ? might_fault (mm/memory.c:3770)
[  170.003723] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3769)
[  170.003723] ? rw_copy_check_uvector (fs/read_write.c:758)
[  170.003723] process_vm_rw (mm/process_vm_access.c:287)
[  170.003723] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[  170.003723] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[  170.003723] ? vtime_account_user (kernel/sched/cputime.c:687)
[  170.003723] ? context_tracking_user_exit (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/context_tracking.c:184 (discriminator 2))
[  170.003723] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  170.003723] ? syscall_trace_enter (include/trace/events/syscalls.h:16 arch/x86/kernel/ptrace.c:1488)
[  170.003723] SyS_process_vm_writev (mm/process_vm_access.c:302)
[  170.003723] tracesys (arch/x86/kernel/entry_64.S:541)
[ 170.003723] Code: 49 81 3f 00 3e 97 a5 b8 00 00 00 00 44 0f 44 c0 41 83 fe 01 0f 87 e5 fe ff ff 44 89 f0 4d 8b 54 c7 08 4d 85 d2 0f 84 d4 fe ff ff <f0> 41 ff 82 98 01 00 00 8b 8b f0 0c 00 00 83 f9 2f 76 0e 8b 05
All code
========
   0:   49 81 3f 00 3e 97 a5    cmpq   $0xffffffffa5973e00,(%r15)
   7:   b8 00 00 00 00          mov    $0x0,%eax
   c:   44 0f 44 c0             cmove  %eax,%r8d
  10:   41 83 fe 01             cmp    $0x1,%r14d
  14:   0f 87 e5 fe ff ff       ja     0xfffffffffffffeff
  1a:   44 89 f0                mov    %r14d,%eax
  1d:   4d 8b 54 c7 08          mov    0x8(%r15,%rax,8),%r10
  22:   4d 85 d2                test   %r10,%r10
  25:   0f 84 d4 fe ff ff       je     0xfffffffffffffeff
  2b:*  f0 41 ff 82 98 01 00    lock incl 0x198(%r10)           <-- trapping instruction
  32:   00
  33:   8b 8b f0 0c 00 00       mov    0xcf0(%rbx),%ecx
  39:   83 f9 2f                cmp    $0x2f,%ecx
  3c:   76 0e                   jbe    0x4c
  3e:   8b                      .byte 0x8b
  3f:   05                      .byte 0x5
        ...

Code starting with the faulting instruction
===========================================
   0:   f0 41 ff 82 98 01 00    lock incl 0x198(%r10)
   7:   00
   8:   8b 8b f0 0c 00 00       mov    0xcf0(%rbx),%ecx
   e:   83 f9 2f                cmp    $0x2f,%ecx
  11:   76 0e                   jbe    0x21
  13:   8b                      .byte 0x8b
  14:   05                      .byte 0x5
        ...
[  170.003723] RIP __lock_acquire (./arch/x86/include/asm/atomic.h:92 kernel/locking/lockdep.c:3082)
[  170.003723]  RSP <ffff8801d38fb6c0>
[  170.003723] CR2: 0000000000000631

To memory corruptions:

[ 1031.264226] BUG: spinlock bad magic on CPU#1, trinity-c99/25740
[ 1031.265632]  lock: 0xffff88038023fd80, .magic: ffff8802, .owner: %<C0><DA>/1711276032, .owner_cpu: 0
[ 1031.267000] CPU: 1 PID: 25740 Comm: trinity-c99 Tainted: G        W      3.16.0-rc5-next-20140721-sasha-00051-g258dfea-dirty #925
[ 1031.270013]  ffff88038023fd80 ffff88010d2a38c0 ffffffffa24c0712 ffffffff9f1a703d
[ 1031.270081]  ffff88010d2a38e0 ffffffff9f1d6d76 ffff88038023fd80 ffffffffa396a896
[ 1031.270081]  ffff88010d2a3900 ffffffff9f1d6df6 ffff88038023fd80 ffff88038023fd80
[ 1031.270081] Call Trace:
[ 1031.270081] dump_stack (lib/dump_stack.c:52)
[ 1031.270081] ? sched_clock_local (kernel/sched/clock.c:214)
[ 1031.270081] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 8))
[ 1031.270081] spin_bug (kernel/locking/spinlock_debug.c:76)
[ 1031.270081] do_raw_spin_unlock (./arch/x86/include/asm/spinlock.h:165 kernel/locking/spinlock_debug.c:98 kernel/locking/spinlock_debug.c:158)
[ 1031.270081] _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 1031.270081] finish_wait (kernel/sched/wait.c:254)
[ 1031.270081] shmem_fault (include/linux/spinlock.h:343 mm/shmem.c:1327)
[ 1031.270081] ? __wait_on_bit_lock (kernel/sched/wait.c:291)
[ 1031.270081] __do_fault (mm/memory.c:2713)
[ 1031.270081] do_shared_fault (mm/memory.c:2985 (discriminator 8))
[ 1031.270081] handle_mm_fault (mm/memory.c:3097 mm/memory.c:3225 mm/memory.c:3345 mm/memory.c:3374)
[ 1031.270081] __do_page_fault (arch/x86/mm/fault.c:1231)
[ 1031.270081] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 1031.270081] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[ 1031.270081] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1031.270081] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2639 (discriminator 8))
[ 1031.270081] trace_do_page_fault (arch/x86/mm/fault.c:1314 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1315)
[ 1031.270081] do_async_page_fault (arch/x86/kernel/kvm.c:279)
[ 1031.270081] async_page_fault (arch/x86/kernel/entry_64.S:1321)
[ 1031.270081] ? copy_page_to_iter_iovec (include/linux/pagemap.h:562 mm/iov_iter.c:27)
[ 1031.270081] ? vmsplice_to_user (fs/splice.c:1533)
[ 1031.270081] copy_page_to_iter (mm/iov_iter.c:658)
[ 1031.270081] ? pipe_lock (fs/pipe.c:69)
[ 1031.270081] ? preempt_count_sub (kernel/sched/core.c:2617)
[ 1031.270081] ? vmsplice_to_user (fs/splice.c:1533)
[ 1031.270081] pipe_to_user (fs/splice.c:1535)
[ 1031.270081] __splice_from_pipe (fs/splice.c:770 fs/splice.c:886)
[ 1031.270081] vmsplice_to_user (fs/splice.c:1573)
[ 1031.270081] ? rcu_read_lock_held (kernel/rcu/update.c:168)
[ 1031.270081] SyS_vmsplice (include/linux/file.h:38 fs/splice.c:1657 fs/splice.c:1638)
[ 1031.270081] tracesys (arch/x86/kernel/entry_64.S:541)

And hangs:

[  212.010020] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  212.010020]  Tasks blocked on level-1 rcu_node (CPUs 0-15):
[  212.010020]  8: (136 GPs behind) idle=2b9/140000000000000/0 softirq=4/4 last_accelerate: 0000/dda2, nonlazy_posted: 0, .D
[  212.010020]  9: (136 GPs behind) idle=92e/0/0 softirq=3/3 last_accelerate: 0000/dda2, nonlazy_posted: 0, .D
[  212.010020]  (detected by 1, t=6502 jiffies, g=4645, c=4644, q=0)
[  212.010020] Task dump for CPU 8:
[  212.010020] trinity-c350    R  running task    13000  9101   8424 0x00080006
[  212.010020]  ffff880520f47d98 0000000000000296 ffff8805230cfb38 ffffffffb750ba04
[  212.010020]  ffffffffb41bc165 ffff8805230cfb88 ffff8805230cfba0 ffff880520f47d80
[  212.010020]  ffff8805230cfb68 ffffffffb41bc165 ffff880520f47d80 ffff8805230c8800
[  212.010020] Call Trace:
[  212.010020] ? _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[  212.010020] ? finish_wait (include/linux/list.h:144 kernel/sched/wait.c:251)
[  212.010020] ? finish_wait (include/linux/list.h:144 kernel/sched/wait.c:251)
[  212.010020] ? shmem_fault (include/linux/spinlock.h:343 mm/shmem.c:1327)
[  212.010020] ? __wait_on_bit_lock (kernel/sched/wait.c:291)
[  212.010020] ? __do_fault (mm/memory.c:2713)
[  212.010020] ? do_shared_fault (mm/memory.c:2985 (discriminator 8))
[  212.010020] ? handle_mm_fault (mm/memory.c:3097 mm/memory.c:3225 mm/memory.c:3345 mm/memory.c:3374)
[  212.010020] ? __do_page_fault (arch/x86/mm/fault.c:1231)
[  212.010020] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[  212.010020] ? __tick_nohz_task_switch (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/time/tick-sched.c:278 (discriminator 2))
[  212.010020] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  212.010020] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[  212.010020] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  212.010020] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2639 (discriminator 8))
[  212.010020] ? trace_do_page_fault (arch/x86/mm/fault.c:1314 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1315)
[  212.010020] ? do_async_page_fault (arch/x86/kernel/kvm.c:279)
[  212.010020] ? async_page_fault (arch/x86/kernel/entry_64.S:1321)
[  212.010020] ? copy_user_generic_unrolled (arch/x86/lib/copy_user_64.S:167)
[  212.010020] ? SyS_getcwd (./arch/x86/include/asm/uaccess.h:731 fs/dcache.c:3200 fs/dcache.c:3164)
[  212.010020] ? tracesys (arch/x86/kernel/entry_64.S:541)
[  212.010020] ? tracesys (arch/x86/kernel/entry_64.S:541)


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/