Message-ID: <53AC383F.3010007@oracle.com>
Date:	Thu, 26 Jun 2014 11:11:59 -0400
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Hugh Dickins <hughd@...gle.com>, Vlastimil Babka <vbabka@...e.cz>
CC:	Konstantin Khlebnikov <koct9i@...il.com>,
	Dave Jones <davej@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: mm: shm: hang in shmem_fallocate

On 06/25/2014 06:36 PM, Hugh Dickins wrote:
> Sasha, may I trespass on your time, and ask you to revert the previous
> patch from your tree, and give this patch below a try?  I am very
> interested to learn if in fact it fixes it for you (as it did for me).

Hi Hugh,

Happy to help, and as I often do I will answer with a question.

I've observed two different issues after reverting the original fix and
applying this new patch. Both of them seem semi-related, but I'm not sure.

First, this:

[  681.267487] BUG: unable to handle kernel paging request at ffffea0003480048
[  681.268621] IP: zap_pte_range (mm/memory.c:1132)
[  681.269335] PGD 37fcc067 PUD 37fcb067 PMD 0
[  681.269972] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  681.270952] Dumping ftrace buffer:
[  681.270952]    (ftrace buffer empty)
[  681.270952] Modules linked in:
[  681.270952] CPU: 7 PID: 1952 Comm: trinity-c29 Not tainted 3.16.0-rc2-next-20140625-sasha-00025-g2e02e05-dirty #730
[  681.270952] task: ffff8803e6f58000 ti: ffff8803df050000 task.ti: ffff8803df050000
[  681.270952] RIP: zap_pte_range (mm/memory.c:1132)
[  681.270952] RSP: 0018:ffff8803df053c58  EFLAGS: 00010246
[  681.270952] RAX: ffffea0003480040 RBX: ffff8803edae7a70 RCX: 0000000003480040
[  681.270952] RDX: 00000000d2001730 RSI: 0000000000000000 RDI: 00000000d2001730
[  681.270952] RBP: ffff8803df053cf8 R08: ffff88000015cc00 R09: 0000000000000000
[  681.270952] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea0003480040
[  681.270952] R13: ffff8803df053de8 R14: 00007fc15014f000 R15: 00007fc15014e000
[  681.270952] FS:  00007fc15031b700(0000) GS:ffff8801ece00000(0000) knlGS:0000000000000000
[  681.270952] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  681.270952] CR2: ffffea0003480048 CR3: 000000001a02e000 CR4: 00000000000006a0
[  681.270952] Stack:
[  681.270952]  ffff8803df053de8 00000000d2001000 00000000d2001fff ffff8803e6f58000
[  681.270952]  0000000000000000 0000000000000001 ffff880404dd8400 ffff8803e6e31900
[  681.270952]  00000000d2001730 ffff88000015cc00 0000000000000000 ffff8804078f8000
[  681.270952] Call Trace:
[  681.270952] unmap_single_vma (mm/memory.c:1256 mm/memory.c:1277 mm/memory.c:1301 mm/memory.c:1346)
[  681.270952] unmap_vmas (mm/memory.c:1375 (discriminator 1))
[  681.270952] exit_mmap (mm/mmap.c:2797)
[  681.270952] ? preempt_count_sub (kernel/sched/core.c:2606)
[  681.270952] mmput (kernel/fork.c:638)
[  681.270952] do_exit (kernel/exit.c:744)
[  681.270952] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  681.270952] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[  681.270952] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[  681.270952] do_group_exit (kernel/exit.c:884)
[  681.270952] SyS_exit_group (kernel/exit.c:895)
[  681.270952] tracesys (arch/x86/kernel/entry_64.S:542)
[  681.270952] Code: e8 cf 39 25 03 49 8b 4c 24 10 48 39 c8 74 1c 48 8b 7d b8 48 c1 e1 0c 48 89 da 48 83 c9 40 4c 89 fe e8 e5 db ff ff 0f 1f 44 00 00 <41> f6 44 24 08 01 74 08 83 6d c8 01 eb 33 66 90 f6 45 a0 40 74
All code
========
   0:   e8 cf 39 25 03          callq  0x32539d4
   5:   49 8b 4c 24 10          mov    0x10(%r12),%rcx
   a:   48 39 c8                cmp    %rcx,%rax
   d:   74 1c                   je     0x2b
   f:   48 8b 7d b8             mov    -0x48(%rbp),%rdi
  13:   48 c1 e1 0c             shl    $0xc,%rcx
  17:   48 89 da                mov    %rbx,%rdx
  1a:   48 83 c9 40             or     $0x40,%rcx
  1e:   4c 89 fe                mov    %r15,%rsi
  21:   e8 e5 db ff ff          callq  0xffffffffffffdc0b
  26:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  2b:*  41 f6 44 24 08 01       testb  $0x1,0x8(%r12)           <-- trapping instruction
  31:   74 08                   je     0x3b
  33:   83 6d c8 01             subl   $0x1,-0x38(%rbp)
  37:   eb 33                   jmp    0x6c
  39:   66 90                   xchg   %ax,%ax
  3b:   f6 45 a0 40             testb  $0x40,-0x60(%rbp)
  3f:   74 00                   je     0x41

Code starting with the faulting instruction
===========================================
   0:   41 f6 44 24 08 01       testb  $0x1,0x8(%r12)
   6:   74 08                   je     0x10
   8:   83 6d c8 01             subl   $0x1,-0x38(%rbp)
   c:   eb 33                   jmp    0x41
   e:   66 90                   xchg   %ax,%ax
  10:   f6 45 a0 40             testb  $0x40,-0x60(%rbp)
  14:   74 00                   je     0x16
[  681.270952] RIP zap_pte_range (mm/memory.c:1132)
[  681.270952]  RSP <ffff8803df053c58>
[  681.270952] CR2: ffffea0003480048
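
The register dump above can be cross-checked with a little arithmetic. This is only a sketch: it assumes the usual x86_64 4-level layout with the vmemmap region based at 0xffffea0000000000 and a 64-byte struct page, neither of which is stated in the log, and the reading of RDX as a PTE-like value is likewise a guess. Under those assumptions, the faulting address is R12 + 8 (matching the `testb $0x1,0x8(%r12)` trapping instruction), and the struct page that R12 points at corresponds to the same page frame number as the value in RDX.

```python
# Assumptions (NOT stated in the log): x86_64 4-level paging with the
# vmemmap region at 0xffffea0000000000 and sizeof(struct page) == 64.
VMEMMAP_BASE = 0xFFFFEA0000000000
STRUCT_PAGE_SIZE = 64
PAGE_SHIFT = 12


def page_to_pfn(page_addr):
    """Map a struct page address in vmemmap to its page frame number."""
    return (page_addr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE


# Values copied from the oops above.
r12 = 0xFFFFEA0003480040  # base register of the trapping testb
cr2 = 0xFFFFEA0003480048  # faulting address
rdx = 0x00000000D2001730  # guessed to be a PTE-like value

fault_offset = cr2 - r12           # offset into the struct page being read
pfn_from_page = page_to_pfn(r12)   # frame number implied by the struct page address
pfn_from_rdx = rdx >> PAGE_SHIFT   # frame number implied by RDX

print(hex(fault_offset), hex(pfn_from_page), hex(pfn_from_rdx))
```

If the assumptions hold, both frame numbers agree, which would be consistent with the fault being a read of a struct page in a part of vmemmap that is not mapped (the "PMD 0" in the page table walk above).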

Second, a longer lockup that shows a few shmem_fallocate calls hanging, though they
don't seem to be the main cause of the hang (the log is pretty long, so it is attached).


Thanks,
Sasha

Download attachment "out.txt.gz" of type "application/gzip" (235278 bytes)