Message-ID: <20151020013834.GA2941@bbox>
Date:	Tue, 20 Oct 2015 10:38:54 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Hugh Dickins <hughd@...gle.com>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	Michal Hocko <mhocko@...e.cz>,
	Johannes Weiner <hannes@...xchg.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH 0/5] MADV_FREE refactoring and fix KSM page

On Mon, Oct 19, 2015 at 07:01:50PM +0900, Minchan Kim wrote:
> On Mon, Oct 19, 2015 at 03:31:42PM +0900, Minchan Kim wrote:
> > Hello, it has been a long time since I sent the previous patch.
> > https://lkml.org/lkml/2015/6/3/37
> > 
> > This patchset is almost entirely new compared to the previous approach.
> > I think it is simpler, clearer and easier to review.
> > 
> > One thing I should note: I had tested this patchset and
> > couldn't find any critical problem, so I rebased it onto
> > recent mmotm (i.e., mmotm-2015-10-15-15-20) to send a formal
> > patchset. Unfortunately, I started to see sudden discarding of
> > pages that we should not discard. IOW, an application's valid
> > anonymous page suddenly disappeared.
> > 
> > Looking through the THP changes, I think we could lose the
> > dirty bit of the pte between freeze_page and unfreeze_page,
> > when we mark it as a migration entry and restore it.
> > So I added the simple code below without much consideration,
> > and I cannot see the problem any more.
> > I hope it is a good hint for finding the right fix for this problem.
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d5ea516ffb54..e881c04f5950 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3138,6 +3138,9 @@ static void unfreeze_page_vma(struct vm_area_struct *vma, struct page *page,
> >  		if (is_write_migration_entry(swp_entry))
> >  			entry = maybe_mkwrite(entry, vma);
> >  
> > +		if (PageDirty(page))
> > +			SetPageDirty(page);
> 
> The PageDirty condition was a typo; I did not add the condition.
> I just added:
> 
>                 SetPageDirty(page);

As a first step toward finding this bug, I removed all MADV_FREE-related
code from mmotm-2015-10-15-15-20. IOW, I did git checkout 54bad5da4834
(arm64: add pmd_[dirty|mkclean] for THP), so the tree doesn't have
any of the core MADV_FREE code.

I tested the following workload on my KVM machine.

0. create a memcg
1. limit the memcg
2. fork several processes
3. each process allocates THP pages and fills them
4. increase the memcg limit so that swapoff can succeed
5. swapoff
6. kill all of the processes
7. goto 1
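
Step 3 of the loop above could be sketched in userspace C roughly as
follows. This is only a hypothetical illustration: the message does not
include the actual test program, and the names alloc_and_fill,
region_intact and HPAGE_SIZE, the 2 MB huge page size, and the use of
madvise(MADV_HUGEPAGE) are all assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed THP size (2 MB, as on x86-64); not taken from the message. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Map len bytes of anonymous memory, hint that THP backing is wanted,
 * and write every byte so the pages are really allocated and dirty.
 * Returns the mapping, or NULL if mmap() fails.
 */
static void *alloc_and_fill(size_t len, int pattern)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
#ifdef MADV_HUGEPAGE
	/* Only a hint: the kernel may still back the range with small pages. */
	(void)madvise(p, len, MADV_HUGEPAGE);
#endif
	memset(p, pattern, len);
	return p;
}

/* Return 1 if every byte still holds pattern, 0 if any content was lost. */
static int region_intact(const void *p, size_t len, int pattern)
{
	const unsigned char *c = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (c[i] != (unsigned char)pattern)
			return 0;
	return 1;
}
```

Each forked child would call alloc_and_fill() with a length that is a
multiple of HPAGE_SIZE and could later call region_intact(); a return
value of 0 would correspond to the "valid anonymous page disappeared"
symptom described in the quoted text. The mapping is not forced to be
2 MB aligned here, so only part of it may end up backed by huge pages.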

Within a few hours, I hit the following bug.
A detailed boot log and dmesg output are attached.


Initializing cgroup subsys cpu
Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic console=ttyS0,115200 console=tty0 earlyprintk=ttyS0 ignore_loglevel ftrace_dump_on_oops vga=normal root=/dev/vda1 rw
KERNEL supported cpus:
  Intel GenuineIntel
x86/fpu: Legacy x87 FPU detected.
x86/fpu: Using 'lazy' FPU context switches.
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x00000000bfffbfff] usable
BIOS-e820: [mem 0x00000000bfffc000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

<snip>

Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
Adding 4191228k swap on /dev/vda5.  Priority:-1 extents:1 across:4191228k FS
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
PGD 0 
Oops: 0000 [#1] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 26445 Comm: sh Not tainted 4.3.0-rc5-mm1-diet-meta+ #1545
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9af3480 ti: ffff88007fea0000 task.ti: ffff88007fea0000
RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
RSP: 0018:ffff88007fea3648  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffea0002324900 RCX: ffff88007fea37e8
RDX: 0000000000000000 RSI: ffff88007fea36e8 RDI: 0000000000000008
RBP: ffff88007fea3648 R08: ffffffff818446a0 R09: ffff8800b9af4c80
R10: 0000000000000216 R11: 0000000000000001 R12: ffff88007f58d6e1
R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
FS:  00007f0993e78740(0000) GS:ffff8800bfa20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000007edee000 CR4: 00000000000006a0
Stack:
 ffff88007fea3678 ffffffff81124ff0 ffffea0002324900 ffff88007fea36e8
 ffff88009ffe8400 0000000000000000 ffff88007fea36c0 ffffffff81125733
 ffff8800bfa34540 ffffffff8105dc9d ffffea0002324900 ffff88007fea37e8
Call Trace:
 [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
 [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
 [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
 [<ffffffff81125b13>] page_referenced+0x1a3/0x220
 [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
 [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
 [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
 [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
 [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
 [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
 [<ffffffff811025f0>] shrink_zone+0x90/0x250
 [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
 [<ffffffff811496c3>] try_charge+0x163/0x700
 [<ffffffff81149cb4>] mem_cgroup_do_precharge+0x54/0x70
 [<ffffffff81149e45>] mem_cgroup_can_attach+0x175/0x1b0
 [<ffffffff811b2c57>] ? kernfs_iattrs.isra.6+0x37/0xd0
 [<ffffffff81148e70>] ? get_mctgt_type+0x320/0x320
 [<ffffffff810a9d29>] cgroup_migrate+0x149/0x440
 [<ffffffff810aa60c>] cgroup_attach_task+0x7c/0xe0
 [<ffffffff810aa904>] __cgroup_procs_write.isra.33+0x1d4/0x2b0
 [<ffffffff810aaa10>] cgroup_tasks_write+0x10/0x20
 [<ffffffff810a6238>] cgroup_file_write+0x38/0xf0
 [<ffffffff811b54ad>] kernfs_fop_write+0x11d/0x170
 [<ffffffff81153918>] __vfs_write+0x28/0xe0
 [<ffffffff8116e614>] ? __fd_install+0x24/0xc0
 [<ffffffff810784a1>] ? percpu_down_read+0x21/0x50
 [<ffffffff81153e91>] vfs_write+0xa1/0x170
 [<ffffffff81154716>] SyS_write+0x46/0xa0
 [<ffffffff81420a17>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
 RSP <ffff88007fea3648>
CR2: 0000000000000008
BUG: unable to handle kernel ---[ end trace e81a82c8122b447d ]---
Kernel panic - not syncing: Fatal exception

NULL pointer dereference at 0000000000000008
IP: [<ffffffff810782a9>] down_read_trylock+0x9/0x30
PGD 0 
Oops: 0000 [#2] SMP 
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 10 PID: 59 Comm: khugepaged Tainted: G      D         4.3.0-rc5-mm1-diet-meta+ #1545
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800b9851a40 ti: ffff8800b985c000 task.ti: ffff8800b985c000
RIP: 0010:[<ffffffff810782a9>]  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
RSP: 0018:ffff8800b985f778  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffea0002321800 RCX: ffff8800b985f918
RDX: 0000000000000000 RSI: ffff8800b985f818 RDI: 0000000000000008
RBP: ffff8800b985f778 R08: ffffffff818446a0 R09: ffff8800b9853240
R10: 000000000000ba03 R11: 0000000000000001 R12: ffff88007f58d6e1
R13: ffff88007f58d6e0 R14: 0000000000000008 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8800bfb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000001808000 CR4: 00000000000006a0
Stack:
 ffff8800b985f7a8 ffffffff81124ff0 ffffea0002321800 ffff8800b985f818
 ffff88009ffe8400 0000000000000000 ffff8800b985f7f0 ffffffff81125733
 ffff8800bfb54540 ffffffff8105dc9d ffffea0002321800 ffff8800b985f918
Call Trace:
 [<ffffffff81124ff0>] page_lock_anon_vma_read+0x60/0x180
 [<ffffffff81125733>] rmap_walk+0x1b3/0x3f0
 [<ffffffff8105dc9d>] ? finish_task_switch+0x5d/0x1f0
 [<ffffffff81125b13>] page_referenced+0x1a3/0x220
 [<ffffffff81123e30>] ? __page_check_address+0x1a0/0x1a0
 [<ffffffff81124f90>] ? page_get_anon_vma+0xd0/0xd0
 [<ffffffff81123820>] ? anon_vma_ctor+0x40/0x40
 [<ffffffff8110087b>] shrink_page_list+0x5ab/0xde0
 [<ffffffff8110174c>] shrink_inactive_list+0x18c/0x4b0
 [<ffffffff811023bd>] shrink_lruvec+0x59d/0x740
 [<ffffffff811025f0>] shrink_zone+0x90/0x250
 [<ffffffff811028dd>] do_try_to_free_pages+0x12d/0x3b0
 [<ffffffff81102d3d>] try_to_free_mem_cgroup_pages+0x9d/0x120
 [<ffffffff811496c3>] try_charge+0x163/0x700
 [<ffffffff8141d1f3>] ? schedule+0x33/0x80
 [<ffffffff8114d45f>] mem_cgroup_try_charge+0x9f/0x1d0
 [<ffffffff811434bc>] khugepaged+0x7cc/0x1ac0
 [<ffffffff81066e01>] ? hrtick_update+0x1/0x70
 [<ffffffff81072430>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81142cf0>] ? total_mapcount+0x70/0x70
 [<ffffffff81056cd9>] kthread+0xc9/0xe0
 [<ffffffff81056c10>] ? kthread_park+0x60/0x60
 [<ffffffff81420d6f>] ret_from_fork+0x3f/0x70
 [<ffffffff81056c10>] ? kthread_park+0x60/0x60
Code: 5e 82 3a 00 48 83 c4 08 5b 5d c3 48 89 45 f0 e8 9b 6a 3a 00 48 8b 45 f0 eb df 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7 
RIP  [<ffffffff810782a9>] down_read_trylock+0x9/0x30
 RSP <ffff8800b985f778>
CR2: 0000000000000008
---[ end trace e81a82c8122b447e ]---
Shutting down cpus with NMI
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
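
One possible reading of the faulting address, which the message itself
does not state: both oopses fault at 0x8 inside down_read_trylock() with
RDI = 0x8, called from page_lock_anon_vma_read(). That would be
consistent with the anon_vma's root pointer having been read as NULL and
the rwsem member sitting right after root. The sketch below models only
that address arithmetic; struct anon_vma_model and struct rwsem_model are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rw_semaphore; only its position matters here. */
struct rwsem_model {
	long count;
};

/* Hypothetical model of the first two members of struct anon_vma. */
struct anon_vma_model {
	struct anon_vma_model *root;
	struct rwsem_model rwsem;
};

/* The model assumes root is the very first member. */
static_assert(offsetof(struct anon_vma_model, root) == 0,
	      "root is expected to be the first member");

/* Address that &root->rwsem evaluates to for a given root pointer value. */
static uintptr_t rwsem_addr(uintptr_t root)
{
	return root + offsetof(struct anon_vma_model, rwsem);
}
```

With a NULL root, rwsem_addr(0) equals sizeof(void *), which is 8 on
x86-64 and matches CR2 in both traces. This explains only the address,
not why root would be NULL in the first place.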


View attachment "test_bug.log" of type "text/plain" (46078 bytes)
