lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 9 May 2013 17:13:27 -0500
From:	Seth Jennings <sjenning@...ux.vnet.ibm.com>
To:	Dave Hansen <dave@...1.net>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, mgorman@...e.de,
	tim.c.chen@...ux.intel.com
Subject: Re: [RFC][PATCH 5/7] create __remove_mapping_batch()

On Tue, May 07, 2013 at 02:20:01PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@...ux.intel.com>
> 
> __remove_mapping_batch() does logically the same thing as
> __remove_mapping().
> 
> We batch like this so that several pages can be freed with a
> single mapping->tree_lock acquisition/release pair.  This reduces
> the number of atomic operations and ensures that we do not bounce
> cachelines around.
> 
> It has shown some substantial performance benefits on
> microbenchmarks.
> 
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> ---
> 
>  linux.git-davehans/mm/vmscan.c |   50 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff -puN mm/vmscan.c~create-remove_mapping_batch mm/vmscan.c
> --- linux.git/mm/vmscan.c~create-remove_mapping_batch	2013-05-07 14:00:01.432361260 -0700
> +++ linux.git-davehans/mm/vmscan.c	2013-05-07 14:19:32.341148892 -0700
> @@ -555,6 +555,56 @@ int remove_mapping(struct address_space
>  	return 0;
>  }
> 
> +/*
> + * pages come in here (via remove_list) locked and leave unlocked
> + * (on either ret_pages or free_pages)
> + *
> + * We do this batching so that we free batches of pages with a
> + * single mapping->tree_lock acquisition/release.  This optimization
> + * only makes sense when the pages on remove_list all share a
> + * page->mapping.  If this is violated you will BUG_ON().
> + */
> +static int __remove_mapping_batch(struct list_head *remove_list,
> +				  struct list_head *ret_pages,
> +				  struct list_head *free_pages)
> +{
> +	int nr_reclaimed = 0;
> +	struct address_space *mapping;
> +	struct page *page;
> +	LIST_HEAD(need_free_mapping);
> +
> +	if (list_empty(remove_list))
> +		return 0;
> +
> +	mapping = lru_to_page(remove_list)->mapping;

This doesn't work for pages in the swap cache as mapping is overloaded to
hold... something else that I can't remember of the top of my head.  Anyway,
this happens:

[   70.027984] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[   70.028010] IP: [<ffffffff81077de8>] __lock_acquire.isra.24+0x188/0xd10
[   70.028010] PGD 1ab99067 PUD 671e5067 PMD 0 
[   70.028010] Oops: 0000 [#1] PREEMPT SMP 
[   70.028010] Modules linked in:
[   70.028010] CPU: 1 PID: 11494 Comm: cc1 Not tainted 3.9.0+ #6
[   70.028010] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   70.028010] task: ffff88007c708f70 ti: ffff88001aa28000 task.ti: ffff88001aa28000
[   70.028010] RIP: 0010:[<ffffffff81077de8>]  [<ffffffff81077de8>] __lock_acquire.isra.24+0x188/0xd10
[   70.028010] RSP: 0000:ffff88001aa29658  EFLAGS: 00010097
[   70.028010] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   70.028010] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   70.028010] RBP: ffff88001aa296c8 R08: 0000000000000001 R09: 0000000000000000
[   70.028010] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c708f70
[   70.028010] R13: 0000000000000001 R14: 0000000000000030 R15: ffff88001aa29758
[   70.028010] FS:  00007fe676f32700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[   70.028010] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.028010] CR2: 0000000000000038 CR3: 000000001b7ba000 CR4: 00000000000006a0
[   70.028010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   70.028010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   70.028010] Stack:
[   70.028010]  0000000000000001 ffffffff8165d910 ffff88001aa296e8 ffffffff8107e925
[   70.028010]  ffff88001aa296a8 0000000000000000 0000000000000002 0000000000000046
[   70.028010]  dead000000200200 ffff88007c708f70 0000000000000046 0000000000000000
[   70.028010] Call Trace:
[   70.028010]  [<ffffffff8107e925>] ? smp_call_function_single+0xd5/0x180
[   70.028010]  [<ffffffff81078ea2>] lock_acquire+0x52/0x70
[   70.028010]  [<ffffffff810b7468>] ? __remove_mapping_batch+0x48/0x140
[   70.028010]  [<ffffffff813b3cf7>] _raw_spin_lock_irq+0x37/0x50
[   70.028010]  [<ffffffff810b7468>] ? __remove_mapping_batch+0x48/0x140
[   70.028010]  [<ffffffff810b7468>] __remove_mapping_batch+0x48/0x140
[   70.028010]  [<ffffffff810b88b0>] shrink_page_list+0x680/0x9f0
[   70.028010]  [<ffffffff810b910f>] shrink_inactive_list+0x13f/0x380
[   70.028010]  [<ffffffff810b9590>] shrink_lruvec+0x240/0x4e0
[   70.028010]  [<ffffffff810b9896>] shrink_zone+0x66/0x1a0
[   70.028010]  [<ffffffff810ba43b>] do_try_to_free_pages+0xeb/0x570
[   70.028010]  [<ffffffff810eba29>] ? lookup_page_cgroup_used+0x9/0x20
[   70.028010]  [<ffffffff810ba9ef>] try_to_free_pages+0x9f/0xc0
[   70.028010]  [<ffffffff810b1357>] __alloc_pages_nodemask+0x5a7/0x970
[   70.028010]  [<ffffffff810cb4fe>] handle_pte_fault+0x65e/0x880
[   70.028010]  [<ffffffff810cca19>] handle_mm_fault+0x139/0x1e0
[   70.028010]  [<ffffffff81027920>] __do_page_fault+0x160/0x460
[   70.028010]  [<ffffffff811121f1>] ? mntput+0x21/0x30
[   70.028010]  [<ffffffff810f56c1>] ? __fput+0x191/0x250
[   70.028010]  [<ffffffff81212bc9>] ? lockdep_sys_exit_thunk+0x35/0x67
[   70.028010]  [<ffffffff81027c49>] do_page_fault+0x9/0x10
[   70.028010]  [<ffffffff813b4cb2>] page_fault+0x22/0x30
[   70.028010] Code: 48 c7 c1 65 1b 50 81 48 c7 c2 3d 07 50 81 31 c0 be fb 0b 00 00 48 c7 c7 a8 58 50 81 e8 62 89 fb ff e9 d7 01 00 00 0f 1f 44 00 00 <4d> 8b 6c c6 08 4d 85 ed 0f 84 cc fe ff ff f0 41 ff 85 98 01 00 
[   70.028010] RIP  [<ffffffff81077de8>] __lock_acquire.isra.24+0x188/0xd10
[   70.028010]  RSP <ffff88001aa29658>
[   70.028010] CR2: 0000000000000038
[   70.028010] ---[ end trace 94be6276375f6199 ]---

The solution is to use page_mapping() which has logic to handle swap cache page
mapping.

I got by it with:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 43b4da8..897eb5f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -576,13 +576,15 @@ static int __remove_mapping_batch(struct list_head *remove_list,
        if (list_empty(remove_list))
                return 0;
 
-       mapping = lru_to_page(remove_list)->mapping;
+       page = lru_to_page(remove_list);
+       mapping = page_mapping(page);
+       BUG_ON(mapping == NULL);
        spin_lock_irq(&mapping->tree_lock);
        while (!list_empty(remove_list)) {
                int freed;
                page = lru_to_page(remove_list);
                BUG_ON(!PageLocked(page));
-               BUG_ON(page->mapping != mapping);
+               BUG_ON(page_mapping(page) != mapping);
                list_del(&page->lru);
 
                freed = __remove_mapping_nolock(mapping, page);

Seth

> +	spin_lock_irq(&mapping->tree_lock);
> +	while (!list_empty(remove_list)) {
> +		int freed;
> +		page = lru_to_page(remove_list);
> +		BUG_ON(!PageLocked(page));
> +		BUG_ON(page->mapping != mapping);
> +		list_del(&page->lru);
> +
> +		freed = __remove_mapping_nolock(mapping, page);
> +		if (freed) {
> +			list_add(&page->lru, &need_free_mapping);
> +		} else {
> +			unlock_page(page);
> +			list_add(&page->lru, ret_pages);
> +		}
> +	}
> +	spin_unlock_irq(&mapping->tree_lock);
> +
> +	while (!list_empty(&need_free_mapping)) {
> +		page = lru_to_page(&need_free_mapping);
> +		list_move(&page->list, free_pages);
> +		free_mapping_page(mapping, page);
> +		unlock_page(page);
> +		nr_reclaimed++;
> +	}
> +	return nr_reclaimed;
> +}
> +
>  /**
>   * putback_lru_page - put previously isolated page onto appropriate LRU list
>   * @page: page to be put back to appropriate lru list
> _
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ