lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 18 Nov 2015 16:34:30 +0100
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Joonsoo Kim <js1304@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Michal Nazarewicz <mina86@...a86.com>,
	Minchan Kim <minchan@...nel.org>, Mel Gorman <mgorman@...e.de>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	linux-api@...r.kernel.org, Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page
 reference manipulation

On 11/09/2015 08:23 AM, Joonsoo Kim wrote:
> CMA allocation should be guaranteed to succeed by definition, but,
> unfortunately, it would be failed sometimes. It is hard to track down
> the problem, because it is related to page reference manipulation and
> we don't have any facility to analyze it.

Reminds me of the PeterZ's VM_PINNED patchset. What happened to it?
https://lwn.net/Articles/600502/

> This patch adds tracepoints to track down page reference manipulation.
> With it, we can find exact reason of failure and can fix the problem.
> Following is an example of tracepoint output.
> 
> <...>-9018  [004]    92.678375: page_ref_set:         pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
> <...>-9018  [004]    92.678378: kernel_stack:
>  => get_page_from_freelist (ffffffff81176659)
>  => __alloc_pages_nodemask (ffffffff81176d22)
>  => alloc_pages_vma (ffffffff811bf675)
>  => handle_mm_fault (ffffffff8119e693)
>  => __do_page_fault (ffffffff810631ea)
>  => trace_do_page_fault (ffffffff81063543)
>  => do_async_page_fault (ffffffff8105c40a)
>  => async_page_fault (ffffffff817581d8)
> [snip]
> <...>-9018  [004]    92.678379: page_ref_mod:         pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
> [snip]
> ...
> ...
> <...>-9131  [001]    93.174468: test_pages_isolated:  start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
> [snip]
> <...>-9018  [004]    93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
>  => release_pages (ffffffff8117c9e4)
>  => free_pages_and_swap_cache (ffffffff811b0697)
>  => tlb_flush_mmu_free (ffffffff81199616)
>  => tlb_finish_mmu (ffffffff8119a62c)
>  => exit_mmap (ffffffff811a53f7)
>  => mmput (ffffffff81073f47)
>  => do_exit (ffffffff810794e9)
>  => do_group_exit (ffffffff81079def)
>  => SyS_exit_group (ffffffff81079e74)
>  => entry_SYSCALL_64_fastpath (ffffffff817560b6)
> 
> This output shows that problem comes from exit path. In exit path,
> to improve performance, pages are not freed immediately. They are gathered
> and processed by batch. During this process, migration cannot be possible
> and CMA allocation is failed. This problem is hard to find without this
> page reference tracepoint facility.

Yeah but when you realized it was this problem, what was the fix? Probably not
remove batching from exit path? Shouldn't CMA in this case just try waiting for
the pins to go away, which would eventually happen? And for long-term pins,
VM_PINNED would make sure the pages are migrated away from CMA pageblocks first?

So I'm worried that this is quite nontrivial change for a very specific usecase.

> Enabling this feature bloat kernel text 20 KB in my configuration.

It's not just that, see below.

[...]


>  static inline int page_ref_freeze(struct page *page, int count)
>  {
> -	return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
> +	int ret = likely(atomic_cmpxchg(&page->_count, count, 0) == count);

The "likely" mean makes no sense anymore, doe it?

> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 957d3da..71d2399 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -28,3 +28,7 @@ config DEBUG_PAGEALLOC
>  
>  config PAGE_POISONING
>  	bool
> +
> +config DEBUG_PAGE_REF
> +	bool "Enable tracepoint to track down page reference manipulation"

So you should probably state the costs. Which is the extra memory, and also that
all the page ref manipulations are now turned to function calls, even if the
tracepoints are disabled. Patch 1 didn't change that many callsites, so maybe it
would be feasible to have the tracepoints inline, where being disabled has
near-zero overhead?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ