lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210308162128.9b4a7d4c1576a72fd4878bdb@linux-foundation.org>
Date:   Mon, 8 Mar 2021 16:21:28 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Minchan Kim <minchan@...nel.org>
Cc:     linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        John Dias <joaodias@...gle.com>,
        Michal Hocko <mhocko@...e.com>,
        David Hildenbrand <david@...hat.com>,
        Jason Baron <jbaron@...mai.com>
Subject: Re: [PATCH v2] mm: page_alloc: dump migrate-failed pages

On Mon,  8 Mar 2021 12:20:47 -0800 Minchan Kim <minchan@...nel.org> wrote:

> alloc_contig_range is usually used on cma area or movable zone.
> It's critical if the page migration fails on those areas so
> dump more debugging message.
> 
> page refcount, mapcount with page flags on dump_page are
> helpful information to deduce the culprit. Furthermore,
> dump_page_owner was super helpful to find long term pinner
> who initiated the page allocation.
> 
> Admin could enable the dump like this(by default, disabled)
> 
> 	echo "func dump_migrate_failure_pages +p" > control
> 
> Admin could disable it.
> 
> 	echo "func dump_migrate_failure_pages =_" > control
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8453,6 +8453,34 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
>  				pageblock_nr_pages));
>  }
>  
> +#if defined(CONFIG_DYNAMIC_DEBUG) || \
> +	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
> +static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
> +		DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
> +int alloc_contig_ratelimit(void)
> +{
> +	return __ratelimit(&alloc_contig_ratelimit_state);
> +}

Wow, that's an eyesore.  We're missing helpers in the ratelimit code. 
Can we do something like

/* description goes here */
#define RATELIMIT2(interval, burst)
({
	static DEFINE_RATELIMIT_STATE(_rs, interval, burst);

	__ratelimit(_rs);
})

#define RATELIMIT()
	RATELIMIT2(DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST)

> +void dump_migrate_failure_pages(struct list_head *page_list)
> +{
> +	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
> +			"migrate failure");
> +	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
> +			alloc_contig_ratelimit()) {
> +		struct page *page;
> +
> +		WARN(1, "failed callstack");
> +		list_for_each_entry(page, page_list, lru)
> +			dump_page(page, "migration failure");
> +	}
> +}

Then we can simply do

	if (DYNAMIC_DEBUG_BRANCH(descriptor) && RATELIMIT())


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ