[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200110092256.GN3466@techsingularity.net>
Date: Fri, 10 Jan 2020 09:22:56 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org
Subject: Re: [PATCH] mm: avoid blocking lock_page() in kcompactd
On Thu, Jan 09, 2020 at 02:56:46PM -0800, Cong Wang wrote:
> We observed kcompactd hung at __lock_page():
>
> INFO: task kcompactd0:57 blocked for more than 120 seconds.
> Not tainted 4.19.56.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kcompactd0 D 0 57 2 0x80000000
> Call Trace:
> ? __schedule+0x236/0x860
> schedule+0x28/0x80
> io_schedule+0x12/0x40
> __lock_page+0xf9/0x120
> ? page_cache_tree_insert+0xb0/0xb0
> ? update_pageblock_skip+0xb0/0xb0
> migrate_pages+0x88c/0xb90
> ? isolate_freepages_block+0x3b0/0x3b0
> compact_zone+0x5f1/0x870
> kcompactd_do_work+0x130/0x2c0
> ? __switch_to_asm+0x35/0x70
> ? __switch_to_asm+0x41/0x70
> ? kcompactd_do_work+0x2c0/0x2c0
> ? kcompactd+0x73/0x180
> kcompactd+0x73/0x180
> ? finish_wait+0x80/0x80
> kthread+0x113/0x130
> ? kthread_create_worker_on_cpu+0x50/0x50
> ret_from_fork+0x35/0x40
>
> which faddr2line maps to:
>
> migrate_pages+0x88c/0xb90:
> lock_page at include/linux/pagemap.h:483
> (inlined by) __unmap_and_move at mm/migrate.c:1024
> (inlined by) unmap_and_move at mm/migrate.c:1189
> (inlined by) migrate_pages at mm/migrate.c:1419
>
> Sometimes kcompactd eventually got out of this situation, sometimes not.
>
> I think for memory compaction, it is a best effort to migrate the pages,
> so it doesn't have to wait for I/O to complete. It is fine to call
> trylock_page() here, which is pretty much similar to
> buffer_migrate_lock_buffers().
>
> Given MIGRATE_SYNC_LIGHT is used on compaction path, just relax the
> check for it.
>
Is this a single page being locked for a long time or multiple pages
being locked without reaching a reschedule point?
If it's a single page being locked, it's important to identify what held
page lock for 2 minutes because that is potentially a missing
unlock_page. The kernel in question is old -- 4.19.56. Are there any
other modifications to that kernel?
--
Mel Gorman
SUSE Labs
Powered by blists - more mailing lists