linux-kernel - Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop

Message-Id: <20210402174447.2abccc77cdca5cad67756d55@linux-foundation.org>
Date:   Fri, 2 Apr 2021 17:44:47 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Stillinux <stillinux@...il.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        liuzhengyuan@...inos.cn, liuyun01@...inos.cn,
        Johannes Weiner <hannes@...xchg.org>,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop

On Fri, 2 Apr 2021 15:03:37 +0800 Stillinux <stillinux@...il.com> wrote:

> Under high system memory and load pressure, we ran the LTP tests and
> found that the system was stuck: direct memory reclaim was stuck in
> io_schedule everywhere, the requests being waited on were held in the
> blk_plug of one process, and that process had fallen into an infinite
> loop without ever flushing out those requests.
> 
> The call flow of this process is swap_cluster_readahead, which uses
> blk_start_plug/blk_finish_plug for the blk_plug operation, via
> swap_cluster_readahead->__read_swap_cache_async->swapcache_prepare.
> When swapcache_prepare returns -EEXIST, it falls into an infinite loop.
> Even though cond_resched is called, in schedule() sched_submit_work
> acts based on tsk->state and does not flush out the blk_plug requests,
> so the IO hangs, causing the whole system to hang.
> 
> As this is our first time working on the swap code, we have no good way
> to fix the root cause of the problem. As an engineering workaround, we
> chose to make swap_cluster_readahead aware of the memory pressure as
> soon as possible and do io_schedule to flush out the blk_plug requests.
> To that end, we change the allocation flag in swap_readpage to
> GFP_NOIO, so that the allocation no longer does memory reclaim that
> flushes IO. With this change the system operates normally, but it is
> not a fundamental fix.
> 

Thanks.

I'm not understanding why swapcache_prepare() repeatedly returns
-EEXIST in this situation?

And how does the switch to GFP_NOIO fix this?  Simply by avoiding
direct reclaim altogether?

> ---
>  mm/page_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9ebcf5..87392ffabb12 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	}
> 
>  	ret = 0;
> -	bio = bio_alloc(GFP_KERNEL, 1);
> +	bio = bio_alloc(GFP_NOIO, 1);
>  	bio_set_dev(bio, sis->bdev);
>  	bio->bi_opf = REQ_OP_READ;
>  	bio->bi_iter.bi_sector = swap_page_sector(page);