linux-kernel - Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many


Message-ID: <a085478d-5118-cdff-c611-1649fce7a650@linux.intel.com>
Date:   Thu, 22 Apr 2021 13:17:37 -0700
From:   Tim Chen <tim.c.chen@...ux.intel.com>
To:     Yu Zhao <yuzhao@...gle.com>,
        Xing Zhengjun <zhengjun.xing@...ux.intel.com>
Cc:     akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, ying.huang@...el.com,
        Shakeel Butt <shakeelb@...gle.com>,
        Michal Hocko <mhocko@...e.com>, wfg@...l.ustc.edu.cn
Subject: Re: [RFC] mm/vmscan.c: avoid possible long latency caused by
 too_many_isolated()



On 4/22/21 10:13 AM, Yu Zhao wrote:

> @@ -3302,6 +3252,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>  unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  				gfp_t gfp_mask, nodemask_t *nodemask)
>  {
> +	int nr_cpus;
>  	unsigned long nr_reclaimed;
>  	struct scan_control sc = {
>  		.nr_to_reclaim = SWAP_CLUSTER_MAX,
> @@ -3334,8 +3285,17 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  	set_task_reclaim_state(current, &sc.reclaim_state);
>  	trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
>  
> +	nr_cpus = current_is_kswapd() ? 0 : num_online_cpus();
> +	while (nr_cpus && !atomic_add_unless(&pgdat->nr_reclaimers, 1, nr_cpus)) {
> +		if (schedule_timeout_killable(HZ / 10))

100 msec seems like a long time to wait.  The original code in shrink_inactive_list()
chose a 100 msec sleep because the sleep happens only once in the while loop, and 100 msec
was used to check for stalling.  In this case the loop can go on for a while, and the
number of reclaimers can drop below the limit sooner than 100 msec.  Seems like it should
be checked more often.
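
To make the concern concrete, here is a rough userspace sketch (not kernel code) of how the polling interval bounds the time a waiter takes to notice a freed slot. The names model_add_unless() and model_admit_time_ms() are hypothetical; the first only imitates the add-unless-equal behaviour of atomic_add_unless() with C11 atomics, and the sleep is modelled by advancing a counter rather than by calling schedule_timeout_killable().

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of atomic_add_unless(): add 'a' to *v
 * unless *v == u.  Returns true if the addition was performed.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/*
 * Hypothetical model of the wait loop.  All 'limit' slots start out
 * taken, and one of them is released at time 'free_ms'.  The waiter
 * retries every 'interval_ms' (assumed > 0).  Returns the simulated
 * time in msec at which the waiter finally gets a slot.
 */
static int model_admit_time_ms(int limit, int interval_ms, int free_ms)
{
	atomic_int nr_reclaimers;
	bool freed = false;
	int now_ms = 0;

	atomic_init(&nr_reclaimers, limit);

	for (;;) {
		/* Another reclaimer finishes and drops its slot. */
		if (!freed && now_ms >= free_ms) {
			atomic_fetch_sub(&nr_reclaimers, 1);
			freed = true;
		}
		if (model_add_unless(&nr_reclaimers, 1, limit))
			return now_ms;
		/* Stand-in for sleeping one polling interval. */
		now_ms += interval_ms;
	}
}
```

In this model, with a slot freed 15 msec after the waiter goes to sleep, model_admit_time_ms(4, 100, 15) gives 100 while model_admit_time_ms(4, 10, 15) gives 20, so the extra latency is bounded by one polling interval.  A wakeup-based scheme would avoid polling altogether, but that is a different design from the one in the patch.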

Tim

> +			return SWAP_CLUSTER_MAX;
> +	}
> +
>  	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>  
> +	if (nr_cpus)
> +		atomic_dec(&pgdat->nr_reclaimers);
> +
>  	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
>  	set_task_reclaim_state(current, NULL);
>