Message-ID: <20160317120127.GC26017@dhcp22.suse.cz>
Date: Thu, 17 Mar 2016 13:01:27 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
hannes@...xchg.org, mgorman@...e.de, rientjes@...gle.com,
hillf.zj@...baba-inc.com, kamezawa.hiroyu@...fujitsu.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] mm: throttle on IO only when there are too many
dirty and writeback pages
On Thu 17-03-16 20:35:23, Tetsuo Handa wrote:
[...]
> But what I found strange is what should_reclaim_retry() is doing.
>
> Michal Hocko wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f77e283fb8c6..b2de8c8761ad 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3044,8 +3045,37 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
> > */
> > if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
> > ac->high_zoneidx, alloc_flags, available)) {
> > - /* Wait for some write requests to complete then retry */
> > - wait_iff_congested(zone, BLK_RW_ASYNC, HZ/50);
> > + unsigned long writeback;
> > + unsigned long dirty;
> > +
> > + writeback = zone_page_state_snapshot(zone, NR_WRITEBACK);
> > + dirty = zone_page_state_snapshot(zone, NR_FILE_DIRTY);
> > +
> > + /*
> > + * If we didn't make any progress and have a lot of
> > + * dirty + writeback pages then we should wait for
> > + * an IO to complete to slow down the reclaim and
> > + * prevent premature OOM
> > + */
> > + if (!did_some_progress && 2*(writeback + dirty) > reclaimable) {
> > + congestion_wait(BLK_RW_ASYNC, HZ/10);
> > + return true;
> > + }
>
> writeback and dirty are used only when did_some_progress == 0. Thus, we don't
> need to calculate writeback and dirty using zone_page_state_snapshot() unless
> did_some_progress == 0.
OK, I will move this into the !did_some_progress branch.
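Something along these lines (just an untested sketch against the hunk
above, not necessarily the final form of the patch):

	if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
			ac->high_zoneidx, alloc_flags, available)) {
		/*
		 * The per-cpu snapshots are only consumed when no
		 * progress has been made, so do not bother computing
		 * them otherwise.
		 */
		if (!did_some_progress) {
			unsigned long writeback = zone_page_state_snapshot(zone,
					NR_WRITEBACK);
			unsigned long dirty = zone_page_state_snapshot(zone,
					NR_FILE_DIRTY);

			/*
			 * A lot of dirty+writeback pages and no progress:
			 * wait for some IO to complete rather than declare
			 * OOM prematurely.
			 */
			if (2*(writeback + dirty) > reclaimable) {
				congestion_wait(BLK_RW_ASYNC, HZ/10);
				return true;
			}
		}
		[...]
	}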
> But, does it make sense to take writeback and dirty into account when
> disk_events_workfn (trace shown above) is doing a GFP_NOIO allocation and
> wb_workfn (trace shown above) is doing a (presumably) GFP_NOFS allocation?
> Shouldn't we use different thresholds for GFP_NOIO / GFP_NOFS / GFP_KERNEL?
I have considered skipping the throttling part for GFP_NOFS/GFP_NOIO
previously but I couldn't convince myself it would make any difference.
We know there was no progress in the reclaim, and even if the current
context is potentially doing an FS/IO allocation, it obviously cannot get
its memory, so it cannot proceed. So now we are in a state where we
either busy loop or sleep for a while. So I ended up not complicating the
code even more. If you have a use case where busy waiting makes a
difference then I would vote for a separate patch with a clear
description.
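Just to illustrate what I had considered (an untested sketch; gfp_mask is
the should_reclaim_retry() argument, and whether __GFP_FS/__GFP_IO is the
right test here is exactly the part I am not convinced about):

	/*
	 * Skip the dirty+writeback throttling for requests which cannot
	 * perform FS/IO themselves (GFP_NOFS/GFP_NOIO); only allocations
	 * which may issue IO wait for the flushers here.
	 */
	if (!did_some_progress &&
	    (gfp_mask & (__GFP_FS | __GFP_IO)) == (__GFP_FS | __GFP_IO) &&
	    2*(writeback + dirty) > reclaimable) {
		congestion_wait(BLK_RW_ASYNC, HZ/10);
		return true;
	}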
> > +
> > + /*
> > + * Memory allocation/reclaim might be called from a WQ
> > + * context and the current implementation of the WQ
> > + * concurrency control doesn't recognize that
> > + * a particular WQ is congested if the worker thread is
> > + * looping without ever sleeping. Therefore we have to
> > + * do a short sleep here rather than calling
> > + * cond_resched().
> > + */
> > + if (current->flags & PF_WQ_WORKER)
> > + schedule_timeout(1);
>
> This schedule_timeout(1) does not sleep. You lost the fix as of next-20160317.
> Please update.
Yeah, I have that updated in my local patch already.
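For reference, the underlying problem is that schedule_timeout() returns
immediately unless the task state has been changed from TASK_RUNNING
first, so the fix is presumably (modulo its exact form in linux-next)
simply:

	/*
	 * Use the uninterruptible variant which sets the task state
	 * before scheduling, so the worker really sleeps for a jiffy
	 * instead of returning immediately.
	 */
	if (current->flags & PF_WQ_WORKER)
		schedule_timeout_uninterruptible(1);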
Thanks!
--
Michal Hocko
SUSE Labs