linux-kernel - Re: [PATCH] memcg, vmscan: Do not wait for writeback if killed


Message-ID: <20151203090826.GD9264@dhcp22.suse.cz>
Date:	Thu, 3 Dec 2015 10:08:26 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Vladimir Davydov <vdavydov@...allels.com>,
	Hugh Dickins <hughd@...gle.com>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] memcg, vmscan: Do not wait for writeback if killed

On Wed 02-12-15 14:25:03, Andrew Morton wrote:
> On Wed,  2 Dec 2015 15:26:18 +0100 Michal Hocko <mhocko@...nel.org> wrote:
> 
> > From: Michal Hocko <mhocko@...e.com>
> > 
> > Legacy memcg reclaim waits for pages under writeback to prevent a
> > premature OOM killer invocation, because there was no memcg dirty limit
> > throttling implemented back then.
> > 
> > This heuristic might complicate the situation when writeback cannot make
> > forward progress because of a global OOM situation. E.g. a filesystem
> > backed by the loop device relies on the underlying filesystem hosting
> > the image to make forward progress, which cannot be guaranteed, and so
> > we might end up triggering the OOM killer to resolve the situation. If the
> > OOM victim happens to be the task stuck in wait_on_page_writeback in the
> > memcg reclaim then we are basically deadlocked.
> > 
> > Introduce wait_on_page_writeback_killable and use it in this path to
> > prevent the issue. shrink_page_list will back off if the wait
> > was interrupted.
> > 
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1021,10 +1021,19 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  
> >  			/* Case 3 above */
> >  			} else {
> > +				int ret;
> > +
> >  				unlock_page(page);
> > -				wait_on_page_writeback(page);
> > +				ret = wait_on_page_writeback_killable(page);
> >  				/* then go back and try same page again */
> >  				list_add_tail(&page->lru, page_list);
> > +
> > +				/*
> > +				 * We've got killed while waiting here so
> > +				 * expedite our way out from the reclaim
> > +				 */
> > +				if (ret)
> > +					break;
> >  				continue;
> >  			}
> >  		}
> 
> This function is 350 lines long and it takes a bit of effort to work
> out what that `break' is breaking from and where it goes next.  I think
> you want a "goto keep_killed" here for consistency and sanity.

Yeah, sounds better. See an update below:

> Also, there's high risk here of a pending signal causing the code to
> fall into some busy loop where it repeatedly tries to do something but
> then bails out without doing it.  It's unobvious how this change avoids
> such things.  (Maybe it *does* avoid such things, but it should be
> obvious!).

shrink_page_list is called from __alloc_contig_migrate_range and
shrink_inactive_list. Both of them check fatal_signal_pending and bail
out, so a killed task should not keep re-entering the reclaim loop. I
was relying on this behavior. I realize this is far from optimal wrt.
readability, but I do not have a great idea how to improve it without
sticking more fatal_signal_pending checks into the reclaim path.
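
To make the no-busy-loop argument easier to follow, here is a minimal
user-space sketch of the control flow, not the kernel code itself. All
names in it (fatal_signal, kill_after, wait_killable, shrink_list,
reclaim, cleanup_runs) are hypothetical stand-ins, and returning -EINTR
from the interrupted wait is an assumption. It only models the idea
that the inner loop jumps to the common cleanup once the wait is
interrupted, and that the caller checks for the fatal signal before
starting another pass.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool fatal_signal;

/* Successful waits left before the "kill" arrives; negative means never. */
static int kill_after = -1;

/* Counts how often the common cleanup at the end of shrink_list() ran. */
static size_t cleanup_runs;

/*
 * Hypothetical stand-in for wait_on_page_writeback_killable().
 * Returns nonzero (assumed -EINTR here) once a fatal signal is pending.
 */
static int wait_killable(void)
{
	if (kill_after == 0)
		fatal_signal = true;
	else if (kill_after > 0)
		kill_after--;

	return fatal_signal ? -EINTR : 0;
}

/*
 * Models one shrink_page_list() pass over npages pages that all need a
 * writeback wait. Returns how many pages were processed before the wait
 * was interrupted (or npages if it never was).
 */
static size_t shrink_list(size_t npages)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < npages; i++) {
		/* Killed while waiting: skip the rest of the list. */
		if (wait_killable())
			goto keep_killed;
		reclaimed++;
	}

keep_killed:
	/* Common cleanup runs on both the normal and the killed path. */
	cleanup_runs++;
	return reclaimed;
}

/*
 * Models a caller that retries until target pages are processed. The
 * fatal-signal check before each pass is what prevents a busy loop.
 */
static size_t reclaim(size_t target, size_t batch, size_t *passes)
{
	size_t total = 0;

	*passes = 0;
	while (total < target) {
		if (fatal_signal)
			break;	/* bail out instead of retrying */
		total += shrink_list(batch);
		(*passes)++;
	}
	return total;
}
```

In this model, with kill_after set to 3, a call such as
reclaim(100, 8, &passes) stops after a single pass instead of spinning,
because the caller sees the pending signal before it retries.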

So you think a comment would be sufficient?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 98a1934493af..2e8ee9e5fcb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1031,9 +1031,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				/*
 				 * We've got killed while waiting here so
 				 * expedite our way out from the reclaim
+				 *
+				 * Our callers should make sure we do not
+				 * get here with fatal signals again.
 				 */
 				if (ret)
-					break;
+					goto keep_killed;
 				continue;
 			}
 		}
@@ -1227,6 +1230,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
 
+keep_killed:
 	mem_cgroup_uncharge_list(&free_pages);
 	try_to_unmap_flush();
 	free_hot_cold_page_list(&free_pages, true);
-- 
Michal Hocko
SUSE Labs