linux-kernel - Re: Deadlock possibly caused by too_many


Date:	Tue, 19 Oct 2010 11:37:29 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Neil Brown <neilb@...e.de>, Rik van Riel <riel@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"Li, Shaohua" <shaohua.li@...el.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.

On Tue, Oct 19, 2010 at 11:24 AM, Wu Fengguang <fengguang.wu@...el.com> wrote:
> On Tue, Oct 19, 2010 at 06:41:37AM +0800, Andrew Morton wrote:
>> On Tue, 19 Oct 2010 09:31:42 +1100
>> Neil Brown <neilb@...e.de> wrote:
>>
>> > On Mon, 18 Oct 2010 14:58:59 -0700
>> > Andrew Morton <akpm@...ux-foundation.org> wrote:
>> >
>> > > On Tue, 19 Oct 2010 00:15:04 +0800
>> > > Wu Fengguang <fengguang.wu@...el.com> wrote:
>> > >
>> > > > Neil found that if too_many_isolated() returns true while performing
>> > > > direct reclaim we can end up waiting for other threads to complete their
>> > > > direct reclaim.  If those threads are allowed to enter the FS or IO to
>> > > > free memory, but this thread is not, then it is possible that those
>> > > > threads will be waiting on this thread and so we get a circular
>> > > > deadlock.
>> > > >
>> > > > some task enters direct reclaim with GFP_KERNEL
>> > > >   => too_many_isolated() false
>> > > >     => vmscan and run into dirty pages
>> > > >       => pageout()
>> > > >         => take some FS lock
>> > > >           => fs/block code does GFP_NOIO allocation
>> > > >             => enter direct reclaim again
>> > > >               => too_many_isolated() true
>> > > >                 => waiting for others to progress; however, the other
>> > > >                    tasks may be circularly waiting for the FS lock.
>>
>> I'm assuming that the last four "=>"'s here should have been indented
>> another stop.
>
> Yup. I'll fix it in the next post.
>
>> > > > The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher
>> > > > priority than normal ones, by giving them a higher throttle threshold.
>> > > >
>> > > > Now !GFP_IOFS reclaims won't be waiting for GFP_IOFS reclaims to
>> > > > progress. They will be blocked only when there are too many concurrent
>> > > > !GFP_IOFS reclaims; however, that's very unlikely because the IO-less
>> > > > direct reclaims are able to progress much faster, and they won't
>> > > > deadlock each other. The threshold is raised high enough for them, so
>> > > > that there can be sufficient parallel progress of !GFP_IOFS reclaims.
>> > >
>> > > I'm not sure that this is really a full fix.  Torsten's analysis does
>> > > appear to point at the real bug: raid1 has code paths which allocate
>> > > more than a single element from a mempool without starting IO against
>> > > previous elements.
>> >
>> > ... point at "a" real bug.
>> >
>> > I think there are two bugs here.
>> > The raid1 bug that Torsten mentions is certainly real (and has been around
>> > for an embarrassingly long time).
>> > The bug that I identified in too_many_isolated is also a real bug and can be
>> > triggered without md/raid1 in the mix.
>> > So this is not a 'full fix' for every bug in the kernel :-),
>> > but it could well be a full fix for this particular bug.
>
> Yeah it aims to be a full fix for one bug.
>
>> Can we just delete the too_many_isolated() logic?  (Crappy comment
>
> If the two cond_resched() calls can be removed from
> shrink_page_list(), the major cause of too many pages being
> isolated will be gone. However the writeback-waiting logic after
> should_reclaim_stall() will also block the direct reclaimer for a long
> time with pages isolated, which may bite under pathological conditions.
>
>> describes what the code does but not why it does it).
>
> Good point. The comment could be improved as follows.
>
> Thanks,
> Fengguang
>
> ---
> Subject: vmscan: comment too_many_isolated()
> From: Wu Fengguang <fengguang.wu@...el.com>
> Date: Tue Oct 19 09:53:23 CST 2010
>
> Comment "Why it's doing so" rather than "What it does"
> as proposed by Andrew Morton.
>
> Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
Reviewed-by: Minchan Kim <minchan.kim@...il.com>
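
For anyone following the thread, the throttling idea quoted above (let
!GFP_IOFS reclaimers see a higher threshold than normal ones) can be
modelled roughly as below. This is only a user-space sketch, not the
patch under discussion: the MODEL_GFP_* flag values, struct zone_counts
and model_too_many_isolated() are hypothetical names made up for
illustration, and the factor of 8 (shift by 3) is an assumed ratio
between the two thresholds.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_GFP_IO   0x1u
#define MODEL_GFP_FS   0x2u
#define MODEL_GFP_IOFS (MODEL_GFP_IO | MODEL_GFP_FS)

/* Stand-in for the per-zone page counters consulted by the throttle. */
struct zone_counts {
	unsigned long inactive;	/* pages on the inactive list */
	unsigned long isolated;	/* pages currently isolated by reclaimers */
};

/*
 * Return true when the caller should be throttled.
 *
 * Callers that may enter both IO and FS get a threshold that is 8 times
 * lower (assumed ratio), so they back off first.  Callers that lack either
 * flag keep the full threshold and therefore are not made to wait behind
 * the normal reclaimers that might be waiting on them.
 */
static bool model_too_many_isolated(const struct zone_counts *z,
				    unsigned int gfp_mask)
{
	unsigned long limit = z->inactive;

	if ((gfp_mask & MODEL_GFP_IOFS) == MODEL_GFP_IOFS)
		limit >>= 3;

	return z->isolated > limit;
}
```

With inactive = 800 and isolated = 200, a caller passing MODEL_GFP_IOFS
is throttled (200 > 100), while a caller with neither flag set, or with
only MODEL_GFP_IO set, is not (200 <= 800). The IO-less reclaimers only
stall once the isolated count exceeds the full inactive count.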



-- 
Kind regards,
Minchan Kim