[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54606F02.5070808@suse.cz>
Date: Mon, 10 Nov 2014 08:53:38 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Joonsoo Kim <iamjoonsoo.kim@....com>
CC: "P. Christeas" <xrg@...ux.gr>, linux-mm@...ck.org,
lkml <linux-kernel@...r.kernel.org>,
David Rientjes <rientjes@...gle.com>,
Norbert Preining <preining@...ic.at>,
Markus Trippelsdorf <markus@...ppelsdorf.de>,
Pavel Machek <pavel@....cz>
Subject: Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c
On 11/10/2014 07:07 AM, Joonsoo Kim wrote:
> On Sat, Nov 08, 2014 at 11:18:37PM +0100, Vlastimil Babka wrote:
>> On 11/08/2014 02:11 PM, P. Christeas wrote:
>>
>> Hi,
>>
>> I think I finally found the cause by staring into the code... CCing
>> people from all 4 separate threads I know about this issue.
>> The problem with finding the cause was that the first report I got from
>> Markus was about isolate_freepages_block() overhead, and later Norbert
>> reported that reverting a patch for isolate_freepages* helped. But the
>> problem seems to be that although the loop in isolate_migratepages exits
>> because the scanners almost meet (they are within same pageblock), they
>> don't truly meet, therefore compact_finished() decides to continue, but
>> isolate_migratepages() exits immediately... boom! But indeed e14c720efdd7
>> made this situation possible, as free scaner pfn can now point to a
>> middle of pageblock.
>
> Indeed.
>
>>
>> So I hope the attached patch will fix the soft-lockup issues in
>> compact_zone. Please apply on 3.18-rc3 or later without any other reverts,
>> and test. It probably won't help Markus and his isolate_freepages_block()
>> overhead though...
>
> Yes, I found this bug too, but, it can't explain
> isolate_freepages_block() overhead. Anyway, I can't find another bug
> related to isolate_freepages_block(). :/
Thanks for checking.
>> Thanks,
>> Vlastimil
>>
>> ------8<------
>> >From fbf8eb0bcd2897090312e23da6a31bad9cc6b337 Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@...e.cz>
>> Date: Sat, 8 Nov 2014 22:20:43 +0100
>> Subject: [PATCH] mm, compaction: prevent endless loop in migrate scanner
>>
>> ---
>> mm/compaction.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index ec74cf0..1b7a1be 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1029,8 +1029,12 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>> }
>>
>> acct_isolated(zone, cc);
>> - /* Record where migration scanner will be restarted */
>> - cc->migrate_pfn = low_pfn;
>> + /*
>> + * Record where migration scanner will be restarted. If we end up in
>> + * the same pageblock as the free scanner, make the scanners fully
>> + * meet so that compact_finished() terminates compaction.
>> + */
>> + cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
>>
>> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>> }
>
> IMHO, proper fix is not to change this logic, but, to change decision
> logic in compact_finished() and in compact_zone(). Maybe helper
> function would be good for readability.
OK but I would think that to fix 3.18 ASAP and not introduce more
regressions, go with the patch above first, as it is the minimal fix and
people already test it. Then we can implement your suggestion later as a
cleanup for 3.19?
Vlastimil
> Thanks.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists