linux-kernel - Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

Message-ID: <BANLkTikQd34QZnQVSn_9f_Mxc8wtJMHY0w@mail.gmail.com>
Date:	Mon, 23 May 2011 08:12:50 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Andrew Lutomirski <luto@....edu>
Cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	fengguang.wu@...el.com, andi@...stfloor.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, mgorman@...e.de, hannes@...xchg.org,
	riel@...hat.com
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sun, May 22, 2011 at 9:22 PM, Andrew Lutomirski <luto@....edu> wrote:
> On Sat, May 21, 2011 at 10:44 AM, Minchan Kim <minchan.kim@...il.com> wrote:
>> I would like to confirm this problem.
>> Could you show the diff between vanilla 2.6.38.6 and your current
>> 2.6.38.6 + alpha? (i.e., I would like to know which patches you added
>> on top of vanilla 2.6.38.6 to reproduce this problem)
>> I believe you added my crap patch below. Right?
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 292582c..69d317e 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
>>        */
>>       if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>>               sc->reclaim_mode |= syncmode;
>> -       else if (sc->order && priority < DEF_PRIORITY - 2)
>> +       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
>> +                               priority <= DEF_PRIORITY / 3)
>>               sc->reclaim_mode |= syncmode;
>>       else
>>               sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
>> @@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>>       if (current_is_kswapd())
>>               return false;
>>
>> -       /* Only stall on lumpy reclaim */
>> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> -               return false;
>> -
>
> Bah.  It's this last hunk.  Without this I can't reproduce the oops.
> With this hunk, the reset_reclaim_mode doesn't work and
> shrink_page_list is incorrectly called twice.

OMG! I should have been clearer with you.  My patch above is totally _crap_.
I thought you had been testing without that crap patch. :(
Sorry for wasting the time of so many mm folks.
My apologies.

I want to resolve your original problem (i.e., the hang) before digging
into the OOM problem.

>
> So we're back to the original problem...

Could you test the patch below, based on vanilla 2.6.38.6?
The expected result is that the system hang should never happen.
I hope this is the last test for the hang.

Thanks.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..1663d24 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
       if (scanned == 0)
               scanned = SWAP_CLUSTER_MAX;

-       if (!down_read_trylock(&shrinker_rwsem))
-               return 1;       /* Assume we'll be able to shrink next time */
+       if (!down_read_trylock(&shrinker_rwsem)) {
+               /* Assume we'll be able to shrink next time */
+               ret = 1;
+               goto out;
+       }

       list_for_each_entry(shrinker, &shrinker_list, list) {
               unsigned long long delta;
@@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
               shrinker->nr += total_scan;
       }
       up_read(&shrinker_rwsem);
+out:
+       cond_resched();
       return ret;
 }

@@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
        * must be balanced
        */
       if (order)
-               return pgdat_balanced(pgdat, balanced, classzone_idx);
+               return !pgdat_balanced(pgdat, balanced, classzone_idx);
       else
               return !all_zones_ok;
 }
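
To make the intent of the last hunk easier to check, here is a minimal
user-space sketch of the return-value convention. It is not kernel code.
node_balanced() is a hypothetical stand-in for pgdat_balanced(), and its
25% threshold is an assumption chosen only for illustration. The point is
that the function answers "would kswapd be sleeping too early?", so the
order > 0 branch has to return the negation of "balanced", the same way
the order-0 branch returns !all_zones_ok.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for pgdat_balanced(): true when enough of the
 * node is balanced. The 25% threshold is an assumption for this sketch.
 */
static bool node_balanced(unsigned long balanced_pages,
			  unsigned long present_pages)
{
	return balanced_pages >= present_pages / 4;
}

/*
 * Model of the fixed logic: return true when going to sleep now would
 * be premature, i.e. when the node is NOT yet balanced.
 */
bool sleeping_prematurely_model(int order, bool all_zones_ok,
				unsigned long balanced_pages,
				unsigned long present_pages)
{
	assert(order >= 0);

	if (order)
		return !node_balanced(balanced_pages, present_pages);
	else
		return !all_zones_ok;
}
```

With these assumed numbers, an order-1 wakeup on a node with 600 of 1000
pages balanced reports "not premature", while the same call with only 100
balanced pages reports "premature". Without the negation the two answers
would be swapped.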

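The shrink_slab() hunks follow a single-exit pattern: the trylock
failure path no longer returns early but jumps to a common label, so
every call passes through the scheduling point. A rough user-space
sketch of that shape is below. All names in it (registry_trylock,
registry_unlock, shrink_model, exit_yields) are hypothetical, the lock
is a plain flag rather than an rwsem, and sched_yield() only stands in
for cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

static bool registry_busy;		/* stand-in for shrinker_rwsem being held */
static unsigned long exit_yields;	/* counts passes through the exit path */

/* Hypothetical non-blocking lock attempt, modelled on down_read_trylock(). */
static bool registry_trylock(void)
{
	if (registry_busy)
		return false;
	registry_busy = true;
	return true;
}

static void registry_unlock(void)
{
	assert(registry_busy);	/* unlocking an unheld lock would be a bug */
	registry_busy = false;
}

unsigned long shrink_model(unsigned long nr_to_scan)
{
	unsigned long ret;

	if (!registry_trylock()) {
		/* Assume we'll be able to shrink next time */
		ret = 1;
		goto out;
	}

	ret = nr_to_scan / 2;	/* placeholder for the real shrinker walk */
	registry_unlock();
out:
	sched_yield();		/* stand-in for cond_resched() */
	exit_yields++;
	return ret;
}
```

In this sketch the contended path and the normal path both reach the
yield, which is presumably what lets a caller that loops on the function
make room for other tasks even when the lock is never available.
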
-- 
Kind regards,
Minchan Kim