Date:	Sat, 21 May 2011 09:34:47 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Minchan Kim <minchan.kim@...il.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	fengguang.wu@...el.com, andi@...stfloor.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, mgorman@...e.de, hannes@...xchg.org,
	riel@...hat.com
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
<kosaki.motohiro@...fujitsu.com> wrote:
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 3f44b81..d1dabc9 100644
>> @@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
>>
>>        /* Check if we should syncronously wait for writeback */
>>        if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
>> +               unsigned long nr_active, old_nr_scanned;
>>                set_reclaim_mode(priority, sc, true);
>> +               nr_active = clear_active_flags(&page_list, NULL);
>> +               count_vm_events(PGDEACTIVATE, nr_active);
>> +               old_nr_scanned = sc->nr_scanned;
>>                nr_reclaimed += shrink_page_list(&page_list, zone, sc);
>> +               sc->nr_scanned = old_nr_scanned;
>>        }
>>
>>        local_irq_disable();
>>
>> I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
>> and test_mempressure without any problems other than slowness, but
>> when I hit ctrl-c to stop test_mempressure, I got the attached oom.
>
> Minchan,
>
> I'm confused now.
> If pages got SetPageActive(), should_reclaim_stall() should never return true.
> Can you please explain which bad scenario happened?
>
> -----------------------------------------------------------------------------------------------------
> static void reset_reclaim_mode(struct scan_control *sc)
> {
>        sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> }
>
> shrink_page_list()
> {
>  (snip)
>  activate_locked:
>                SetPageActive(page);
>                pgactivate++;
>                unlock_page(page);
>                reset_reclaim_mode(sc);                  /// here
>                list_add(&page->lru, &ret_pages);
>        }
> -----------------------------------------------------------------------------------------------------
>
>
> -----------------------------------------------------------------------------------------------------
> bool should_reclaim_stall()
> {
>  (snip)
>
>        /* Only stall on lumpy reclaim */
>        if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
>                return false;
> -----------------------------------------------------------------------------------------------------
>

I did some tracing and the oops happens from the second call to
shrink_page_list after should_reclaim_stall returns true and it hits
the same pages in the same order that the earlier call just finished
calling SetPageActive on.  I have *not* confirmed that the two calls
happened from the same call to shrink_inactive_list, but something's
certainly wrong in there.
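
To make the invariant in question concrete, here is a small userspace
model of the two quoted snippets. It is only a sketch, not kernel code:
the model_* helpers are hypothetical names, the flag values are
placeholders rather than copies of mm/vmscan.c, and the snipped parts of
should_reclaim_stall() are assumed to pass. It shows that once
shrink_page_list() has activated even one page, the stall check reports
false, so a second pass over the same pages should not be reachable
through that check with the same scan_control.

```c
#include <stdbool.h>

/* Placeholder flag values for this model only (assumption). */
#define RECLAIM_MODE_SINGLE       (1u << 0)
#define RECLAIM_MODE_ASYNC        (1u << 1)
#define RECLAIM_MODE_SYNC         (1u << 2)
#define RECLAIM_MODE_LUMPYRECLAIM (1u << 3)

/* Only the field that matters for this argument. */
struct scan_control {
	unsigned int reclaim_mode;
};

/* Same body as the quoted reset_reclaim_mode(). */
static void reset_reclaim_mode(struct scan_control *sc)
{
	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
}

/*
 * Hypothetical stand-in for shrink_page_list(): models only the
 * activate_locked side effect, once per activated page.
 */
static void model_shrink_page_list(struct scan_control *sc, int nr_activated)
{
	for (int i = 0; i < nr_activated; i++)
		reset_reclaim_mode(sc);
}

/*
 * Hypothetical stand-in for should_reclaim_stall(): models only the
 * quoted "Only stall on lumpy reclaim" check and assumes every
 * snipped condition would allow a stall.
 */
static bool model_should_reclaim_stall(const struct scan_control *sc)
{
	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
		return false;
	return true;
}
```

Starting from a lumpy mode (RECLAIM_MODE_LUMPYRECLAIM | RECLAIM_MODE_SYNC
in this model), model_should_reclaim_stall() returns true; after
model_shrink_page_list(&sc, 1) it returns false. If the real code behaves
like the model, the second call I traced either came from a different
call to shrink_inactive_list or reached shrink_page_list by a path this
model does not cover.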

This is very easy to reproduce on my laptop.

--Andy