linux-kernel - Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)


Message-ID: <BANLkTikZeDiNwh+hihEMWwGyh6+ZVMA=_A@mail.gmail.com>
Date:	Wed, 18 May 2011 22:41:01 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Minchan Kim <minchan.kim@...il.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>,
	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

On Wed, May 18, 2011 at 10:30 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@...fujitsu.com> wrote:
> On Wed, 18 May 2011 22:15:53 -0400
> Andrew Lutomirski <luto@....edu> wrote:
>
>> On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <minchan.kim@...il.com> wrote:
>> > On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <luto@....edu> wrote:
>
>> > Andrew, Could you test this patch with !pgdat_balanced patch?
>> > I think we shouldn't see OOM message if we have lots of free swap space.
>> >
>> > == CUT_HERE ==
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index f73b865..cc23f04 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1341,10 +1341,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>> >        if (current_is_kswapd())
>> >                return false;
>> >
>> > -       /* Only stall on lumpy reclaim */
>> > -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> > -               return false;
>> > -
>> >        /* If we have relaimed everything on the isolated list, no stall */
>> >        if (nr_freed == nr_taken)
>> >                return false;
>> >
>> >
>> >
>> > Then, if you don't see any unnecessary OOM but still see the hangup,
>> > could you apply this patch based on previous?
>>
>> With this patch, I started GNOME and Firefox, turned on swap, and ran
>> test_mempressure.sh 1500 1400 1.  Instant panic (or OOPS and hang or
>> something -- didn't get the top part).  Picture attached -- it looks
>> like memcg might be involved.  I'm running F15, so it might even be
>> doing something.
>>
>
> Hmm, what kernel version are you using?
> I think memcg is not guilty because RIP is in shrink_page_list().
> But OK, I'll dig into this. Could you give us your .config?

Attached.

The faulting address in shrink_page_list is a ud2 instruction, which I
think comes from VM_BUG_ON(PageActive(page)).  The sequence is:

   0xffffffff810d24cc <+202>:	callq  0xffffffff810cf930 <test_and_set_bit>
   0xffffffff810d24d1 <+207>:	test   %eax,%eax
   0xffffffff810d24d3 <+209>:	jne    0xffffffff810d2aa5 <shrink_page_list+1699>
   0xffffffff810d24d9 <+215>:	mov    -0x28(%rbx),%rax
   0xffffffff810d24dd <+219>:	test   $0x40,%al
   0xffffffff810d24df <+221>:	je     0xffffffff810d24e3 <shrink_page_list+225>
   0xffffffff810d24e1 <+223>:	ud2
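
For readers mapping the disassembly back to C, here is a rough userspace
sketch of the two checks it appears to encode: a trylock on the page via
test_and_set_bit, then a test of the active flag that ends in ud2.  All
mock_* names are hypothetical stand-ins, not kernel APIs, and the bit
numbers assume a 2.6.39-era page flag layout in which PG_locked is bit 0
and PG_active is bit 6 (mask 0x40).  The real kernel helper is atomic;
this stand-in is not.

```c
#include <stdbool.h>

/* Assumed bit positions (2.6.39-era layout); not taken from the .config. */
enum mock_pageflags {
    MOCK_PG_locked = 0,
    MOCK_PG_active = 6,   /* 1 << 6 == 0x40, the mask in "test $0x40,%al" */
};

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct mock_page {
    unsigned long flags;
};

/* Non-atomic stand-in for test_and_set_bit(): sets the bit, returns its old value. */
static bool mock_test_and_set_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    bool old = (*addr & mask) != 0;

    *addr |= mask;
    return old;
}

enum mock_result {
    MOCK_KEEP_LOCKED,   /* page was already locked: the jne to <+1699> */
    MOCK_WOULD_BUG,     /* active flag set: the kernel would execute ud2 */
    MOCK_CONTINUE,      /* flag clear: the je to <+225> */
};

static enum mock_result mock_check_page(struct mock_page *page)
{
    /* callq test_and_set_bit; test %eax,%eax; jne <+1699> */
    if (mock_test_and_set_bit(MOCK_PG_locked, &page->flags))
        return MOCK_KEEP_LOCKED;

    /* mov -0x28(%rbx),%rax; test $0x40,%al; je <+225>; ud2 */
    if (page->flags & (1UL << MOCK_PG_active))
        return MOCK_WOULD_BUG;

    return MOCK_CONTINUE;
}
```

If that reading is right, hitting the ud2 means an unlocked page that
still had the active flag set ended up on the list being reclaimed.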


--Andy

Attachment: ".config" (application/octet-stream, 88497 bytes)