[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130619132614.GC16457@dhcp22.suse.cz>
Date: Wed, 19 Jun 2013 15:26:14 +0200
From: Michal Hocko <mhocko@...e.cz>
To: azurIt <azurit@...ox.sk>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
cgroups mailinglist <cgroups@...r.kernel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH for 3.2] memcg: do not trap chargers with full callstack
on OOM
On Mon 17-06-13 12:21:34, azurIt wrote:
> >Here we go. I hope I didn't screw anything (Johannes might double check)
> >because there were quite some changes in the area since 3.2. Nothing
> >earth shattering though. Please note that I have only compile tested
> >this. Also make sure you remove the previous patches you have from me.
>
>
> Hi Michal,
>
> it, unfortunately, didn't work. Everything was working fine but
> original problem is still occuring.
This would be more than surprising because tasks blocked at memcg OOM
don't hold any locks anymore. Maybe I have messed something up during
backport but I cannot spot anything.
> I'm unable to send you stacks or more info because problem is taking
> down the whole server for some time now (don't know what exactly
> caused it to start happening, maybe newer versions of 3.2.x).
So you are not testing with the same kernel with just the old patch
replaced by the new one?
> But i'm sure of one thing - when problem occurs, nothing is able to
> access hard drives (every process which tries it is freezed until
> problem is resolved or server is rebooted).
I would be really interesting to see what those tasks are blocked on.
> Problem is fixed after killing processes from cgroup which
> caused it and everything immediatelly starts to work normally. I
> find this out by keeping terminal opened from another server to one
> where my problem is occuring quite often and running several apps
> there (htop, iotop, etc.). When problem occurs, all apps which wasn't
> working with HDD was ok. The htop proved to be very usefull here
> because it's only reading proc filesystem and is also able to send
> KILL signals - i was able to resolve the problem with it
> without rebooting the server.
sysrq+t will give you the list of all tasks and their traces.
> I created a special daemon (about month ago) which is able to detect
> and fix the problem so i'm not having server outages now. The point
> was to NOT access anything which is stored on HDDs, the daemon is
> only reading info from cgroup filesystem and sending KILL signals to
> processes. Maybe i should be able to also read stack files before
> killing, i will try it.
>
> Btw, which vanilla kernel includes this patch?
None yet. But I hope it will be merged to 3.11 and backported to the
stable trees.
> Thank you and everyone involved very much for time and help.
>
> azur
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists