linux-kernel - Re: [PATCH for 3.2.34] memcg: do not trigger OOM from add_to_page_cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Mon, 10 Dec 2012 10:43:38 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	azurIt <azurit@...ox.sk>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cgroups mailinglist <cgroups@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
 add_to_page_cache_locked

On Mon 10-12-12 02:20:38, azurIt wrote:
[...]
> Michal,

Hi,
 
> this was printing so many debug messages to console that the whole
> server hangs

Hmm, this is _really_ surprising. The latest patch didn't add any new
logging actually. It just enahanced messages which were already printed
out previously + changed few functions to be not inlined so they show up
in the traces. So the only explanation is that the workload has changed
or the patches got misapplied.

> and i had to hard reset it after several minutes :( Sorry
> but i cannot test such a things in production. There's no problem with
> one soft reset which takes 4 minutes but this hard reset creates about
> 20 minutes outage (mainly cos of disk quotas checking).

Understood.

> Last logged message:
> 
> Dec 10 02:03:29 server01 kernel: [  220.366486] grsec: From 141.105.120.152: bruteforce prevention initiated for the next 30 minutes or until service restarted, stalling each fork 30 seconds.  Please investigate the crash report for /usr/lib/apache2/mpm-itk/apache2[apache2:3586] uid/euid:1258/1258 gid/egid:100/100, parent /usr/lib/apache2/mpm-itk/apache2[apache2:2142] uid/euid:0/0 gid/egid:0/0

This explains why you have seen your machine hung. I am not familiar
with grsec but stalling each fork 30s sounds really bad.

Anyway this will not help me much. Do you happen to still have any of
those logged traces from the last run?

Apart from that. If my current understanding is correct then this is
related to transparent huge pages (and leaking charge to the page fault
handler). Do you see the same problem if you disable THP before you
start your workload? (echo never > /sys/kernel/mm/transparent_hugepage/enabled)
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/