linux-kernel - Re: cgroup blkio bug/feedback

Message-ID: <20111013160041.GC25588@redhat.com>
Date:	Thu, 13 Oct 2011 12:00:41 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	"krzf83@...il.com " <krzf83@...il.com>
Cc:	linux-kernel@...r.kernel.org,
	Morton Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: cgroup blkio bug/feedback

[ Please don't top post. Respond inline ]

On Thu, Oct 13, 2011 at 05:37:33PM +0200, krzf83@...il.com  wrote:
> The rsync iops limiting issue was that I tried limiting while rsyncing
> from /dev/sdc (mounted as /ssd) to /home/ssd-copy (/home is /dev/md2).
> During that usage I encountered overloads and system unresponsiveness
> even greater than when not using limiting at all.

OK, so /home is on an md target, you are rsyncing from the SSD to /home,
and you are trying to limit the impact of those writes by throttling the
write rate on the /home device.
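
For reference, a write-IOPS throttle of this kind is set by writing a
"major:minor limit" rule into the cgroup's
blkio.throttle.write_iops_device file. Below is a minimal Python sketch
of that step. The helper names and the cgroup directory are my own
illustrative assumptions, not your actual setup.

```python
import os

def throttle_rule(major, minor, limit):
    """Format one rule the way the blkio throttle files expect it."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return f"{major}:{minor} {limit}"

def device_numbers(path):
    """Return (major, minor) of the block device node at `path`."""
    st = os.stat(path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)

def set_write_iops(cgroup_dir, major, minor, iops):
    """Write a write-IOPS limit for one device into a blkio cgroup dir.

    cgroup_dir is a hypothetical mount path, e.g. /cgroup/blkio/users.
    """
    path = os.path.join(cgroup_dir, "blkio.throttle.write_iops_device")
    with open(path, "w") as f:
        f.write(throttle_rule(major, minor, iops) + "\n")
```

Something like set_write_iops(cgroup_dir, *device_numbers("/dev/md2"), 100)
would then cap the group at 100 write IOPS on that device, assuming the
blkio hierarchy is mounted at the directory you pass in.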

Which file system are you using on /home? I will try to do
something similar on a local system and see if I can reproduce the
issue.

> 
> I've also tried to limit iops for every "normal" user (not daemon
> users) in the system for /home (/dev/md2). I've written a script
> that initially assigns pids to cgroups and initializes cgrulesengd so
> that applications spawned in the future will be in the proper cgroups.
> I've encountered system overloads (hard reboot required) every 5-20
> hours. That also happened when I specifically did not limit tasks
> spawned by the webserver (fastcgi php tasks and some passenger tasks).

So if you just put processes in a blkio cgroup but do not specify any
limits, the load average is fine? Is it only when you specify some
limits that the load average goes up?

I am still scratching my head over how that happens. Is it that
some application forks more processes when its IO is not making
sufficient progress due to throttling, or what?

> 
> Anyway as for my other tests with blkio memory limits
> (memory.limit_in_bytes)

A minor clarification: memory.limit_in_bytes is provided by the memory
controller, not by the blkio controller.

> I also got huge system overloads when tasks
> were killed. However, these were probably due to the webserver spawning
> those again and again immediately (mainly phusion passenger tasks). I've
> tried separating processes that were spawned by the webserver into
> another, unlimited cgroup, but as I recall (I did it about 1.5 months
> ago) something was also causing overloads and constant
> kill/respawn/kill/respawn on my production webserver.

Looks like you need to give more memory to this cgroup.

> 
> As for blkio.weight, this would be a fine thing, however it causes
> loadavg to spike like hell when limiting one process.

Are you using CFQ on your md raid component disks? What's the mdraid
configuration? Again, I might give it a shot here. I have not seen
anything like what you are describing.
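
Since blkio.weight is implemented in CFQ, it only has an effect on disks
that are actually using that scheduler. The active scheduler is the
bracketed entry in /sys/block/<disk>/queue/scheduler. Here is a small
Python sketch for checking it; the function names are mine, and "sda" in
the docstring is just a placeholder for one of your component disks.

```python
def active_scheduler(text):
    """Return the bracketed scheduler name, or None if none is marked.

    The sysfs file lists all schedulers and brackets the active one,
    for example "noop deadline [cfq]".
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

def read_scheduler(disk):
    """Read the active scheduler for a disk name such as "sda"."""
    with open(f"/sys/block/{disk}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Running read_scheduler() over each md component disk would show whether
all of them are on cfq.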

When the load average increases, can you capture "vmstat 2" output?
I am also curious to know who is forking off these extra processes
in the system (maybe some "ps" output can help).
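
One way to see who is forking is to count children per parent PID from
"ps -eo pid,ppid,comm" output. The following is only a rough Python
sketch with function names I made up; it assumes the procps "ps" found
on typical Linux systems.

```python
import subprocess
from collections import Counter

def children_per_parent(ps_text):
    """Count processes per parent PID from `ps -eo pid,ppid,comm` text."""
    counts = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        _pid, ppid, _comm = fields
        counts[int(ppid)] += 1
    return counts

def top_forkers(n=5):
    """Return the n parent PIDs that currently have the most children."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return children_per_parent(out).most_common(n)
```

Calling top_forkers() a few times while the load average climbs should
show which parent is accumulating children.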

Thanks
Vivek