Date:   Tue, 07 Mar 2017 12:19:52 +0100
From:   Krzysztof Opasiak <k.opasiak@...sung.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     lizefan@...wei.com, hannes@...xchg.org,
        Ɓukasz Stelmach <l.stelmach@...sung.com>,
        linux-kernel@...r.kernel.org,
        Karol Lewandowski <k.lewandowsk@...sung.com>,
        cgroups@...r.kernel.org
Subject: Re: counting file descriptors with a cgroup controller

Hi

On 03/06/2017 07:58 PM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 17, 2017 at 12:37:11PM +0100, Krzysztof Opasiak wrote:
>>> We need to limit and monitor the number of file descriptors processes
>>> keep open. If a process exceeds certain limit we'd like to terminate it
>>> and restart it or reboot the whole system. Currently the RLIMIT API
>>> allows limiting the number of file descriptors but to achieve our goals
>>> we'd need to make sure all programs we run handle EMFILE errno
>>> properly. That is why we consider developing a cgroup controller that
>>> limits the number of open file descriptors of its members (similar to
>>>  memory controller).
>>>
>>> Any comments? Is there any alternative that:
>>>
>>> + does not require modifications of user-land code,
>>> + enables other process (e.g. init) to be notified and apply policy.
>
> Hmm... I'm not quite sure fds qualify as an independent system-wide
> resource.  We did that for pids because pids are globally limited and
> can run out way earlier than memory backing it.  I don't think we have
> similar restrictions for fds, do we?

Well, I'm not aware of any such restrictions...

So let me clarify our use case so we can have some more discussion
about this. We are dealing with the task of monitoring system services
on an IoT system. This system needs to run as long as possible without
a reboot, just like a server. In the server world almost the whole
system state is monitored by services like Nagios, which poll each
parameter (CPU, memory, etc.) at some interval. Unfortunately we cannot
use that approach in an embedded system due to power consumption.

So currently we are considering two approaches:

1) Use rlimits when possible to limit resources for each process.

The problem here is that this creates an implicit requirement that all
system services are well written: they have to detect that they have,
for example, run out of fds and exit with a suitable error code instead
of hanging forever and telling clients that they are unable to handle
their requests due to the lack of fds. This is hard, especially when a
service uses a lot of libraries under the hood, because each function
that opens a file also has to propagate this error code. It is even
harder with proprietary services or libraries for which we don't have
access to the source code.
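
Just to illustrate what this requirement means in practice, here is a
minimal sketch of approach 1) in C. The limit value, the file name and
the exit code are made up for illustration: the fd limit is installed
with setrlimit(RLIMIT_NOFILE), and every place that opens a file has to
check for EMFILE/ENFILE and turn it into a clean exit, a check that has
to be repeated in every library the service links against.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
	struct rlimit rl = { .rlim_cur = 64, .rlim_max = 64 };

	/* Cap the number of open file descriptors for this process. */
	if (setrlimit(RLIMIT_NOFILE, &rl) < 0) {
		perror("setrlimit");
		return EXIT_FAILURE;
	}

	int fd = open("/etc/hostname", O_RDONLY);
	if (fd < 0) {
		if (errno == EMFILE || errno == ENFILE) {
			/* Out of descriptors: exit with a distinct code
			 * so the supervisor can restart us instead of
			 * letting us limp along. */
			fprintf(stderr, "out of fds, exiting\n");
			return 3;
		}
		perror("open");
		return EXIT_FAILURE;
	}

	/* ... normal service work ... */
	close(fd);
	return 0;
}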

2) Use cgroups to limit and monitor resource usage

Generally systemd creates a cgroup for each service. Some controllers,
like the memory controller, are able to notify userspace when usage
reaches some level. So for example systemd could get a notification
that one of the cgroups is using more memory than it should, but as
long as that is not the cgroup's hard limit, the service itself will
not even notice. So instead of the service getting an error from, for
example, malloc(), systemd could just send a signal to that service,
ask it to exit gracefully and then restart it. The disadvantage of this
solution is that we need a cgroup controller for each resource we would
like to monitor. For now we have suitable controllers for everything we
need apart from file descriptors.
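
For reference, this is roughly how such a memory-usage notification can
be consumed with the v1 memory controller's threshold interface
(Documentation/cgroup-v1/memory.txt, "Memory thresholds"). The cgroup
path and the threshold below are made up for illustration; the point is
that a hypothetical fd controller could expose a similar notification,
so that the supervisor, not the service, reacts to running out of
descriptors.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define CG "/sys/fs/cgroup/memory/system.slice/foo.service"

int main(void)
{
	char buf[64];
	uint64_t count;

	int efd = eventfd(0, 0);
	int ufd = open(CG "/memory.usage_in_bytes", O_RDONLY);
	int cfd = open(CG "/cgroup.event_control", O_WRONLY);
	if (efd < 0 || ufd < 0 || cfd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	/* Arm a notification at 100 MiB of memory usage. */
	snprintf(buf, sizeof(buf), "%d %d %llu", efd, ufd, 100ULL << 20);
	if (write(cfd, buf, strlen(buf)) < 0) {
		perror("cgroup.event_control");
		return EXIT_FAILURE;
	}

	/* Blocks until the threshold is crossed; at this point systemd
	 * (or any supervisor) can SIGTERM the service and restart it. */
	if (read(efd, &count, sizeof(count)) == sizeof(count))
		printf("memory threshold crossed\n");

	return 0;
}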

What do you think about this? Maybe you have some other ideas on how we
could achieve this?

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics
