linux-kernel - Re: counting file descriptors with a cgroup controller


Message-id: <50e07c29-295a-62fd-e0ad-7e52d5b55c7d@samsung.com>
Date:   Wed, 08 Mar 2017 10:52:18 +0100
From:   Krzysztof Opasiak <k.opasiak@...sung.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     lizefan@...wei.com, hannes@...xchg.org,
        Łukasz Stelmach <l.stelmach@...sung.com>,
        linux-kernel@...r.kernel.org,
        Karol Lewandowski <k.lewandowsk@...sung.com>,
        cgroups@...r.kernel.org
Subject: Re: counting file descriptors with a cgroup controller

On 03/07/2017 09:48 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 07, 2017 at 09:06:49PM +0100, Krzysztof Opasiak wrote:
>> Personally, I don't want to use rlimit for this as it ends up returning
>> error code from for example open() when we hit the limit. This may lead to
>> some unpredictable crashes in  services (esp. those poor proprietary binary
>> blobs). Instead of injecting errors to service we would like to just get
>> notification that this service has more opened fds than it should and ask it
>> to restart in a polite way.
>>
>> For memory seems to be quite easy to achieve as we can just get eventfd
>> notification when application passes given memory usage using memory cgroup
>> controller. Maybe you know some efficient method to do the same for fds?
>
> So, if all you wanna do is reliably detecting open(2) failures, can't
> you do that with bpf tracing?
>

Well, detecting open(2) failures is not enough, and this approach has a couple of problems:

1) open(2) is not the only syscall that creates an fd. In addition to
other syscalls like socket(2) and dup(2), some ioctl() calls on drivers
(for example video) also create fds. I'm not sure we have any mechanism
other than grepping through the kernel source to find out which ioctl()
calls create an fd and which do not.
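
To illustrate point 1, here is a minimal Python sketch, assuming a Linux system with /proc mounted. It creates fds without ever calling open(2) on a regular file and counts them through /proc/self/fd. The helper names count_open_fds() and fds_created_without_open() are made up for this example, and pipe(2) is added here as one more fd-creating syscall beyond those named above.

```python
import os
import socket


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fds_created_without_open():
    """Return how many new fds appear after socket, dup and pipe calls."""
    before = count_open_fds()

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # socket(2)
    dup_fd = os.dup(sock.fileno())                            # dup(2)
    read_end, write_end = os.pipe()                           # pipe(2)/pipe2(2)

    after = count_open_fds()

    # Release everything so the sketch leaves no fds behind.
    os.close(write_end)
    os.close(read_end)
    os.close(dup_fd)
    sock.close()

    return after - before


if __name__ == "__main__":
    print("fds created without open(2):", fds_created_without_open())
```

A tracer that hooks only open(2) would miss every one of these descriptors, and fds handed out by driver ioctl() calls are harder still to enumerate.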

2) As far as I know (I'm not a bpf specialist, so please correct me if
I'm wrong), with bpf we can only detect such events; we cannot prevent
the error from reaching the caller. This means the service will see that
it ran out of fds and will have to handle that properly. If there is a
bug in this error path, the service may crash.
What we would like to get is just a notification to an external process
that some limit has been reached, without returning an error to the
service itself.

3) Theoretically we could do this using bpf or syscall auditing and
count fds for each userspace process, or check /proc/<PID> after each
notification, but that gets very heavy for a production environment.
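
For reference, the /proc-based check from point 3 might look roughly like the sketch below, again assuming Linux. The names count_fds() and find_offenders(), and the idea of a per-service "soft limit", are assumptions made for this example and not an existing interface.

```python
import os


def count_fds(pid):
    """Return the number of open fds for pid, or None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except (FileNotFoundError, PermissionError, ProcessLookupError):
        # The process exited in the meantime, or we lack permission.
        return None


def find_offenders(pids, soft_limit):
    """Return {pid: fd_count} for every process above soft_limit."""
    offenders = {}
    for pid in pids:
        n = count_fds(pid)
        if n is not None and n > soft_limit:
            offenders[pid] = n
    return offenders


if __name__ == "__main__":
    # Hypothetical soft limit of 1024 fds, checked against this process only.
    print(find_offenders([os.getpid()], soft_limit=1024))
```

Each check costs one directory listing per monitored process, and it has to be repeated on every notification or polling interval, which is where the overhead comes from.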

Best regards,
-- 
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics