linux-kernel - Re: [RFC] rlimit exceed notification events

Message-ID: <xuny1t1dw49j.fsf@redhat.com>
Date:   Thu, 25 Aug 2016 13:07:36 +0300
From:   Yauheni Kaliuta <yauheni.kaliuta@...hat.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     linux-kernel@...r.kernel.org, Aristeu Rozanski <aris@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC] rlimit exceed notification events

Hi, Jiri!

>>>>> On Wed, 24 Aug 2016 13:24:28 +0200, Jiri Olsa wrote:

 > On Fri, Aug 19, 2016 at 05:41:20PM +0300, Yauheni Kaliuta wrote:
 >> 
 >> At the moment there is no clear indication if a process exceeds a resource
 >> limit. In some cases the problematic syscall can return an error, in other
 >> cases the process can just be killed.

[...]

 >> 2) Using tracepoints. I've used a simple program, which dup()s until it gets
 >> the error 3 times:

 > just to start up the discussion.. ;-)

 > I'd think this one (2) is the proper way,

Of the options I checked, I like it best as well. I should probably
prepare an RFC PATCH with it.

 > but generally you need to
 > come up with a good justification/use case to add a new tracepoint

 > also rlimit seems to be difficult to add tracepoints to,
 > because the checks are spread all over the code.. 

 > can't think of a good solution ATM

Yes, every place should be instrumented. I just introduced some indirection
to have some flexibility for the final output.

Still, it would be good to know if there are objections to such
instrumentation in any of the resource check places, file operations
for example.

 >> $ sudo ./perf record -e rlimit:rlimit_exceeded ./a.out

[...]

 >> index 6b1acdfe59da..a358de041ac4 100644
 >> --- a/fs/file.c
 >> +++ b/fs/file.c
 >> @@ -947,6 +947,9 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
 >>  		else
 >>  			fput(file);
 >>  	}
 >> +	if (ret == -EMFILE)
 >> +		rlimit_exceeded(RLIMIT_NOFILE,
 >> +				rlimit(RLIMIT_NOFILE), (u64)-1);
 >>  	return ret;

 > how about other places? alloc_fd/get_unused_fd_flags/replace_fd..

This is a very good question. Initially I just wanted something for the demo,
but I ran into a dilemma even here. Ideally it should be a place which is

a) aware of RLIMIT and
b) responsible for the decision making:

1) It would be good to place it in __alloc_fd(), since it is the final point
and performs the check against the limit, but it's not aware of the
RLIMIT; the limit is passed to it from the upper levels.

2) get_unused_fd_flags() is aware of RLIMIT and is the entry point for many
other fd allocations, but doesn't make any decision.

3) the dup() syscall is not aware of RLIMIT, but makes the final decision.

That is why I put it there for the prototype code, but it doesn't look
like a good place for the final solution.

In many other cases both a) and b) are in one place, so there is no such
problem.
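
For reference, the kind of test program mentioned above (dup() until the
error is seen 3 times) could look roughly like the sketch below. This is not
the original a.out; the helper names, DEMO_NOFILE and the choice to lower the
soft limit first are assumptions made only to keep the example short and
quick to run.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Assumed small soft limit so that the loop hits EMFILE quickly. */
#define DEMO_NOFILE 16

/*
 * Hypothetical helper: lower the soft RLIMIT_NOFILE to `soft`
 * (clamped to the hard limit). Returns 0 on success, -1 on error.
 */
static int lower_nofile_limit(rlim_t soft)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (soft > rl.rlim_max)
		soft = rl.rlim_max;
	rl.rlim_cur = soft;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/*
 * Hypothetical helper: dup() `src_fd` until dup() has failed with EMFILE
 * `wanted` times. Returns the number of EMFILE failures, or -1 on any
 * other error (or if more than DEMO_NOFILE descriptors could be created,
 * which means the limit was not lowered). All duplicates are closed
 * before returning.
 */
static int dup_until_emfile(int src_fd, int wanted)
{
	int fds[DEMO_NOFILE];
	int nfds = 0;
	int failures = 0;

	while (failures < wanted) {
		int fd = dup(src_fd);

		if (fd >= 0) {
			if (nfds == DEMO_NOFILE) {
				/* Limit is higher than expected; give up. */
				close(fd);
				failures = -1;
				break;
			}
			fds[nfds++] = fd;
		} else if (errno == EMFILE) {
			failures++;
		} else {
			failures = -1;
			break;
		}
	}

	while (nfds > 0)
		close(fds[--nfds]);

	return failures;
}
```

Calling lower_nofile_limit(DEMO_NOFILE) and then
dup_until_emfile(STDIN_FILENO, 3) from main() should produce three EMFILE
failures, each of which would go through the instrumented dup() path in the
prototype.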

-- 
WBR,
Yauheni Kaliuta