Message-ID: <20160824112428.GA15743@krava>
Date: Wed, 24 Aug 2016 13:24:28 +0200
From: Jiri Olsa <jolsa@...hat.com>
To: Yauheni Kaliuta <yauheni.kaliuta@...hat.com>
Cc: linux-kernel@...r.kernel.org, Aristeu Rozanski <aris@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC] rlimit exceed notification events
On Fri, Aug 19, 2016 at 05:41:20PM +0300, Yauheni Kaliuta wrote:
> Hi!
>
> At the moment there is no clear indication when a process exceeds a resource
> limit. In some cases the failing syscall may return an error; in others
> the process may simply be killed.
>
> I'm trying to implement some sort of monitoring of such events and have a
> question: which approach would be acceptable?
>
> 1) The straightforward solution would be to instrument every such place with
> a printk (something related was implemented, for example, by commit
> d977d56ce5b3e8842236f2f9e7483d4914c9592e).
>
> This raises concerns about reliability and performance (for example, a
> misbehaving application could flood the printk buffer).
>
> 2) Using tracepoints. I've used a simple program which calls dup() until it
> gets the error 3 times:
just to start up the discussion.. ;-)
I'd think this one (2) is the proper way, but generally you need to
come up with a good justification/use case to add a new tracepoint.
Also, rlimit seems to be difficult to add tracepoints to,
because the checks are spread all over the code..
can't think of a good solution ATM
> $ sudo ./perf record -e rlimit:rlimit_exceeded ./a.out
> Couldn't dup file: Too many open files, iteration 1020
> Couldn't dup file: Too many open files, iteration 1021
> Couldn't dup file: Too many open files, iteration 1022
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.010 MB perf.data (3 samples) ]
>
> $ sudo ./perf report
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 3 of event 'rlimit:rlimit_exceeded'
> # Event count (approx.): 3
> #
> # Overhead Trace output
> # ........ ........................................................
> #
> 100.00% RLIMIT NOFILE violation. Current 1024, requested Unknown
>
> The code demonstrating the idea is below:
>
> diff --git a/fs/file.c b/fs/file.c
> index 6b1acdfe59da..a358de041ac4 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -947,6 +947,9 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
> else
> fput(file);
> }
> + if (ret == -EMFILE)
> + rlimit_exceeded(RLIMIT_NOFILE,
> + rlimit(RLIMIT_NOFILE), (u64)-1);
> return ret;
What about the other places? alloc_fd/get_unused_fd_flags/replace_fd..
jirka