Message-ID: <52824064.4060100@gmail.com>
Date: Tue, 12 Nov 2013 07:51:16 -0700
From: David Ahern <dsahern@...il.com>
To: Ingo Molnar <mingo@...nel.org>
CC: acme@...stprotocols.net, linux-kernel@...r.kernel.org,
Frederic Weisbecker <fweisbec@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Mike Galbraith <efault@....de>,
Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH] perf record: Delete file if a failure occurs writing the perf data file
Ingo:
On 11/11/13, 7:43 AM, David Ahern wrote:
> On 11/11/13, 2:37 AM, Ingo Molnar wrote:
>>
>> * David Ahern <dsahern@...il.com> wrote:
>>
>>> If perf fails to write data to the data file (e.g., ENOSPC error) it
>>> fails with the message:
>>> failed to write perf data, error: No space left on device
>>>
>>> and stops - killing the workload too. The file is in an unknown state.
>>> Trying to read it (e.g., perf report) fails with a SIGBUS error.
>>
>> Ouch - guys please first investigate that SIGBUS, we should not behave
>> unexpectedly on _any_ (read: random) perf.data file contents. The SIGBUS
>> likely suggests that the parsing isn't robust enough.
If you agree with the below summary then any further objections to
deleting the file on write failure?
David
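
For reference, the delete-on-failure behaviour being asked about can be sketched roughly as below. This is only an illustration, not the patch itself: demo_write_all() and demo_write_or_unlink() are hypothetical names, and the caller is assumed to own and close the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on short writes and EINTR. */
static int demo_write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			/* No progress; report it rather than loop forever. */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: write buf to fd, which was opened on 'path'.
 * On a write error (e.g. ENOSPC) remove 'path' so that no half-written
 * file is left behind, and return -1 with errno set to the write error.
 */
int demo_write_or_unlink(int fd, const char *path, const void *buf, size_t len)
{
	int err;

	assert(path != NULL);

	if (demo_write_all(fd, buf, len) == 0)
		return 0;

	err = errno;	/* unlink() may overwrite errno */
	unlink(path);
	errno = err;
	return -1;
}
```

The trade-off is the one discussed in this thread: removing the file avoids leaving data in an unknown state, but it does not make the reader any more robust against a truncated file produced some other way.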
>
> I think we know why the SIGBUS is happening. From 'man mmap':
>
> SIGBUS Attempted access to a portion of the buffer that
> does not correspond to the file (for example, beyond
> the end of the file, ...
>
>
> With regard to perf-record, on a write() failure the header is not
> updated. Since a recent change we try to proceed even though the data
> size is 0, parsing the events we can. We eventually hit an event that
> is only partially in the file (e.g., header, but no data for the event).
> Trying to read the event data leads to the SIGBUS:
>
> Running perf-report in gdb:
>
> WARNING: The /tmp/mnt/perf.data file's data size field is 0 which is
> unexpected.
> Was the 'perf record' command properly terminated?
>
>
> Program received signal SIGBUS, Bus error.
> perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80,
> data=0x7fffffffd260)
> at util/evsel.c:1242
> 1242 u16 max_size = event->header.size;
> (gdb) bt
> #0 perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80,
> data=0x7fffffffd260)
> at util/evsel.c:1242
> #1 0x000000000047c9ce in flush_sample_queue (s=0x94e2b0,
> tool=0x7fffffffde80)
> at util/session.c:542
> #2 0x000000000047e2d4 in __perf_session__process_events (session=0x94e2b0,
> data_offset=<optimized out>, data_size=<optimized out>,
> file_size=1048576, tool=0x7fffffffde80)
> at util/session.c:1388
> #3 0x000000000042993c in __cmd_report (rep=0x7fffffffde80) at
> builtin-report.c:509
> #4 cmd_report (argc=0, argv=0x7fffffffe370, prefix=<optimized out>) at
> builtin-report.c:967
> #5 0x000000000041b063 in run_builtin (p=0x7cdf28, argc=4,
> argv=0x7fffffffe370) at perf.c:319
> #6 0x000000000041a8e3 in handle_internal_command (argv=0x7fffffffe370,
> argc=4) at perf.c:376
> #7 run_argv (argv=0x7fffffffe180, argcp=0x7fffffffe18c) at perf.c:420
> #8 main (argc=4, argv=0x7fffffffe370) at perf.c:521
>
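
To illustrate the failure mode described above: a reader can avoid touching bytes past the end of the file by checking that a whole record fits before dereferencing it. The following is a minimal sketch under stated assumptions, not the code in util/session.c or util/evsel.c. struct demo_event_header is a hypothetical stand-in for an event header carrying a total record size, and demo_event_fits() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event header. */
struct demo_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* size of the whole record, header included */
};

/*
 * Return 1 if a complete record starts at 'offset' in a buffer that is
 * backed by 'file_size' bytes of real file data, 0 otherwise.
 */
int demo_event_fits(const unsigned char *buf, size_t file_size, size_t offset)
{
	struct demo_event_header hdr;

	assert(buf != NULL);

	/* The header itself must lie inside the file. */
	if (offset > file_size || file_size - offset < sizeof(hdr))
		return 0;

	/* Copy out the header rather than casting, to avoid alignment traps. */
	memcpy(&hdr, buf + offset, sizeof(hdr));

	/* A record can be neither smaller than its header ... */
	if (hdr.size < sizeof(hdr))
		return 0;

	/* ... nor extend past the end of the file. */
	if (hdr.size > file_size - offset)
		return 0;

	return 1;
}
```

A caller walking the mapped data would stop (and warn about truncation) at the first offset for which the check returns 0, instead of reading the partial event.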
>>
>>> Fix by deleting the file on a failure.
>>
>> That only works around the issue - if the same data file is produced by
>> some other method (or maliciously) then perf report will still SIGBUS ...
>
> We could handle SIGBUS in the analysis commands too. See the suggestion
> I had for handling the output failure using the mmap output option which
> uses longjmp.
>
> David
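
As a rough sketch of the SIGBUS-handling idea mentioned above: the POSIX way to jump out of a signal handler is sigsetjmp()/siglongjmp(). This is not the code from the earlier suggestion; demo_guarded_call() and demo_maybe_fault() are hypothetical names, the sketch is single-threaded, and the fault is simulated with raise(SIGBUS) where a real tool would be touching mmap'ed file data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf demo_sigbus_env;

static void demo_sigbus_handler(int sig)
{
	(void)sig;
	/* Unwind to the sigsetjmp() point in demo_guarded_call(). */
	siglongjmp(demo_sigbus_env, 1);
}

/*
 * Hypothetical helper: run fn(arg) with SIGBUS trapped.
 * Returns fn's result, or -1 if a SIGBUS was raised while fn ran.
 */
int demo_guarded_call(int (*fn)(void *), void *arg)
{
	struct sigaction sa, old;
	volatile int ret;

	assert(fn != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return -1;

	/* savemask=1 so the signal mask is restored by siglongjmp(). */
	if (sigsetjmp(demo_sigbus_env, 1) == 0)
		ret = fn(arg);
	else
		ret = -1;

	/* Put the previous SIGBUS disposition back. */
	sigaction(SIGBUS, &old, NULL);
	return ret;
}

/* Demonstration only: simulate a faulting access when arg is non-NULL. */
int demo_maybe_fault(void *arg)
{
	if (arg)
		raise(SIGBUS);
	return 0;
}
```

With this in place, demo_guarded_call(demo_maybe_fault, NULL) returns 0, while passing a non-NULL argument returns -1 instead of killing the process. Any state that fn was in the middle of updating when the signal arrived is left as-is, so the caller has to treat it as suspect.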