linux-kernel - Re: [PATCH] perf record: Delete file if a failure occurs writing the perf data file

Message-ID: <52824064.4060100@gmail.com>
Date:	Tue, 12 Nov 2013 07:51:16 -0700
From:	David Ahern <dsahern@...il.com>
To:	Ingo Molnar <mingo@...nel.org>
CC:	acme@...stprotocols.net, linux-kernel@...r.kernel.org,
	Frederic Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Jiri Olsa <jolsa@...hat.com>,
	Namhyung Kim <namhyung@...nel.org>,
	Mike Galbraith <efault@....de>,
	Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH] perf record: Delete file if a failure occurs writing
 the perf data file

Ingo:

On 11/11/13, 7:43 AM, David Ahern wrote:
> On 11/11/13, 2:37 AM, Ingo Molnar wrote:
>>
>> * David Ahern <dsahern@...il.com> wrote:
>>
>>> If perf fails to write data to the data file (e.g., ENOSPC error) it
>>> fails with the message:
>>>    failed to write perf data, error: No space left on device
>>>
>>> and stops, killing the workload too. The file is left in an unknown
>>> state. Trying to read it (e.g., perf report) fails with a SIGBUS error.
>>
>> Ouch - guys please first investigate that SIGBUS, we should not behave
>> unexpectedly on _any_ (read: random) perf.data file contents. The SIGBUS
>> likely suggests that the parsing isn't robust enough.

If you agree with the summary below, are there any further objections to
deleting the file on write failure?

David
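
For illustration only, the delete-on-failure behavior under discussion could look roughly like the C sketch below. `write_or_delete` is a hypothetical helper, not a function from the perf sources; it assumes the whole output is available in one buffer, which is not how perf record actually streams data.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not perf's code): write the whole buffer to 'path'
 * and unlink the file if any write() fails, so that no truncated file is
 * left behind. Returns 0 on success, -1 on failure with errno preserved.
 */
static int write_or_delete(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, saved_errno;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;		/* interrupted, retry */
		if (n <= 0) {
			if (n == 0)
				errno = EIO;	/* no progress, give up */
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	if (close(fd) == 0)
		return 0;
	fd = -1;				/* already closed, only clean up */
fail:
	saved_errno = errno;			/* e.g. ENOSPC from write() */
	if (fd >= 0)
		close(fd);
	unlink(path);				/* drop the partial file */
	errno = saved_errno;
	return -1;
}
```

A caller would report `strerror(errno)` after a -1 return; the partial file is already gone at that point, so a later reader cannot trip over it.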

>
> I think we know why the SIGBUS is happening. From 'man mmap':
>
>         SIGBUS Attempted access to a portion of the buffer that
>         does not correspond  to  the  file (for  example, beyond
>         the end of the file, ...
>
>
> With regard to perf-record, on a write() failure the header is not
> updated. Since a recent change we try to proceed even though the data
> size is 0, parsing the events we can. We eventually hit an event that
> is only partially in the file (e.g., header, but no data for the event).
> Trying to read the event data leads to the SIGBUS:
>
> Running perf-report in gdb:
>
> WARNING: The /tmp/mnt/perf.data file's data size field is 0 which is unexpected.
> Was the 'perf record' command properly terminated?
>
>
> Program received signal SIGBUS, Bus error.
> perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80, data=0x7fffffffd260)
>      at util/evsel.c:1242
> 1242        u16 max_size = event->header.size;
> (gdb) bt
> #0  perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80, data=0x7fffffffd260)
>      at util/evsel.c:1242
> #1  0x000000000047c9ce in flush_sample_queue (s=0x94e2b0, tool=0x7fffffffde80)
>      at util/session.c:542
> #2  0x000000000047e2d4 in __perf_session__process_events (session=0x94e2b0,
>      data_offset=<optimized out>, data_size=<optimized out>, file_size=1048576, tool=0x7fffffffde80)
>      at util/session.c:1388
> #3  0x000000000042993c in __cmd_report (rep=0x7fffffffde80) at builtin-report.c:509
> #4  cmd_report (argc=0, argv=0x7fffffffe370, prefix=<optimized out>) at builtin-report.c:967
> #5  0x000000000041b063 in run_builtin (p=0x7cdf28, argc=4, argv=0x7fffffffe370) at perf.c:319
> #6  0x000000000041a8e3 in handle_internal_command (argv=0x7fffffffe370, argc=4) at perf.c:376
> #7  run_argv (argv=0x7fffffffe180, argcp=0x7fffffffe18c) at perf.c:420
> #8  main (argc=4, argv=0x7fffffffe370) at perf.c:521
>
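
The partially written event described above suggests one possible robustness check: before touching an event, confirm that both its header and its declared size fit inside the bytes the file actually contains. The sketch below is an assumption-laden illustration, not the perf implementation; `struct ev_header` is a simplified stand-in that follows the type/misc/size layout of perf's event header, and `event_fits` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the event header; used here for illustration. */
struct ev_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total event size in bytes, header included */
};

/*
 * Hypothetical check: return true only if a complete event starts at
 * 'offset' within a mapping that is backed by 'file_size' bytes of file.
 */
static bool event_fits(const unsigned char *map, size_t file_size,
		       size_t offset)
{
	struct ev_header hdr;

	assert(map != NULL);

	/* The header itself must lie entirely inside the file. */
	if (offset > file_size || file_size - offset < sizeof(hdr))
		return false;

	/* Safe to read now: these bytes are known to be file-backed. */
	memcpy(&hdr, map + offset, sizeof(hdr));

	/* A size smaller than the header can only come from corrupt data. */
	if (hdr.size < sizeof(hdr))
		return false;

	/* The payload must not run past the end of the file. */
	return hdr.size <= file_size - offset;
}
```

With a check of this kind the reader can stop at the last complete event and report truncation instead of faulting, whatever produced the file.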
>>
>>> Fix by deleting the file on a failure.
>>
>> That only works around the issue - if the same data file is produced by
>> some other method (or maliciously) then perf report will still SIGBUS ...
>
> We could handle SIGBUS in the analysis commands too. See the suggestion
> I had for handling the output failure using the mmap output option, which
> uses longjmp.
>
> David
