[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c5fbed80-5933-eca3-001e-0e2aaccfcd1d@amazon.com>
Date: Fri, 21 Jun 2019 15:28:34 +0200
From: Alexander Graf <graf@...zon.com>
To: <samcacc@...zon.com>, Sam Caccavale <samcacc@...zon.de>
CC: <samcaccavale@...il.com>, <nmanthey@...zon.de>,
<wipawel@...zon.de>, <dwmw@...zon.co.uk>, <mpohlack@...zon.de>,
<graf@...zon.de>, <karahmed@...zon.de>,
<andrew.cooper3@...rix.com>, <JBeulich@...e.com>,
<pbonzini@...hat.com>, <rkrcmar@...hat.com>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <hpa@...or.com>,
<paullangton4@...il.com>, <anirudhkaushik@...gle.com>,
<x86@...nel.org>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/3] Emulate simple x86 instructions in userspace
On 12.06.19 17:19, samcacc@...zon.com wrote:
> On 5/31/19 10:38 AM, Alexander Graf wrote:
>> On 21.05.19 17:39, Sam Caccavale wrote:
>>
>>> +static void dump_state_after(const char *desc, struct state *state)
>>> +{
>>> + debug(" -- State after %s --\n", desc);
>>> + debug("mode: %s\n", x86emul_mode_string[state->ctxt.mode]);
>>> + debug(" cr0: %lx\n", state->vcpu.cr[0]);
>>> + debug(" cr3: %lx\n", state->vcpu.cr[3]);
>>> + debug(" cr4: %lx\n", state->vcpu.cr[4]);
>>> +
>>> + debug("Decode _eip: %lu\n", state->ctxt._eip);
>>> + debug("Emulate eip: %lu\n", state->ctxt.eip);
>>> +
>>> + debug("\n");
>>> }
>>> int step_emulator(struct state *state)
>>> {
>>> - return 0;
>>> + int rc, prev_eip = state->ctxt.eip;
>>> + int decode_size = state->data_available - decode_offset;
>>> +
>>> + if (decode_size < 15) {
>>> + rc = x86_decode_insn(&state->ctxt, &state->data[decode_offset],
>>> + decode_size);
>>> + } else {
>>> + rc = x86_decode_insn(&state->ctxt, NULL, 0);
>>
>> Isn't this going to fetch instructions from data as well? Why do we need
>> the < 15 special case at all?
>>
> I've changed the method of acquiring data in v2, but the 15 limit is
> still relevant. If x86_decode_insn is called with a NULL pointer and
> instruction size 0, the bytes are fetched via the emulator_ops.fetch
> function. This would be nice, but there is no way of limiting how many
> bytes it will try and fetch-- and it usually grabs 15 since that is the
> longest x86 instruction (as of yet?). When there are less than 15 bytes
> left, limiting the fetch size to the remaining bytes is important.
You want to at least add a comment here, detailing the fact that where
the magic 15 comes from and that you want to exercise the normal
prefetch path while still allowing the buffer to shrink < 15 bytes :).
Maybe move MAX_INST_SIZE from svm.c into a .h file and reuse that while
at it.
[...]
>>> diff --git a/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> new file mode 100755
>>> index 000000000000..e570b17f9404
>>> --- /dev/null
>>> +++ b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> @@ -0,0 +1,23 @@
>>> +#!/bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0+
>>> +# This runs the afl-harness at $1, $2 times (or 100)
>>> +# It runs uniq and sorts the output to give an idea of what is
>>> causing the
>>> +# most crashes. Useful for deciding what to implement next.
>>> +
>>> +if [ "$#" -lt 1 ]; then
>>> + echo "Usage: './bin_fuzz path_to_afl-harness [number of times to run]"
>>> + exit
>>> +fi
>>> +
>>> +mkdir -p fuzz
>>> +rm -f fuzz/*.in fuzz/*.out
>>> +
>>> +for i in $(seq 1 1 ${2:-100})
>>> +do
>>> + {
>>> + head -c 500 /dev/urandom | tee fuzz/$i.in | ./$1
>>> + } > fuzz/$i.out 2>&1
>>> +
>>> +done
>>> +
>>> +find ./fuzz -name '*.out' -exec tail -1 {} \; | sed 's/.*
>>> Segmen/Segman/' | sed -r 's/^(\s[0-9a-f]{2})+$/misc instruction
>>> output/' | sort | uniq -c | sort -rn
>>
>> What is that Segman thing about?
>>
> This was for binning crashes-- check `tools/fuzz/x86ie/scripts/bin.sh`
> in v2 for the updated version. Basically, it checks whether a
> segmentation fault has happened, and if so, launches a gdb session to
> see whether it was caused by an unimplemented x86_emulator_op. This is
> useful in development for prioritizing the unimplemented features which
> are causing the most fake crashes.
I can see why you want to combine them, but I don't understand where
"Segman" comes from. Where is there a man here?
Alex
Powered by blists - more mailing lists