linux-kernel - Re: [PATCH 2/3] Emulate simple x86 instructions in userspace


Message-ID: <c5fbed80-5933-eca3-001e-0e2aaccfcd1d@amazon.com>
Date:   Fri, 21 Jun 2019 15:28:34 +0200
From:   Alexander Graf <graf@...zon.com>
To:     <samcacc@...zon.com>, Sam Caccavale <samcacc@...zon.de>
CC:     <samcaccavale@...il.com>, <nmanthey@...zon.de>,
        <wipawel@...zon.de>, <dwmw@...zon.co.uk>, <mpohlack@...zon.de>,
        <graf@...zon.de>, <karahmed@...zon.de>,
        <andrew.cooper3@...rix.com>, <JBeulich@...e.com>,
        <pbonzini@...hat.com>, <rkrcmar@...hat.com>, <tglx@...utronix.de>,
        <mingo@...hat.com>, <bp@...en8.de>, <hpa@...or.com>,
        <paullangton4@...il.com>, <anirudhkaushik@...gle.com>,
        <x86@...nel.org>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/3] Emulate simple x86 instructions in userspace


On 12.06.19 17:19, samcacc@...zon.com wrote:
> On 5/31/19 10:38 AM, Alexander Graf wrote:
>> On 21.05.19 17:39, Sam Caccavale wrote:
>>
>>> +static void dump_state_after(const char *desc, struct state *state)
>>> +{
>>> +    debug(" -- State after %s --\n", desc);
>>> +    debug("mode: %s\n", x86emul_mode_string[state->ctxt.mode]);
>>> +    debug(" cr0: %lx\n", state->vcpu.cr[0]);
>>> +    debug(" cr3: %lx\n", state->vcpu.cr[3]);
>>> +    debug(" cr4: %lx\n", state->vcpu.cr[4]);
>>> +
>>> +    debug("Decode _eip: %lu\n", state->ctxt._eip);
>>> +    debug("Emulate eip: %lu\n", state->ctxt.eip);
>>> +
>>> +    debug("\n");
>>>    }
>>>      int step_emulator(struct state *state)
>>>    {
>>> -    return 0;
>>> +    int rc, prev_eip = state->ctxt.eip;
>>> +    int decode_size = state->data_available - decode_offset;
>>> +
>>> +    if (decode_size < 15) {
>>> +        rc = x86_decode_insn(&state->ctxt, &state->data[decode_offset],
>>> +                     decode_size);
>>> +    } else {
>>> +        rc = x86_decode_insn(&state->ctxt, NULL, 0);
>>
>> Isn't this going to fetch instructions from data as well? Why do we need
>> the < 15 special case at all?
>>
> I've changed the method of acquiring data in v2, but the 15 limit is
> still relevant.  If x86_decode_insn is called with a NULL pointer and
> instruction size 0, the bytes are fetched via the emulator_ops.fetch
> function.  This would be nice, but there is no way of limiting how many
> bytes it will try to fetch, and it usually grabs 15 since that is the
> longest x86 instruction (as of yet?).  When there are fewer than 15 bytes
> left, limiting the fetch size to the remaining bytes is important.


You want to at least add a comment here explaining where the magic 15 
comes from and that you want to exercise the normal prefetch path while 
still allowing the buffer to shrink below 15 bytes :). Maybe move 
MAX_INST_SIZE from svm.c into a .h file and reuse it while you're at it.
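
To illustrate the kind of comment and named constant meant above, here is a 
minimal sketch, not the actual harness code. The helper names 
bytes_remaining() and can_use_prefetch_path() are hypothetical, and 
MAX_INST_SIZE is defined locally here on the assumption that it carries the 
same value (15) as the define in svm.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumption: same value as MAX_INST_SIZE in svm.c. 15 bytes is the
 * architectural upper bound on the length of one x86 instruction.
 */
#define MAX_INST_SIZE 15

/* Hypothetical helper: bytes of fuzz input left after decode_offset. */
static size_t bytes_remaining(size_t data_available, size_t decode_offset)
{
	return decode_offset < data_available ?
	       data_available - decode_offset : 0;
}

/*
 * Hypothetical helper: the decoder's normal prefetch path (NULL buffer,
 * size 0) may pull in up to MAX_INST_SIZE bytes and cannot be told to
 * fetch fewer. It is therefore only safe while at least that many bytes
 * remain; below that, the caller has to pass the remaining bytes and
 * their exact count explicitly.
 */
static bool can_use_prefetch_path(size_t remaining)
{
	return remaining >= MAX_INST_SIZE;
}
```

With something like this, the branch in step_emulator() would read as a 
check on can_use_prefetch_path() instead of a bare "< 15".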


[...]


>>> diff --git a/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> new file mode 100755
>>> index 000000000000..e570b17f9404
>>> --- /dev/null
>>> +++ b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
>>> @@ -0,0 +1,23 @@
>>> +#!/bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0+
>>> +# This runs the afl-harness at $1, $2 times (or 100)
>>> +# It runs uniq and sorts the output to give an idea of what is causing the
>>> +# most crashes.  Useful for deciding what to implement next.
>>> +
>>> +if [ "$#" -lt 1 ]; then
>>> +  echo "Usage: './bin_fuzz path_to_afl-harness [number of times to run]"
>>> +  exit
>>> +fi
>>> +
>>> +mkdir -p fuzz
>>> +rm -f fuzz/*.in fuzz/*.out
>>> +
>>> +for i in $(seq 1 1 ${2:-100})
>>> +do
>>> +  {
>>> +  head -c 500 /dev/urandom | tee fuzz/$i.in | ./$1
>>> +  } > fuzz/$i.out 2>&1
>>> +
>>> +done
>>> +
>>> +find ./fuzz -name '*.out' -exec tail -1 {} \; | sed 's/.* Segmen/Segman/' | sed -r 's/^(\s[0-9a-f]{2})+$/misc instruction output/' | sort | uniq -c | sort -rn
>>
>> What is that Segman thing about?
>>
> This was for binning crashes; check `tools/fuzz/x86ie/scripts/bin.sh`
> in v2 for the updated version.  Basically, it checks whether a
> segmentation fault has happened, and if so, launches a gdb session to
> see whether it was caused by an unimplemented x86_emulator_op.  This is
> useful in development for prioritizing the unimplemented features which
> are causing the most fake crashes.


I can see why you want to combine them, but I don't understand where 
"Segman" comes from. Where is there a man here?
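
For comparison, here is a rough sketch of the same binning step written out 
in C. It assumes the intent of the two sed expressions was (a) to drop the 
variable prefix in front of the shell's "Segmentation fault" message so that 
identical crashes sort together, and (b) to collapse lines consisting only 
of hex byte dumps into one bucket. The function names are hypothetical, and 
the label is kept as plain "Segmentation fault" rather than "Segman". Unlike 
the greedy sed pattern, this matches the first occurrence of the message.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_lower_hex(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/*
 * Hypothetical helper: true if the line is one or more groups of a
 * whitespace character followed by two lowercase hex digits, which is
 * what the regex ^(\s[0-9a-f]{2})+$ in the script accepts.
 */
static int is_hex_dump_line(const char *line)
{
	size_t groups = 0;

	while (*line) {
		if (!isspace((unsigned char)line[0]))
			return 0;
		if (!is_lower_hex(line[1]) || !is_lower_hex(line[2]))
			return 0;
		line += 3;
		groups++;
	}
	return groups > 0;
}

/*
 * Hypothetical helper: map the last line of one fuzz run to a bin label.
 * The returned pointer is either a string literal or points into line.
 */
static const char *bin_label(const char *line)
{
	const char *segv = strstr(line, "Segmentation fault");

	if (segv)
		return segv;	/* drop the shell prefix, keep the tail */
	if (is_hex_dump_line(line))
		return "misc instruction output";
	return line;
}
```

Counting the labels then gives the same kind of histogram as the 
sort | uniq -c | sort -rn tail of the script.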



Alex