lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 14 Dec 2016 20:48:37 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     "Levin, Alexander" <alexander.levin@...izon.com>
Cc:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "scientist@...com" <scientist@...com>,
        "glider@...gle.com" <glider@...gle.com>,
        "andreyknvl@...gle.com" <andreyknvl@...gle.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "arnd@...db.de" <arnd@...db.de>,
        "mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
        "daniel.vetter@...ll.ch" <daniel.vetter@...ll.ch>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls

On Wed, Dec 14, 2016 at 8:46 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> On Mon, Dec 12, 2016 at 11:45 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>> On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>> On Wed, Nov 23, 2016 at 3:59 PM,  <alexander.levin@...izon.com> wrote:
>>>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
>>>>> Several observations based on my experience with syzkaller descriptions:
>>>>>  - there are 2 levels: physical and logical;
>>>>>    on physical level there are int, pointer, array, struct, union;
>>>>>    and that's pretty much it.
>>>>>    on logical level there are flags, bitmasks, file paths, sctp socket fds,
>>>>>    unix socket names, etc.
>>>>>    These levels are almost completely orthogonal. It would be useful to
>>>>>    clearly separate them on description level. E.g. now you have TYPE_PTR and
>>>>>    TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>>>>>
>>>>>  - logical types won't fit into 64 bits, there are more of them
>>>>
>>>> I agree with your two points above.
>>>>
>>>> As an example, let's look at:
>>>>
>>>>         int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
>>>>
>>>> epfd would be a physical int, logical "epoll_fd", and constrainted with being
>>>> an open descriptor.
>>>>
>>>> fd on the other hand is tricky: it's a physical int, logical fd, and
>>>> constrainted with being a file descriptor that supports poll, and being
>>>> open.
>>>>
>>>> So while I think that logical types can be just a counter rather than a bitmask
>>>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields?
>>>
>>> One observation is that there are just 5 physical types:
>>>  - scalar
>>>  - pointer
>>>  - array
>>>  - struct
>>>  - union
>>>
>>> The rest deals with what exactly "scalar" is in a particular case.
>>>
>>> I don't yet have complete answer, as it somewhat intermixed with the
>>> rest of questions.
>>>
>>>
>>>
>>>>>  - we need support for recursive types (yes, there are linked lists in
>>>>> kernel APIs)
>>>>
>>>> I imagine that this will be handled by specific logical type handlers we'll
>>>> need to implement. Can you give me an example and I'll try to code that?
>>>
>>> One example is te_oper_param here:
>>> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h
>>> next_ptr_user is a pointer to te_oper_param. Thus recursive definition.
>>>
>>> Another example is snd_seq_ev_quote:
>>> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194
>>> it contains struct snd_seq_event *event and snd_seq_event recursively
>>> contains snd_seq_ev_quote.
>>>
>>> In all cases it is pointer recursion via structs.
>>>
>>> Sometimes it wish that developers have to write formal descriptions in
>>> a limited language upfront. That would probably eliminate lots of
>>> weird one-off "see what I invented here" cases :)
>>>
>>>
>>>
>>>
>>>>>  - we need support for input/output data
>>>>>    currently syzkaller does this only on pointer level, i.e. you
>>>>> attach direction to pointer target
>>>>>    but that's not enough, frequently there is a struct where one field
>>>>> is input and another is output
>>>>
>>>> Assuming it's "data", for intput we'll just need to check that the given
>>>> length is readable and for output that the length is writable, no?
>>>
>>> It also can be an fd in a struct field. If it's output (e.g. pipe),
>>> then we must not check that it's valid on entry. But we may want to
>>> check that it's valid on successful exit, or fuzzer will use these
>>> output fd's as inputs to other calls.
>>>
>>>
>>>> We can do it with constraints right now.
>>>>
>>>>>  - we may need support for reusing types in several arguments
>>>>>    e.g. you may have a pretty complex type, and you don't want to
>>>>> write it out a dozen of times
>>>>
>>>> Yup, so if we go with the physical/logical split we can have handlers for
>>>> logical types.
>>>>
>>>>>  - we need some support for discriminated syscalls
>>>>>    if we want to support strace usecase, the support needs to be more
>>>>> extensive than what syzkaller has;
>>>>>    i.e. syzkaller can't restore discrimination having actual argument
>>>>> values (it can do it only in the other direction)
>>>>>
>>>>>  - I would not create a special support for arguments;
>>>>>    rather I would create support for structs and struct fields,
>>>>>    and then pretend that a syscalls effectively accepts a struct by value
>>>>
>>>> But that means I need a custom handler for every syscall to parse the
>>>> struct fields rather than a generic code that goes through the args and calls
>>>> the right handler?
>>>
>>> No, you don't. We will need generic code that parses a piece of memory
>>> as a struct and splits it into fields anyway.
>>> We can just reuse this code to handle syscall arguments as follows.
>>> Describe syscall arguments as a pseudo struct (array of fields). Then
>>> syscall handling function accepts pointer to region of memory with
>>> arguments and description of the struct, and invokes common struct
>>> handling code.
>>>
>>>
>>>
>>>>> How would you like us to collaborate on this?
>>>>> If you share your git repo, I could form it into something that would
>>>>> be suitable for syzkaller and incorporate most of the above.
>>>>
>>>> I'd really like to have something that either generates these descriptions from
>>>> your DSL (it really doesn't have to be perfect (at first)) or something that
>>>> generates DSL from these C structs.
>>>
>>> Do you mean generating C from my DSL of a one-off or as a permanent solution?
>>
>>
>> Main problem I am trying to resolve now is how to make types reference
>> other types.
>> Say we have "pointer to pointer to int" (the same applies to structs
>> and arrays). This means that struct type must be able to reference
>> another instance of struct type. But we can't put type into type by
>> value. Namely, the following is not possible:
>>
>> struct type {
>>   int kind;
>>   union {
>>     ...
>>     // for kind = KIND_PTR
>>     struct {
>>       struct type type;
>>     } ptr;
>>   };
>> };
>>
>> One obvious solution would be to always reference _pointers_ to types. E.g.:
>>
>>     // for kind = KIND_PTR
>>     struct {
>>       struct type* type;
>>     } ptr;
>>
>> This works but leads to super-verbose descriptions (if we want to use
>> static tables), because we need to describe type of each syscall
>> argument and inner types for all pointers/arrays/structs/unions as
>> separate global variables:
>>
>>
>> static struct type open_arg0_inner {
>>   .kind = KIND_ARRAY,
>>   ...
>> };
>>
>> static struct type open_arg0 {
>>   .kind = KIND_PTR,
>>   .ptr.type = &open_arg0_inner
>>   ...
>> };
>>
>> static struct type open_arg1 {
>>   .kind = KIND_SCALAR,
>>   ...
>> };
>>
>> static struct type open_arg2 {
>>   .kind = KIND_SCALAR,
>>   ...
>> };
>>
>> static struct type open_ret {
>>   .kind = KIND_SCALAR,
>>   ...
>> };
>>
>> struct syscall_spec syscall_spec_open = {
>>   .name = "open",
>>   .retval = {
>>     .name = "retval",
>>     .type = &open_ret,
>>    }
>>   .nargs = 3,
>>   .args[0] = {
>>     .name = "pathname",
>>     .type = &open_arg0,
>>   }
>>   .args[1] = {
>>     .name = "flags",
>>     .type = &open_arg1,
>>   }
>>   .args[2] = {
>>     .name = "mode",
>>     .type = &open_arg2,
>>   }
>> };
>>
>>
>> This looks way too verbose provided that we write these descriptions
>> by hand (just coming up with consistent unique names for all these
>> global vars is a problem).
>> If we generate that from DSL, it can work though.
>>
>> Another option I am considering is to use helper construction
>> functions that return pointers to types, e.g. something along these
>> lines:
>>
>> struct syscall_spec syscall_spec_open = {
>>   .name = "open",
>>   .retval = {
>>     .name = "retval",
>>     .type = type_scalar(FD | ERRNO),
>>    }
>>   .nargs = 3,
>>   .args[0] = {
>>     .name = "pathname",
>>     .type = type_ptr(DIR_IN, type_array(PATHNAME)),
>>   }
>>   .args[1] = {
>>     .name = "flags",
>>     .type = type_scalar_flags(O_RDONLY | O_WRONLY, ....),
>>   }
>>   .args[2] = {
>>     .name = "mode",
>>     .type = type_scalar_flags(S_IRWXU ....),
>>   }
>> };
>>
>> Now these table can only be initialized dynamically. And we will
>> probably need lots of these helper functions. So I don't like it
>> either...
>>
>> Any suggestions?
>
>
>
> Here is my current prototype:
> https://github.com/dvyukov/linux/commit/6200a9333e78bef393f8ead41205813b94d340f3
>
> For now it can trace arguments of 4 system calls:
>
> [    4.055483] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6)
> [    4.055991] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) = 3
> [    4.056486] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000)
> [    4.056977] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) = 1780
> [    4.057485] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000)
> [    4.057991] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) = 0
> [    4.058488] [pid 1258] close(0x0) = 0
> [    4.058848] [pid 1258] write(0x1, &00007f316a732000=[], 0x5)
> [    4.059304] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) = 5
> [    4.059905] [pid 1234] close(0x0) = 0
> [    4.060239] [pid 1234] close(0x0) = 0
>
>
> Main outstanding problems:
>  - understanding length of arrays and buffers
>  - understanding discriminators of unions and syscall variations



Here is open description:

struct syscall_spec syscall_spec_open = {
        .name = "open",
        .errno = {EACCES, EDQUOT, EEXIST, EFAULT, EFBIG, EINTR, EINVAL, EISDIR,
                ELOOP, EMFILE, ENAMETOOLONG, ENFILE, ENODEV, ENOENT, ENOMEM,
                ENOSPC, ENOTDIR, ENXIO, EOVERFLOW, EPERM, EROFS, ETXTBSY,
                EWOULDBLOCK},
        .ret = &type_fd,
        .args = {
                {"pathname", &type_ptr_pathname},
                {"flags", &type_open_flags},
                {"mode", &type_open_mode},
        },
};

Types are declares outline as:

static struct type type_open_mode = {
        .kind = KIND_SCALAR,
        .scalar.size = 4,
        .scalar.constraint = CONSTRAINT_BITMASK,
        .scalar.bitmask = {$(S_IRUSR), $(S_IWUSR), $(S_IXUSR),
                $(S_IRGRP), $(S_IWGRP), $(S_IXGRP),
                $(S_IROTH), $(S_IWOTH), $(S_IXOTH)},
};

static struct type type_fd = {
        .kind = KIND_RESOURCE,
        .res.res = RES_FD,
        .res.type = &type_i32,
};

static struct type type_array_i8 = {
        .kind = KIND_ARRAY,
        .array.type = &type_i8,
};

static struct type type_ptr_array_i8 = {
        .kind = KIND_PTR,
        .ptr.type = &type_array_i8,
};

etc

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ