lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 22 Mar 2017 09:54:26 -0400
From:   Mike Rapoport <rppt@...ux.vnet.ibm.com>
To:     "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        linux-man <linux-man@...r.kernel.org>
Subject: Re: Review request: draft ioctl_userfaultfd(2) manual page

Hello Michael,

On Mon, Mar 20, 2017 at 09:11:07PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Andrea, Mike, and all,
> 
> Mike: here's the split out page that describes the 
> userfaultfd ioctl() operations.
> 
> I'd like to get review input, especially from you and
> Andrea, but also anyone else, for the current version
> of this page, which includes quite a few FIXMEs to be
> sorted.
> 
> I've shown the rendered version of the page below. 
> The groff source is attached, and can also be found
> at the branch here:
> 
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_userfaultfd
> 
> The new ioctl_userfaultfd(2) page follows this mail.
> 
> Cheers,
> 
> Michael
> 
> NAME
>        userfaultfd - create a file descriptor for handling page faults in user
>        space
> 
> SYNOPSIS
>        #include <sys/ioctl.h>
> 
>        int ioctl(int fd, int cmd, ...);
> 
> DESCRIPTION
>        Various ioctl(2) operations can be performed on  a  userfaultfd  object
>        (created by a call to userfaultfd(2)) using calls of the form:
> 
>            ioctl(fd, cmd, argp);
> 
>        In  the  above,  fd  is  a  file  descriptor referring to a userfaultfd
>        object, cmd is one of the commands listed below, and argp is a  pointer
>        to a data structure that is specific to cmd.
> 
>        The  various  ioctl(2) operations are described below.  The UFFDIO_API,
>        UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
>        userfaultfd behavior.  These operations allow the caller to choose what
>        features will be enabled and what kinds of events will be delivered  to
>        the application.  The remaining operations are range operations.  These
>        operations enable the calling application to resolve page-fault  events
>        in a consistent way.
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: What does "consistent" mean?                  │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

Andrea, can you please help with this one?
 
>    UFFDIO_API
>        (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API
>        handshake.  The argp argument is a pointer to a  uffdio_api  structure,
>        defined as:
> 
>            struct uffdio_api {
>                __u64 api;        /* Requested API version (input) */
>                __u64 features;   /* Must be zero */
>                __u64 ioctls;     /* Available ioctl() operations (output) */
>            };
> 
>        The  api  field  denotes  the API version requested by the application.
>        Before the call, the features field must be initialized to zero.
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: Why must the 'features' field be  initialized │
>        │to zero?                                             │
>        └─────────────────────────────────────────────────────┘

Until 4.11 the only supported feature is delegation of missing page fault
and the UFFDIO_FEATURES bitmask is 0.
There's a check in uffdio_api call that the user is not trying to enable
any other functionality and it asserts that uffdio_api.featurs is zero [1].
Starting from 4.11 the features negotiation is different. Now uffdio_call
verifies that it can support features the application requested [2].


>        The  kernel verifies that it can support the requested API version, and
>        sets the features and ioctls fields to bit masks representing  all  the
>        available features and the generic ioctl(2) operations available.  Cur‐
>        rently, zero (i.e., no feature bits) is placed in the  features  field.
>        The returned ioctls field can contain the following bits:
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │This  user-space  API  seems not fully polished. Why │
>        │are there not constants defined for each of the bit- │
>        │mask values listed below?                            │
>        └─────────────────────────────────────────────────────┘
> 
>        1 << _UFFDIO_API
>               The UFFDIO_API operation is supported.
> 
>        1 << _UFFDIO_REGISTER
>               The UFFDIO_REGISTER operation is supported.
> 
>        1 << _UFFDIO_UNREGISTER
>               The UFFDIO_UNREGISTER operation is supported.

Well, I tend to agree. I believe the original intention was to use the
OR'ed mask, like UFFD_API_IOCTLS.
Andrea, can you add somthing?

> 
> 
>               ┌─────────────────────────────────────────────────────┐
>               │FIXME                                                │
>               ├─────────────────────────────────────────────────────┤
>               │Is  the above description of the 'ioctls' field cor‐ │
>               │rect?  Does more need to be said?                    │
>               │                                                     │
>               └─────────────────────────────────────────────────────┘

This is correct. I wouldn't add anything else.
 
>        This ioctl(2) operation returns 0 on success.  On error, -1 is returned
>        and  errno  is set to indicate the cause of the error.  Possible errors
>        include:
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
G>        ├─────────────────────────────────────────────────────┤
>        │Is the following error list correct?                 │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

There's also -EFAULT in case copy_{from,to}_user fails.

> 
>        EINVAL The userfaultfd has already been  enabled  by  a  previous  UFF‐
>               DIO_API operation.
> 
>        EINVAL The  API  version requested in the api field is not supported by
>               this kernel, or the features field was not zero.
>
>               ┌─────────────────────────────────────────────────────┐
>               │FIXME                                                │
>               ├─────────────────────────────────────────────────────┤
>               │In the above error case, the  returned  'uffdio_api' │
>               │structure  zeroed out. Why is this done? This should │
>               │be explained in the manual page.                     │
>               │                                                     │
>               └─────────────────────────────────────────────────────┘
 
In my understanding the uffdio_api structure is zeroed to allow the caller
to distinguish the reasons for -EINVAL.

>    UFFDIO_REGISTER
>        (Since Linux 4.3.)  Register a memory  address  range  with  the  user‐
>        faultfd  object.   The  argp argument is a pointer to a uffdio_register
>        structure, defined as:
> 
>            struct uffdio_range {
>                __u64 start;    /* Start of range */
>                __u64 len;      /* Length of rnage (bytes) */
>            };
> 
>            struct uffdio_register {
>                struct uffdio_range range;
>                __u64 mode;     /* Desired mode of operation (input) */
>                __u64 ioctls;   /* Available ioctl() operations (output) */
>            };
> 
> 
>        The range field defines a memory range starting at start and continuing
>        for len bytes that should be handled by the userfaultfd.
> 
>        The  mode  field  defines the mode of operation desired for this memory
>        region.  The following values may be bitwise  ORed  to  set  the  user‐
>        faultfd mode for the specified range:
> 
>        UFFDIO_REGISTER_MODE_MISSING
>               Track page faults on missing pages.
> 
>        UFFDIO_REGISTER_MODE_WP
>               Track page faults on write-protected pages.
> 
>        Currently, the only supported mode is UFFDIO_REGISTER_MODE_MISSING.
> 
>        If the operation is successful, the kernel modifies the ioctls bit-mask
>        field to indicate which ioctl(2) operations are available for the spec‐
>        ified range.  This returned bit mask is as for UFFDIO_API.
> 
>        This ioctl(2) operation returns 0 on success.  On error, -1 is returned
>        and errno is set to indicate the cause of the error.   Possible  errors
>        include:
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Is the following error list correct?                 │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

Here again it maybe -EFAULT to indicate copy_{from,to}_user failure.
And, UFFDIO_REGISTER may return -ENOMEM if the process is exiting and the
mm_struct has gone by the time userfault grabs it. 
 
>        EBUSY  A  mapping  in  the  specified  range is registered with another
>               userfaultfd object.
> 
>        EINVAL An invalid or unsupported bit was specified in the  mode  field;
>               or the mode field was zero.
> 
>        EINVAL There is no mapping in the specified address range.
> 
>        EINVAL range.start  or  range.len  is not a multiple of the system page
>               size; or, range.len is  zero;  or  these  fields  are  otherwise
>               invalid.
> 
>        EINVAL There as an incompatible mapping in the specified address range.
> 
> 
>               ┌─────────────────────────────────────────────────────┐
>               │FIXME                                                │
>               ├─────────────────────────────────────────────────────┤
>               │Above: What does "incompatible" mean?                │
>               │                                                     │
>               └─────────────────────────────────────────────────────┘

Up to 4.10 userfault context may be registered only for MAP_ANONYMOUS |
MAP_PRIVATE mappings.

>    UFFDIO_UNREGISTER
>        (Since Linux 4.3.)  Unregister a memory address range from userfaultfd.
>        The address range to unregister is specified in the uffdio_range struc‐
>        ture pointed to by argp.
> 
>        This ioctl(2) operation returns 0 on success.  On error, -1 is returned
>        and errno is set to indicate the cause of the error.   Possible  errors
>        include:
> 
>        EINVAL Either  the  start or the len field of the ufdio_range structure
>               was not a multiple of the system page size; or the len field was
>               zero; or these fields were otherwise invalid.
> 
>        EINVAL There as an incompatible mapping in the specified address range.
> 
> 
>               ┌─────────────────────────────────────────────────────┐
>               │FIXME                                                │
>               ├─────────────────────────────────────────────────────┤
>               │Above: What does "incompatible" mean?                │
>               └─────────────────────────────────────────────────────┘

The same comments as for UFFDIO_REGISTER apply here as well.

>        EINVAL There was no mapping in the specified address range.
> 
>    UFFDIO_COPY
>        (Since  Linux 4.3.)  Atomically copy a continuous memory chunk into the
>        userfault registered range and optionally wake up the  blocked  thread.
>        The  source  and  destination addresses and the number of bytes to copy
>        are specified by the src, dst, and len fields of the uffdio_copy struc‐
>        ture pointed to by argp:
> 
>            struct uffdio_copy {
>                __u64 dst;    /* Source of copy */
>                __u64 src;    /* Destinate of copy */
>                __u64 len;    /* Number of bytes to copy */
>                __u64 mode;   /* Flags controlling behavior of copy */
>                __s64 copy;   /* Number of bytes copied, or negated error */
>            };
> 
>        The  following value may be bitwise ORed in mode to change the behavior
>        of the UFFDIO_COPY operation:
> 
>        UFFDIO_COPY_MODE_DONTWAKE
>               Do not wake up the thread that waits for page-fault resolution
> 
>        The copy field is used by the kernel to return the number of bytes that
>        was actually copied, or an error (a negated errno-style value).
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above:  Why is the 'copy' field used to return error │
>        │values?  This should  be  explained  in  the  manual │
>        │page.                                                │
>        └─────────────────────────────────────────────────────┘

Andrea, can you help with this one, please?

>        If  the  value returned in copy doesn't match the value that was speci‐
>        fied in len, the operation fails with the error EAGAIN.  The copy field
>        is output-only; it is not read by the UFFDIO_COPY operation.
> 
>        This ioctl(2) operation returns 0 on success.  In this case, the entire
>        area was copied.  On error, -1 is returned and errno is set to indicate
>        the cause of the error.  Possible errors include:
> 
>        EAGAIN The number of bytes copied (i.e., the value returned in the copy
>               field) does not equal the value that was specified  in  the  len
>               field.
> 
>        EINVAL Either dst or len was not a multiple of the system page size, or
>               the range specified by src and len or dst and len was invalid.
> 
>        EINVAL An invalid bit was specified in the mode field.
> 
>    UFFDIO_ZEROPAGE
>        (Since Linux 4.3.)  Zero out  a  memory  range  registered  with  user‐
>        faultfd.   The  requested  range is specified by the range field of the
>        uffdio_zeropage structure pointed to by argp:
> 
>            struct uffdio_zeropage {
>                struct uffdio_range range;
>                __u64 mode;     /* Flags controlling behavior of copy */
>                __s64 zeropage; /* Number of bytes zeroed, or negated error */
>            };
> 
>        The following value may be bitwise ORed in mode to change the  behavior
>        of the UFFDIO_ZERO operation:
> 
>        UFFDIO_ZEROPAGE_MODE_DONTWAKE
>               Do not wake up the thread that waits for page-fault resolution.
> 
>        The  zeropage field is used by the kernel to return the number of bytes
>        that was actually zeroed, or an  error  in  the  same  manner  as  UFF‐
>        DIO_COPY.
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Why  is  the  'zeropage'  field used to return error │
>        │values?  This should  be  explained  in  the  manual │
>        │page.                                                │
>        └─────────────────────────────────────────────────────┘
>        If  the  value  returned  in the zeropage field doesn't match the value
>        that was specified in range.len, the operation  fails  with  the  error
>        EAGAIN.   The zeropage field is output-only; it is not read by the UFF‐
>        DIO_ZERO operation.
> 
>        This ioctl(2) operation returns 0 on success.  In this case, the entire
>        area was zeroed.  On error, -1 is returned and errno is set to indicate
>        the cause of the error.  Possible errors include:
> 
>        EAGAIN The number of bytes zeroed (i.e.,  the  value  returned  in  the
>               zeropage  field)  does not equal the value that was specified in
>               the range.len field.
> 
>        EINVAL Either range.start or range.len was not a multiple of the system
>               page  size;  or  range.len  was zero; or the range specified was
>               invalid.
> 
>        EINVAL An invalid bit was specified in the mode field.
> 
>    UFFDIO_WAKE
>        (Since Linux 4.3.)  Wake up the thread waiting for  page-fault  resolu‐
>        tion  on  a  specified  memory  address  range.  The argp argument is a
>        pointer to a uffdio_range structure (shown above)  that  specifies  the
>        address range.
> 
> 
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Need more detail here. How is the UFFDIO_WAKE opera‐ │
>        │tion used?                                           │
>        └─────────────────────────────────────────────────────┘

The UFFDIO_WAKE operation is used in conjunction with
UFFDIO_{COPY,ZEROPAGE} operations that have
UFFDIO_{COPY,ZEROPAGE}_MODE_DONTWAKE bit set in the mode field.
The userfault monitor can perform several UFFDIO_{COPY,ZEROPAGE} calls in a
batch and then explicitly wake up the faulting thread using UFFDIO_WAKE.

>        This ioctl(2) operation returns 0 on success.  On error, -1 is returned
>        and  errno  is set to indicate the cause of the error.  Possible errors
>        include:
> 
>        EINVAL The start or the len field of the ufdio_range structure was  not
>               a  multiple  of  the  system  page size; or len was zero; or the
>               specified range was otherwise invalid.
> 
> RETURN VALUE
>        See descriptions of the individual operations, above.
> 
> ERRORS
>        See descriptions of the individual operations, above.  In addition, the
>        following  general errors can occur for all of the operations described
>        above:
> 
>        EFAULT argp does not point to a valid memory address.
> 
>        EINVAL (For all operations except UFFDIO_API.)  The userfaultfd  object
>               has not yet been enabled (via the UFFDIO_API operation).
> 
> CONFORMING TO
>        These ioctl(2) operations are Linux-specific.
> 
> EXAMPLE
>        See userfaultfd(2).
> 
> SEE ALSO
>        ioctl(2), mmap(2), userfaultfd(2)
> 
>        Documentation/vm/userfaultfd.txt in the Linux kernel source tree
> 

[1] http://lxr.free-electrons.com/source/fs/userfaultfd.c#L1199
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/userfaultfd.c#n1680

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ