[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <82e9619b-0e6d-ae10-28c8-87295ae85389@redhat.com>
Date: Sat, 12 Aug 2017 23:56:22 +0200
From: Hans de Goede <hdegoede@...hat.com>
To: Arnd Bergmann <arnd@...db.de>,
Michael Thayer <michael.thayer@...cle.com>,
"Knut St . Osmundsen" <knut.osmundsen@...cle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Larry Finger <Larry.Finger@...inger.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux FS-devel Mailing List <linux-fsdevel@...r.kernel.org>
Subject: Re: [RFC 1/2] misc: Add vboxguest driver for Virtual Box Guest
integration
Hi,
On 11-08-17 23:23, Arnd Bergmann wrote:
> On Fri, Aug 11, 2017 at 3:23 PM, Hans de Goede <hdegoede@...hat.com> wrote:
>> This commit adds a driver for the Virtual Box Guest PCI device used in
>> Virtual Box virtual machines. Enabling this driver will add support for
>> Virtual Box Guest integration features such as copy-and-paste, seamless
>> mode and OpenGL pass-through.
>>
>> This driver also offers vboxguest IPC functionality which is needed
>> for the vboxfs driver which offers folder sharing support.
>>
>> Signed-off-by: Hans de Goede <hdegoede@...hat.com>
>
> Since you are still working on coding style cleanups, I'll comment mainly on
> the ioctl interface for now.
Thank you for the taking the time to review the ioctl interface.
> Overall it already looks much better than
> I expected
> from previous experience with the vbox source.
I'm glad you like it, cleaning it up has been quite a bit of work,
so I'm happy to see this being appreciated.
> Most of the issues I would normally comment on are already moot based
> on the assumption that we won't be able to make substantial changes to
> either the user space portion or the hypervisor side.
>
>> +/**
>> + * Inflate the balloon by one chunk.
>> + *
>> + * The caller owns the balloon mutex.
>> + *
>> + * @returns VBox status code
>> + * @param gdev The Guest extension device.
>> + * @param chunk_idx Index of the chunk.
>> + */
>> +static int vbg_balloon_inflate(PVBOXGUESTDEVEXT gdev, u32 chunk_idx)
>> +{
>> + VMMDevChangeMemBalloon *req = gdev->mem_balloon.change_req;
>> + struct page **pages;
>> + int i, rc;
>> +
>> + pages = kmalloc(sizeof(*pages) * VMMDEV_MEMORY_BALLOON_CHUNK_PAGES,
>> + GFP_KERNEL | __GFP_NOWARN);
>> + if (!pages)
>> + return VERR_NO_MEMORY;
>> +
>> + req->header.size = sizeof(*req);
>> + req->inflate = true;
>> + req->pages = VMMDEV_MEMORY_BALLOON_CHUNK_PAGES;
>
> The balloon section seems to be rather simplistic with its ioctl interface.
> Ideally this should use CONFIG_BALLOON_COMPACTION and an
> oom notifier like the virtio_balloon driver does. Yes, I realize that only
> one of the six (or more) balloon drivers we have in the kernel does it ;-)
The way the balloon code works is that the baloon-size is fully controlled
by the host, all the guest-additions do is wait for an event indicating that
the host wants to change it and then call this ioctl which will get the
desired balloon size from the host and tries to adjust the size accordingly.
Hooking into the oom killer is not really useful because there is no way we
can indicate to the host we are running low on mem and I don't think that
trying to shrink the balloon to be smaller then the host requested size
is a good idea.
>
>> +/**
>> + * Callback for heartbeat timer.
>> + */
>> +static void vbg_heartbeat_timer(unsigned long data)
>> +{
>> + PVBOXGUESTDEVEXT gdev = (PVBOXGUESTDEVEXT)data;
>> +
>> + vbg_req_perform(gdev, gdev->guest_heartbeat_req);
>> + mod_timer(&gdev->heartbeat_timer,
>> + msecs_to_jiffies(gdev->heartbeat_interval_ms));
>> +}
>
> This looks like a watchdog and should use the drivers/watchdog
> subsystem interfaces.
This is not really a watchdog, this also is fully under host control,
the host controls if the guest should send a heartbeat and if so at
which interval, where normally with a watchdog starting the timer
and deciding the interval is under control of the watchdog user.
Also if we miss to generate the heartbeat then the host will just
log a message, not reset the system as one would expect with a
watchdog. TL;DR: I understand why suggest this but the watchdog
interface seems a poor match for this.
>
>> +/**
>> + * vbg_query_host_version try get the host feature mask and version information
>> + * (vbg_host_version).
>> + *
>> + * @returns 0 or negative errno value (ignored).
>> + * @param gdev The Guest extension device.
>> + */
>> +static int vbg_query_host_version(PVBOXGUESTDEVEXT gdev)
>> +{
>> + VMMDevReqHostVersion *req;
>> + int rc, ret;
>> +
>> + req = vbg_req_alloc(sizeof(*req), VMMDevReq_GetHostVersion);
>> + if (!req)
>> + return -ENOMEM;
>
> While I understand you want to keep the ioctl interface, the version information
> looks like something that we should also have in sysfs in an appropriate
> location.
Yes I can add a sysfs attribute for the version and flags :)
>> diff --git a/drivers/misc/vboxguest/vboxguest_linux.c b/drivers/misc/vboxguest/vboxguest_linux.c
>> new file mode 100644
>> index 000000000000..8468c7139b98
>> --- /dev/null
>> +++ b/drivers/misc/vboxguest/vboxguest_linux.c
>
> This looks like a fairly short file, and could be merged into the core file.
I would prefer to keep it separate to keep more or less 1 : 1 mapping
between mainline kernel and vbox upstream svn files.
>> +/** The file_operations structures. */
>> +static const struct file_operations vbg_misc_device_fops = {
>> + .owner = THIS_MODULE,
>> + .open = vbg_misc_device_open,
>> + .release = vbg_misc_device_close,
>> + .unlocked_ioctl = vgdrvLinuxIOCtl,
>> +};
>> +static const struct file_operations vbg_misc_device_user_fops = {
>> + .owner = THIS_MODULE,
>> + .open = vbg_misc_device_user_open,
>> + .release = vbg_misc_device_close,
>> + .unlocked_ioctl = vgdrvLinuxIOCtl,
>> +};
>
> These lack a .compat_ioctl callback.
Good catch, so I guess that the code paths for 32 bit OpenGL pass
on 64 bits have never been used / tested under Linux, I will take
a look at this.
>> +/**
>> + * Device I/O Control entry point.
>> + *
>> + * @returns -ENOMEM or -EFAULT for errors inside the ioctl callback; 0
>> + * on success, or a positive VBox status code on vbox guest-device errors.
>> + *
>> + * @param pInode Associated inode pointer.
>> + * @param pFilp Associated file pointer.
>> + * @param uCmd The function specified to ioctl().
>> + * @param ulArg The argument specified to ioctl().
>> + */
>> +static long vgdrvLinuxIOCtl(struct file *pFilp, unsigned int uCmd,
>> + unsigned long ulArg)
>> +{
>> + PVBOXGUESTSESSION session = (PVBOXGUESTSESSION) pFilp->private_data;
>> + u32 cbData = _IOC_SIZE(uCmd);
>> + void *pvBufFree;
>> + void *pvBuf;
>> + int rc, ret = 0;
>> + u64 au64Buf[32 / sizeof(u64)];
>> +
>> + /*
>> + * For small amounts of data being passed we use a stack based buffer
>> + * except for VMMREQUESTs where the data must not be on the stack.
>> + */
>> + if (cbData <= sizeof(au64Buf) &&
>> + VBOXGUEST_IOCTL_STRIP_SIZE(uCmd) !=
>> + VBOXGUEST_IOCTL_STRIP_SIZE(VBOXGUEST_IOCTL_VMMREQUEST(0))) {
>> + pvBufFree = NULL;
>> + pvBuf = &au64Buf[0];
>> + } else {
>> + /* __GFP_DMA32 for VBOXGUEST_IOCTL_VMMREQUEST */
>> + pvBufFree = pvBuf = kmalloc(cbData, GFP_KERNEL | __GFP_DMA32);
>> + if (!pvBuf)
>> + return -ENOMEM;
>> + }
>> + if (copy_from_user(pvBuf, (void *)ulArg, cbData) == 0) {
>
> There should be a check for a maximum argument size.
Good point.
> I'd also change the commands
> to not always do both read and write, but only whichever applies. This function
> would then do the copy_from_user/copy_to_user conditionally.
This is actually an upstream TODO item I believe. One which would
probably be best fixed by using _IOR/_IOW/_IORW IOCTL macros, but
that needs to be coordinated with upstream.
>> +static int hgcm_call_preprocess_linaddr(const HGCMFunctionParameter *src_parm,
>> + bool is_user, void **bounce_buf_ret,
>> + size_t *pcb_extra)
>> +{
>> + void *buf, *bounce_buf;
>> + bool copy_in;
>> + u32 len;
>> + int ret;
>> +
>> + buf = (void *)src_parm->u.Pointer.u.linearAddr;
>> + len = src_parm->u.Pointer.size;
>> + copy_in = src_parm->type != VMMDevHGCMParmType_LinAddr_Out;
>> +
>> + if (!is_user) {
>> + if (WARN_ON(len > VBGLR0_MAX_HGCM_KERNEL_PARM))
>> + return VERR_OUT_OF_RANGE;
>> +
>> + hgcm_call_inc_pcb_extra(buf, len, pcb_extra);
>> + return VINF_SUCCESS;
>> + }
>> +
>> + if (len > VBGLR0_MAX_HGCM_USER_PARM)
>> + return VERR_OUT_OF_RANGE;
>> +
>> + if (len <= PAGE_SIZE * 2)
>> + bounce_buf = kmalloc(len, GFP_KERNEL);
>> + else
>> + bounce_buf = vmalloc(len);
>> +
>
> we have kvmalloc() for this now
Cool I did not know that, will fix.
>
>
>> +#ifdef CONFIG_X86_64
>> +int vbg_hgcm_call32(VBOXGUESTDEVEXT *gdev, VBoxGuestHGCMCallInfo * pCallInfo,
>> + u32 cbCallInfo, u32 timeout_ms, bool is_user)
>> +{
>> + VBoxGuestHGCMCallInfo *pCallInfo64 = NULL;
>> + HGCMFunctionParameter *pParm64 = NULL;
>> + HGCMFunctionParameter32 *pParm32 = NULL;
>> + u32 cParms = pCallInfo->cParms;
>> + u32 iParm;
>> + int rc = VINF_SUCCESS;
>> +
>> + /*
>> + * The simple approach, allocate a temporary request and convert the parameters.
>> + */
>> + pCallInfo64 = kzalloc(sizeof(*pCallInfo64) +
>> + cParms * sizeof(HGCMFunctionParameter),
>> + GFP_KERNEL);
>> + if (!pCallInfo64)
>> + return VERR_NO_MEMORY;
>> +
>> + *pCallInfo64 = *pCallInfo;
>> + pParm32 = VBOXGUEST_HGCM_CALL_PARMS32(pCallInfo);
>> + pParm64 = VBOXGUEST_HGCM_CALL_PARMS(pCallInfo64);
>> + for (iParm = 0; iParm < cParms; iParm++, pParm32++, pParm64++) {
>> + switch (pParm32->type) {
>> + case VMMDevHGCMParmType_32bit:
>> + pParm64->type = VMMDevHGCMParmType_32bit;
>> + pParm64->u.value32 = pParm32->u.value32;
>> + break;
>> +
>> + case VMMDevHGCMParmType_64bit:
>> + pParm64->type = VMMDevHGCMParmType_64bit;
>> + pParm64->u.value64 = pParm32->u.value64;
>> + break;
>> +
>> + case VMMDevHGCMParmType_LinAddr_Out:
>> + case VMMDevHGCMParmType_LinAddr:
>> + case VMMDevHGCMParmType_LinAddr_In:
>> + pParm64->type = pParm32->type;
>> + pParm64->u.Pointer.size = pParm32->u.Pointer.size;
>> + pParm64->u.Pointer.u.linearAddr =
>> + pParm32->u.Pointer.u.linearAddr;
>> + break;
>
> It would be good to clean this up and always use the same structure here.
The HGCMFunctionParameter struct describes function-parameters
in a Host-Guest-Control-Method function call, so this is part of the
host ABI interface and we cannot change this.
Any 32 bit apps running on a 64 bit kernel will be using the 32 bit
version of this struct and we need to translate.
>
>> diff --git a/include/linux/vbox_err.h b/include/linux/vbox_err.h
>> new file mode 100644
>> index 000000000000..906ff7d2585d
>> --- /dev/null
>> +++ b/include/linux/vbox_err.h
>> @@ -0,0 +1,6 @@
>> +#ifndef __VBOX_ERR_H__
>> +#define __VBOX_ERR_H__
>> +
>> +#include <uapi/linux/vbox_err.h>
>> +
>> +#endif
>> diff --git a/include/linux/vbox_ostypes.h b/include/linux/vbox_ostypes.h
>> new file mode 100644
>> index 000000000000..ea2a391f135f
>> --- /dev/null
>> +++ b/include/linux/vbox_ostypes.h
>> @@ -0,0 +1,6 @@
>> +#ifndef __VBOX_OSTYPES_H__
>> +#define __VBOX_OSTYPES_H__
>> +
>> +#include <uapi/linux/vbox_ostypes.h>
>> +
>> +#endif
>> diff --git a/include/linux/vboxguest.h b/include/linux/vboxguest.h
>> new file mode 100644
>> index 000000000000..fca5d199a884
>> --- /dev/null
>> +++ b/include/linux/vboxguest.h
>> @@ -0,0 +1,6 @@
>> +#ifndef __VBOXGUEST_H__
>> +#define __VBOXGUEST_H__
>> +
>> +#include <uapi/linux/vboxguest.h>
>> +
>> +#endif
>
> This should not be needed, we add -I${srctree}/include/uapi/ to the include path
> already.
Ok.
>
>> +/**
>> + * The layout of VMMDEV RAM region that contains information for guest.
>> + */
>> +typedef struct VMMDevMemory {
>> + /** The size of this structure. */
>> + u32 u32Size;
>> + /** The structure version. (VMMDEV_MEMORY_VERSION) */
>> + u32 u32Version;
>> +
>> + union {
>> + struct {
>> + /** Flag telling that VMMDev has events pending. */
>> + bool fHaveEvents;
>> + } V1_04;
>> +
>
> As this is a hardware interface, maybe use u32 instead of 'bool'.
I guess that should work here, since we only every do "if (fHaveEvents)"
but in other cases where we also set bool variables in the hardware
interface structs we also need to know which bit(s) get set on true
to move this over to non bool use.
I agree that the choice of bool in a (virtual) hardware interface
is unfortunate, but we should not be trying to change the (virtual)
hardware definition IMHO.
> Strictly speaking, the ring buffer structures would even have to
> be __le32 and use byteorder specific read/write access, but that
> could perhaps be skipped here when you add a Kconfig
> dependency on !CONFIG_CPU_BIG_ENDIAN (which won't
> ever be set on x86).
>
>> diff --git a/include/uapi/linux/vbox_err.h b/include/uapi/linux/vbox_err.h
>> new file mode 100644
>> index 000000000000..e6e7ba835e36
>> --- /dev/null
>> +++ b/include/uapi/linux/vbox_err.h
>> diff --git a/include/uapi/linux/vbox_ostypes.h b/include/uapi/linux/vbox_ostypes.h
>> new file mode 100644
>> index 000000000000..abe9a38ebfbd
>> --- /dev/null
>> +++ b/include/uapi/linux/vbox_ostypes.h
>> @@ -0,0 +1,158 @@
>
> Some of these headers are not really ABI between the kernel and user
> space but are between the vbox host and guest, so include/uapi is maybe
> not the best place for them.
I assume you refer mainly to vbox_vmmdev.h here, the problem is that
the ioctl API leaks a lot of vbox_vmmdev.h things through
VBOXGUEST_IOCTL_VMMREQUEST which allows userspace to submit arbitrary
vmmdev-requests. Note that the vboxguest driver does do several sanity
checks on these, including limiting them to a specific list of allowed
requests. But as said this does leak almost all of the vbox_vmmdev.h
hardware interface into the userspace API.
I checked and VBOXGUEST_IOCTL_VMMREQUEST is used in various places,
so I cannot just drop it.
> Do we need all of this both in the kernel and in the header that actually
> defines the kernel/user interface?
Yes, see above.
>> +/**
>> + * @name VBoxGuest IOCTL codes and structures.
>> + *
>> + * The range 0..15 is for basic driver communication.
>> + * The range 16..31 is for HGCM communication.
>> + * The range 32..47 is reserved for future use.
>> + * The range 48..63 is for OS specific communication.
>> + * The 7th bit is reserved for future hacks.
>> + * The 8th bit is reserved for distinguishing between 32-bit and 64-bit
>> + * processes in future 64-bit guest additions.
>> + * @{
>> + */
>> +#if __BITS_PER_LONG == 64
>> +#define VBOXGUEST_IOCTL_FLAG 128
>> +#else
>> +#define VBOXGUEST_IOCTL_FLAG 0
>> +#endif
>> +/** @} */
>
> This seems misguided, we already know the difference between
> 32-bit and 64-bit tasks based on the entry point, and if the
> interface is properly designed then we don't care about this
> difference at all.
I guess (and really guess at that) that this is there because on
some guest OS-es supported by vbox the distinction between a
32 bit or 64 bit app calling the ioctl cannot be made in another
way.
But I would not mind dropping this flag for linux.
Michael / Knut would something like this be acceptable ?
--- a/include/VBox/VBoxGuest.h
+++ b/include/VBox/VBoxGuest.h
@@ -123,9 +123,9 @@
* used.
* @{
*/
-#if defined(RT_ARCH_AMD64) || defined(RT_ARCH_SPARC64)
+#if (defined(RT_ARCH_AMD64) || defined(RT_ARCH_SPARC64)) && !defined(RT_OS_LINUX)
# define VBOXGUEST_IOCTL_FLAG 128
-#elif defined(RT_ARCH_X86) || defined(RT_ARCH_SPARC)
+#elif defined(RT_ARCH_X86) || defined(RT_ARCH_SPARC) || defined(RT_OS_LINUX)
# define VBOXGUEST_IOCTL_FLAG 0
#else
# error "dunno which arch this is!"
(extend with some other changes such as properly adding a .compat_ioctl
file-op ?
>> +#define VBOXGUEST_IOCTL_CODE_(function, size) \
>> + _IOC(_IOC_READ|_IOC_WRITE, 'V', (function), (size))
>> +#define VBOXGUEST_IOCTL_STRIP_SIZE(code) \
>> + VBOXGUEST_IOCTL_CODE_(_IOC_NR((code)), 0)
>> +
>> +#define VBOXGUEST_IOCTL_CODE(function, size) \
>> + VBOXGUEST_IOCTL_CODE_((function) | VBOXGUEST_IOCTL_FLAG, size)
>> +/* Define 32 bit codes to support 32 bit applications in 64 bit guest driver. */
>> +#define VBOXGUEST_IOCTL_CODE_32(function, size) \
>> +VBOXGUEST_IOCTL_CODE_(function, size)
>
> If the command numbers can be changed wihtout causing too many
> problems, I'd just do away with these wrappers and use _IOR/_IOW/_IORW
> as needed.
The upstream vboxguest.h header defines VBOXGUEST_IOCTL_CODE for many
different guest platforms. I guess not all of them have the
_IOR/_IOW/_IORW flags embedded in the ioctl-code concept and I believe
some of them only have a small amount of bits which can actually be
used for the code, so we cannot just cram these flags in there.
I think that instead we need an ioctl_direction_flags table which can
be used to determine if we need todo the copy_from_user and friends,
without changing the codes.
>
>> +/** Input and output buffers layout of the IOCTL_VBOXGUEST_WAITEVENT */
>> +typedef struct VBoxGuestWaitEventInfo {
>> + /** timeout in milliseconds */
>> + u32 u32TimeoutIn;
>> + /** events to wait for */
>> + u32 u32EventMaskIn;
>> + /** result code */
>> + u32 u32Result;
>> + /** events occurred */
>> + u32 u32EventFlagsOut;
>> +} VBoxGuestWaitEventInfo;
>> +VBOXGUEST_ASSERT_SIZE(VBoxGuestWaitEventInfo, 16);
>
> 'u32' is not an approprioate type for a kernel header, use '__u32'
> instead.
Huh I thought that u32 was preferred, but I guess that it is not allowed
in uapi headers due to potential conflicts and it should be __u32 in uapi
headers ?
Regards,
Hans
Powered by blists - more mailing lists