Date:	Thu, 8 Aug 2013 09:40:27 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Sudeep Dutt <sudeep.dutt@...el.com>
Cc:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Arnd Bergmann <arnd@...db.de>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Rob Landley <rob@...dley.net>, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org,
	linux-doc@...r.kernel.org, asias@...hat.com,
	Nikhil Rao <nikhil.rao@...el.com>,
	Ashutosh Dixit <ashutosh.dixit@...el.com>,
	Caz Yokoyama <Caz.Yokoyama@...el.com>,
	Dasaratharaman Chandramouli 
	<dasaratharaman.chandramouli@...el.com>,
	Harshavardhan R Kharche <harshavardhan.r.kharche@...el.com>,
	"Yaozu (Eddie) Dong" <eddie.dong@...el.com>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@...el.com>
Subject: Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space
 Daemon.

On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> From: Caz Yokoyama <Caz.Yokoyama@...el.com>
> 
> This patch introduces a sample user space daemon which
> implements the virtio device backends on the host. The daemon
> creates/removes/configures virtio device backends by communicating with
> the Intel MIC Host Driver. The virtio devices currently supported are
> virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> The daemon also monitors card shutdown status and takes appropriate actions
> like killing the virtio backends and resetting the card upon card shutdown
> and crashes.
> 
> Co-author: Ashutosh Dixit <ashutosh.dixit@...el.com>
> Co-author: Sudeep Dutt <sudeep.dutt@...el.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@...el.com>
> Signed-off-by: Caz Yokoyama <Caz.Yokoyama@...el.com>
> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@...el.com>
> Signed-off-by: Nikhil Rao <nikhil.rao@...el.com>
> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@...el.com>
> Signed-off-by: Sudeep Dutt <sudeep.dutt@...el.com>
> Acked-by: Yaozu (Eddie) Dong <eddie.dong@...el.com>
> ---
>  Documentation/mic/mic_overview.txt |   48 +
>  Documentation/mic/mpssd/.gitignore |    1 +
>  Documentation/mic/mpssd/Makefile   |   19 +
>  Documentation/mic/mpssd/micctrl    |  152 ++++
>  Documentation/mic/mpssd/mpss       |  245 ++++++
>  Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
>  Documentation/mic/mpssd/mpssd.h    |  100 +++
>  Documentation/mic/mpssd/sysfs.c    |  103 +++

Is this generally useful or just example code?
If the former, you can put it in tools/ as well.

>  8 files changed, 2357 insertions(+)
>  create mode 100644 Documentation/mic/mic_overview.txt
>  create mode 100644 Documentation/mic/mpssd/.gitignore
>  create mode 100644 Documentation/mic/mpssd/Makefile
>  create mode 100755 Documentation/mic/mpssd/micctrl
>  create mode 100755 Documentation/mic/mpssd/mpss
>  create mode 100644 Documentation/mic/mpssd/mpssd.c
>  create mode 100644 Documentation/mic/mpssd/mpssd.h
>  create mode 100644 Documentation/mic/mpssd/sysfs.c
> 
> diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> new file mode 100644
> index 0000000..8b1a916
> --- /dev/null
> +++ b/Documentation/mic/mic_overview.txt
> @@ -0,0 +1,48 @@
> +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> +card based on the Intel Many Integrated Core (MIC) architecture
> +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> +implements the three required standard address spaces i.e. configuration,
> +memory and I/O. The host OS loads a device driver as is typical for
> +PCIe devices. The card itself runs a bootstrap after reset that
> +transfers control to the card OS downloaded from the host driver.
> +The card OS as shipped by Intel is a Linux kernel with modifications
> +for the X100 devices.
> +
> +Since it is a PCIe card, it does not have the ability to host hardware
> +devices for networking, storage and console. We provide these devices
> +on X100 coprocessors thus enabling a self-bootable equivalent environment
> +for applications. A key benefit of our solution is that it leverages
> +the standard virtio framework for network, disk and console devices,
> +though in our case the virtio framework is used across a PCIe bus.
> +
> +Here is a block diagram of the various components described above. The
> +virtio backends are situated on the host rather than the card given better
> +single threaded performance for the host compared to MIC and the ability of
> +the host to initiate DMAs to/from the card using the MIC DMA engine.
> +
> +                              |
> +       +----------+           |             +----------+
> +       | Card OS  |           |             | Host OS  |
> +       +----------+           |             +----------+
> +                              |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> +| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> +| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +    |         |         |     |      |            |         |
> +    |         |         |     |Ring 3|            |         |
> +    |         |         |     |------|------------|---------|-------
> +    +-------------------+     |Ring 0+--------------------------+
> +              |               |      | Virtio over PCIe IOCTLs  |
> +              |               |      +--------------------------+
> +      +--------------+        |                   |
> +      |Intel MIC     |        |            +---------------+
> +      |Card Driver   |        |            |Intel MIC      |
> +      +--------------+        |            |Host Driver    |
> +              |               |            +---------------+
> +              |               |                   |
> +     +-------------------------------------------------------------+
> +     |                                                             |
> +     |                    PCIe Bus                                 |
> +     +-------------------------------------------------------------+
> diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> new file mode 100644
> index 0000000..8b7c72f
> --- /dev/null
> +++ b/Documentation/mic/mpssd/.gitignore
> @@ -0,0 +1 @@
> +mpssd
> diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> new file mode 100644
> index 0000000..eb860a7
> --- /dev/null
> +++ b/Documentation/mic/mpssd/Makefile
> @@ -0,0 +1,19 @@
> +#
> +# Makefile - Intel MIC User Space Tools.
> +# Copyright(c) 2013, Intel Corporation.
> +#
> +ifdef DEBUG
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> +else
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> +endif
> +
> +mpssd: mpssd.o sysfs.o
> +	$(CC) $(CFLAGS) -o $@ $^ -lpthread
> +
> +install:
> +	install mpssd /usr/sbin/mpssd
> +	install micctrl /usr/sbin/micctrl
> +
> +clean:
> +	rm -f mpssd *.o
> diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> new file mode 100755
> index 0000000..e0cfa53
> --- /dev/null
> +++ b/Documentation/mic/mpssd/micctrl
> @@ -0,0 +1,152 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# micctrl - Controls MIC boot/start/stop.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: micctrl
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +sysfs="/sys/class/mic"
> +
> +status()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +reset()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo reset > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo reset > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +boot()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +shutdown()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo shutdown > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo shutdown > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +wait()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for $1 to go offline"
> +		done
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		# Wait for the cards to go offline
> +		for f in $sysfs/*
> +		do
> +			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +			do
> +				sleep 1
> +				echo -e "Waiting for "`basename $f`" to go offline"
> +			done
> +		done
> +	fi
> +}
> +
> +case $1 in
> +	-s)
> +		status $2
> +		;;
> +	-r)
> +		reset $2
> +		;;
> +	-b)
> +		boot $2
> +		;;
> +	-S)
> +		shutdown $2
> +		;;
> +	-w)
> +		wait $2
> +		;;
> +	*)
> +		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> +		exit 2
> +esac
> +
> +exit $?
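For readers following the boot path in micctrl above, here is a minimal C sketch of how the string written to the sysfs "state" file is put together. The function name demo_boot_cmd is hypothetical and not part of the patch; only the "boot:linux:mic/uos.img:mic/<card>.image" layout and the "mic" prefix check are taken from the script.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the boot command for one card, mirroring the string that the
 * micctrl boot() function echoes into $sysfs/<card>/state.
 * Hypothetical helper: returns 0 on success, -1 if the name does not
 * start with "mic" or the buffer is too small.
 */
int demo_boot_cmd(char *buf, size_t len, const char *name)
{
	int n;

	/* Same prefix test as `echo $1 | head -c3` == "mic" in the script */
	if (strncmp(name, "mic", 3) != 0)
		return -1;

	n = snprintf(buf, len, "boot:linux:mic/uos.img:mic/%s.image", name);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output was truncated */
	return 0;
}
```

Unlike the shell version, this sketch reports a truncated command instead of writing a partial one.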
> diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> new file mode 100755
> index 0000000..f0bb3dd
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpss
> @@ -0,0 +1,245 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# mpss	Start mpssd.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: mpss
> +# Required-Start:
> +# Required-Stop:
> +# Short-Description: MPSS stack control
> +# Description: MPSS stack control
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +exec=/usr/sbin/mpssd
> +sysfs="/sys/class/mic"
> +
> +start()
> +{
> +	[ -x $exec ] || exit 5
> +
> +	echo -e $"Starting MPSS Stack"
> +
> +	echo -e $"Loading MIC_HOST Module"
> +
> +	# Ensure the driver is loaded
> +	[ -d "$sysfs" ] || modprobe mic_host
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then
> +		echo -e $"MPSSD already running! "
> +		success
> +		echo
> +		return 0;
> +	fi
> +
> +	# Start the daemon
> +	echo -n $"Starting MPSSD"
> +	$exec &
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +
> +	sleep 5
> +
> +	# Boot the cards
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -ne "Booting "`basename $f`" "
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +			RETVAL=$?
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +
> +	# Wait till ping works
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			count=100
> +			ipaddr=`cat $f/cmdline`
> +			ipaddr=${ipaddr#*address,}
> +			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> +
> +			while [ $count -ge 0 ]
> +			do
> +				echo -e "Pinging "`basename $f`" "
> +				ping -c 1 $ipaddr &> /dev/null
> +				RETVAL=$?
> +				if [ $RETVAL -eq 0 ]; then
> +					success
> +					break
> +				fi
> +				sleep 1
> +				count=`expr $count - 1`
> +			done
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +	return $RETVAL
> +}
> +
> +stop()
> +{
> +	echo -e $"Shutting down MPSS Stack: "
> +
> +	# Bail out if module is unloaded
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"Module unloaded "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return 0
> +	fi
> +
> +	# Shut down the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e "Shutting down `basename $f` "
> +		echo "shutdown" > $f/state 2>/dev/null
> +	done
> +
> +	# Wait for the cards to go offline
> +	for f in $sysfs/*
> +	do
> +		while [ "`cat $f/state`" != "offline" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for "`basename $f`" to go offline"
> +		done
> +	done
> +
> +	# Display the status of the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e ""`basename $f`" state: "`cat $f/state`""
> +	done
> +
> +	sleep 5
> +
> +	# Kill MPSSD now
> +	echo -n $"Killing MPSSD"
> +	killall -9 mpssd 2>/dev/null
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +restart()
> +{
> +	stop
> +	sleep 5
> +	start
> +}
> +
> +status()
> +{
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`""
> +		done
> +	fi
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> +		echo "mpssd is running"
> +	else
> +		echo "mpssd is stopped"
> +	fi
> +	return 0
> +}
> +
> +unload()
> +{
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"No MIC_HOST Module: "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return
> +	fi
> +
> +	stop
> +	RETVAL=$?
> +
> +	sleep 5
> +	echo -n $"Removing MIC_HOST Module: "
> +
> +	if [ $RETVAL = 0 ]; then
> +		sleep 1
> +		modprobe -r mic_host
> +		RETVAL=$?
> +	fi
> +
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +case $1 in
> +	start)
> +		start
> +		;;
> +	stop)
> +		stop
> +		;;
> +	restart)
> +		restart
> +		;;
> +	status)
> +		status
> +		;;
> +	unload)
> +		unload
> +		;;
> +	*)
> +		echo $"Usage: $0 {start|stop|restart|status|unload}"
> +		exit 2
> +esac
> +
> +exit $?
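The "Wait till ping works" loop in start() above extracts the card address from the cmdline attribute with ${ipaddr#*address,} followed by two cut calls. A rough C equivalent is sketched below. The helper name demo_extract_ipaddr is hypothetical, and the exact cmdline layout is an assumption inferred from the script (an "address," marker followed by a value ending at ',' or ';').

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the token that follows the first "address," marker into out,
 * stopping at the next ',' or ';' (or end of string).
 * Returns 0 on success, -1 if the marker is missing, the token is
 * empty, or out is too small. Note: the shell version would keep the
 * whole string when the marker is missing; this sketch fails instead.
 */
int demo_extract_ipaddr(const char *cmdline, char *out, size_t len)
{
	static const char marker[] = "address,";
	const char *p = strstr(cmdline, marker);
	size_t n;

	if (!p)
		return -1;
	p += sizeof(marker) - 1;	/* skip past "address," */
	n = strcspn(p, ",;");		/* length up to ',' or ';' */
	if (n == 0 || n >= len)
		return -1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}
```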
> diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> new file mode 100644
> index 0000000..3bc34cb
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.c
> @@ -0,0 +1,1689 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <assert.h>
> +#include <unistd.h>
> +#include <stdbool.h>
> +#include <signal.h>
> +#include <poll.h>
> +#include <features.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/mman.h>
> +#include <sys/socket.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_blk.h>
> +#include <linux/version.h>
> +#include "mpssd.h"
> +#include <linux/mic_ioctl.h>
> +#include <linux/mic_common.h>
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static FILE *logfp;
> +static struct mic_info mic_list;
> +
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +
> +#define min_t(type, x, y) ({				\
> +		type __min1 = (x);                      \
> +		type __min2 = (y);                      \
> +		__min1 < __min2 ? __min1 : __min2; })
> +
> +/* align addr on a size boundary - adjust address up/down if needed */
> +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
> +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
> +
> +/* align addr on a size boundary - adjust address up if needed */
> +#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
> +
> +/* to align the pointer to the (next) page boundary */
> +#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
> +
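The alignment macros above rely on size being a power of two, which the macros themselves do not check. A small sketch of the same arithmetic as functions, with that assumption made explicit, might look like this. The names align_up and align_down are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of size; size must be a power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr + (size - 1)) & ~(size - 1);
}

/* Round addr down to a multiple of size; size must be a power of two. */
uintptr_t align_down(uintptr_t addr, uintptr_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}
```

With these, the MAX_NET_PKT_SIZE computation defined further down corresponds to align_up(64 * 1024 + 14, 64).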
> +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +
> +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> +static inline void cpu_relax(void)
> +{
> +	asm volatile("rep; nop" : : : "memory");
> +}
> +
> +#define GSO_ENABLED		1
> +#define MAX_GSO_SIZE		(64 * 1024)
> +#define ETH_H_LEN		14
> +#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> +#define MIC_DEVICE_PAGE_END	0x1000
> +
> +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> +#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
> +#endif
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_console_config cons_config;
> +} virtcons_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_CONSOLE,
> +		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> +		.feature_len = sizeof(virtcons_dev_page.host_features),
> +		.config_len = sizeof(virtcons_dev_page.cons_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +};
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_net_config net_config;
> +} virtnet_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_NET,
> +		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> +		.feature_len = sizeof(virtnet_dev_page.host_features),
> +		.config_len = sizeof(virtnet_dev_page.net_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +#if GSO_ENABLED
> +		.host_features = htole32(
> +		1 << VIRTIO_NET_F_CSUM |
> +		1 << VIRTIO_NET_F_GSO |
> +		1 << VIRTIO_NET_F_GUEST_TSO4 |
> +		1 << VIRTIO_NET_F_GUEST_TSO6 |
> +		1 << VIRTIO_NET_F_GUEST_ECN |
> +		1 << VIRTIO_NET_F_GUEST_UFO),
> +#else
> +		.host_features = 0,
> +#endif
> +};
> +
> +static const char *mic_config_dir = "/etc/sysconfig/mic";
> +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[1];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_blk_config blk_config;
> +} virtblk_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_BLOCK,
> +		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> +		.feature_len = sizeof(virtblk_dev_page.host_features),
> +		.config_len = sizeof(virtblk_dev_page.blk_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.host_features =
> +		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> +	.blk_config = {
> +		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
> +		.capacity = htole64(0),
> +	 }
> +};
> +
> +static char *myname;
> +
> +static int
> +tap_configure(struct mic_info *mic, char *dev)
> +{
> +	pid_t pid;
> +	char *ifargv[7];
> +	char ipaddr[IFNAMSIZ];
> +	int ret = 0;
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "link";
> +		ifargv[2] = "set";
> +		ifargv[3] = dev;
> +		ifargv[4] = "up";
> +		ifargv[5] = NULL;
> +		mpsslog("Configuring %s\n", dev);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return pid;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +
> +	snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id);
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "addr";
> +		ifargv[2] = "add";
> +		ifargv[3] = ipaddr;
> +		ifargv[4] = "dev";
> +		ifargv[5] = dev;
> +		ifargv[6] = NULL;
> +		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return pid;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +	mpsslog("MIC name %s %s %d DONE!\n",
> +		mic->name, __func__, __LINE__);
> +	return 0;
> +}
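tap_configure() above repeats the same fork/execvp/waitpid sequence twice. A minimal sketch of that pattern as one helper is shown below, assuming a POSIX environment. The name demo_run is hypothetical. The sketch also makes the child call _exit() if execvp fails, so that a failed exec cannot continue running the parent's code path in the child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (looked up in PATH) with the given arguments and wait
 * for it to finish. Returns the child's exit status, 127 if the
 * program could not be executed, or -1 on fork/waitpid failure or
 * abnormal termination.
 */
int demo_run(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller could then express the two steps as demo_run() invocations with the "ip link set ... up" and "ip addr add ... dev ..." argument vectors and check each return value.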
> +
> +static int tun_alloc(struct mic_info *mic, char *dev)
> +{
> +	struct ifreq ifr;
> +	int fd, err;
> +#if GSO_ENABLED
> +	unsigned offload;
> +#endif
> +	fd = open("/dev/net/tun", O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> +		goto done;
> +	}
> +
> +	memset(&ifr, 0, sizeof(ifr));
> +
> +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> +	if (*dev)
> +		strncpy(ifr.ifr_name, dev, IFNAMSIZ);
> +
> +	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETIFF failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#if GSO_ENABLED
> +	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> +		TUN_F_TSO_ECN | TUN_F_UFO;
> +
> +	err = ioctl(fd, TUNSETOFFLOAD, offload);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#endif
> +	strcpy(dev, ifr.ifr_name);
> +	mpsslog("Created TAP %s\n", dev);
> +done:
> +	return fd;
> +}
> +
> +#define NET_FD_VIRTIO_NET 0
> +#define NET_FD_TUN 1
> +#define MAX_NET_FD 2
> +
> +static void * *
> +get_dp(struct mic_info *mic, int type)
> +{
> +	switch (type) {
> +	case VIRTIO_ID_CONSOLE:
> +		return &mic->mic_console.console_dp;
> +	case VIRTIO_ID_NET:
> +		return &mic->mic_net.net_dp;
> +	case VIRTIO_ID_BLOCK:
> +		return &mic->mic_virtblk.block_dp;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> +{
> +	struct mic_device_desc *d;
> +	int i;
> +	void *dp = *get_dp(mic, type);
> +
> +	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> +		i += mic_total_desc_size(d)) {
> +		d = dp + i;
> +
> +		/* End of list */
> +		if (d->type == 0)
> +			break;
> +
> +		if (d->type == -1)
> +			continue;
> +
> +		mpsslog("%s %s d-> type %d d %p\n",
> +			mic->name, __func__, d->type, d);
> +
> +		if (d->type == (__u8)type)
> +			return d;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +/* See comments in vhost.c for explanation of next_desc() */
> +static unsigned next_desc(struct vring_desc *desc)
> +{
> +	unsigned int next;
> +
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> +		return -1U;
> +	next = le16toh(desc->next);
> +	return next;
> +}
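To illustrate how a next_desc()-style helper is typically used, here is a self-contained sketch that walks a descriptor chain and sums the buffer lengths. struct demo_desc is a hypothetical stand-in with the same field layout as struct vring_desc, DESC_F_NEXT carries the value of VRING_DESC_F_NEXT, and the demo_* function names are not part of the patch.

```c
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>

#define DESC_F_NEXT 1u	/* same value as VRING_DESC_F_NEXT */

/* Stand-in for struct vring_desc; all fields are little-endian. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Index of the next descriptor, or -1U at the end of the chain. */
unsigned demo_next_desc(const struct demo_desc *desc)
{
	if (!(le16toh(desc->flags) & DESC_F_NEXT))
		return -1U;
	return le16toh(desc->next);
}

/* Sum the buffer lengths along the chain that starts at head. */
uint64_t demo_chain_len(const struct demo_desc *table, unsigned num,
			unsigned head)
{
	uint64_t total = 0;
	unsigned i = head, seen = 0;

	/* i < num also ends the walk on -1U; seen guards against loops */
	while (i < num && seen++ < num) {
		total += le32toh(table[i].len);
		i = demo_next_desc(&table[i]);
	}
	return total;
}
```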
> +
> +/* Sum up the lengths of all iovec entries */
> +static ssize_t
> +sum_iovec_len(struct mic_copy_desc *copy)
> +{
> +	ssize_t sum = 0;
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		sum += copy->iov[i].iov_len;
> +	return sum;
> +}
> +
> +static inline void verify_out_len(struct mic_info *mic,
> +	struct mic_copy_desc *copy)
> +{
> +	if (copy->out_len != sum_iovec_len(copy)) {
> +		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%x\n",
> +				mic->name, __func__, __LINE__,
> +				copy->out_len, sum_iovec_len(copy));
> +		assert(copy->out_len == sum_iovec_len(copy));
> +	}
> +}
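The length check above can also be written so that a mismatch is reported to the caller rather than asserted. The sketch below uses the standard struct iovec from <sys/uio.h>. The demo_* names are hypothetical and stand in for the mic_copy_desc-based helpers in the patch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Total number of bytes described by an iovec array. */
size_t demo_iov_total(const struct iovec *iov, int iovcnt)
{
	size_t sum = 0;
	int i;

	for (i = 0; i < iovcnt; i++)
		sum += iov[i].iov_len;
	return sum;
}

/* Return 1 if the reported length matches the iovec total, else 0. */
int demo_len_matches(const struct iovec *iov, int iovcnt, size_t reported)
{
	return demo_iov_total(iov, iovcnt) == reported;
}
```

Returning a status lets a long-running daemon log the mismatch and carry on, whereas assert() is compiled out entirely when NDEBUG is defined.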
> +
> +/* Display an iovec */
> +static void
> +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> +	const char *s, int line)
> +{
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
> +			mic->name, s, line, i,
> +			copy->iov[i].iov_base, copy->iov[i].iov_len);
> +}
> +
> +static inline __u16 read_avail_idx(struct mic_vring *vr)
> +{
> +	return ACCESS_ONCE(vr->info->avail_idx);
> +}
> +
> +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> +				struct mic_copy_desc *copy, ssize_t len)
> +{
> +	copy->vr_idx = tx ? 0 : 1;
> +	copy->update_used = true;
> +	if (type == VIRTIO_ID_NET)
> +		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> +	else
> +		copy->iov[0].iov_len = len;
> +}
> +
> +/* Central API which triggers the copies */
> +static int
> +mic_virtio_copy(struct mic_info *mic, int fd,
> +	struct mic_vring *vr, struct mic_copy_desc *copy)
> +{
> +	int ret;
> +
> +	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> +	if (ret) {
> +		mpsslog("%s %s %d errno %s ret %d\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno), ret);
> +	}
> +	return ret;
> +}
> +
> +/*
> + * This initialization routine requires at least one
> + * vring i.e. vr0. vr1 is optional.
> + */
> +static void *
> +init_vr(struct mic_info *mic, int fd, int type,
> +	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> +{
> +	int vr_size;
> +	char *va;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> +		PROT_READ, MAP_SHARED, fd, 0);
> +	if (MAP_FAILED == va) {
> +		mpsslog("%s %s %d mmap failed errno %s\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno));
> +		goto done;
> +	}
> +	*get_dp(mic, type) = (void *)va;
> +	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> +	vr0->info = vr0->va +
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> +	vring_init(&vr0->vr,
> +		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> +	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> +		__func__, mic->name, vr0->va, vr0->info, vr_size,
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +	mpsslog("magic 0x%x expected 0x%x\n",
> +		vr0->info->magic, MIC_MAGIC + type + 0);
> +	assert(vr0->info->magic == MIC_MAGIC + type + 0);
> +	if (vr1) {
> +		vr1->va = (struct mic_vring *)
> +			&va[MIC_DEVICE_PAGE_END + vr_size];
> +		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> +			MIC_VIRTIO_RING_ALIGN);
> +		vring_init(&vr1->vr,
> +			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> +		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> +			__func__, mic->name, vr1->va, vr1->info, vr_size,
> +			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +		mpsslog("magic 0x%x expected 0x%x\n",
> +			vr1->info->magic, MIC_MAGIC + type + 1);
> +		assert(vr1->info->magic == MIC_MAGIC + type + 1);
> +	}
> +done:
> +	return va;
> +}
> +
> +static void
> +uninit_vr(struct mic_info *mic, int num_vq)
> +{
> +	int vr_size, ret;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	ret = munmap(mic->mic_virtblk.block_dp,
> +		MIC_DEVICE_PAGE_END + vr_size * num_vq);
> +	if (ret < 0)
> +		mpsslog("%s munmap errno %d\n", mic->name, errno);
> +}
> +
> +static void
> +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> +{
> +	struct pollfd pollfd;
> +	int err;
> +	struct mic_device_desc *desc = get_device_desc(mic, type);
> +
> +	pollfd.fd = fd;
> +	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> +		mic->name, __func__, type, desc->status);
> +	while (1) {
> +		pollfd.events = POLLIN;
> +		pollfd.revents = 0;
> +		err = poll(&pollfd, 1, -1);
> +		if (err < 0) {
> +			mpsslog("%s %s poll failed %s\n",
> +				mic->name, __func__, strerror(errno));
> +			continue;
> +		}
> +
> +		if (pollfd.revents) {
> +			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> +				mic->name, __func__, type, desc->status);
> +			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +				mpsslog("%s %s poll.revents %d\n",
> +					mic->name, __func__, pollfd.revents);
> +				mpsslog("%s %s desc-> type %d status 0x%x\n",
> +					mic->name, __func__, type,
> +					desc->status);
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +/* Spin till we have some descriptors */
> +static void
> +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> +{
> +	__u16 avail_idx = read_avail_idx(vr);
> +
> +	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> +#ifdef DEBUG
> +		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> +			mic->name, __func__,
> +			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> +#endif
> +		cpu_relax();
> +	}
> +}
> +
> +static void *
> +virtio_net(void *arg)
> +{
> +	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> +	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> +	struct iovec vnet_iov[2][2] = {
> +		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> +		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> +		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> +		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> +	};
> +	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char if_name[IFNAMSIZ];
> +	struct pollfd net_poll[MAX_NET_FD];
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +	int err;
> +
> +	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> +	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> +	if (mic->mic_net.tap_fd < 0)
> +		goto done;
> +
> +	if (tap_configure(mic, if_name))
> +		goto done;
> +	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> +
> +	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> +	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> +	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> +	net_poll[NET_FD_TUN].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> +		VIRTIO_ID_NET, &tx_vr, &rx_vr,
> +		virtnet_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto done;
> +	}
> +
> +	copy.iovcnt = 2;
> +	desc = get_device_desc(mic, VIRTIO_ID_NET);
> +
> +	while (1) {
> +		ssize_t len;
> +
> +		net_poll[NET_FD_VIRTIO_NET].revents = 0;
> +		net_poll[NET_FD_TUN].revents = 0;
> +
> +		/* Start polling for data from tap and virtio net */
> +		err = poll(net_poll, 2, -1);
> +		if (err < 0) {
> +			mpsslog("%s poll failed %s\n",
> +				__func__, strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> +					VIRTIO_ID_NET);
> +		/*
> +		 * Check if there is data to be read from TUN and write to
> +		 * virtio net fd if there is.
> +		 */
> +		if (net_poll[NET_FD_TUN].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(net_poll[NET_FD_TUN].fd,
> +				copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +				struct virtio_net_hdr *hdr
> +					= (struct virtio_net_hdr *) vnet_hdr[0];
> +
> +				/* Disable checksums on the card since we are on
> +				   a reliable PCIe link */
> +				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +#ifdef DEBUG
> +				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> +					__func__, __LINE__, hdr->flags);
> +				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> +					copy.out_len, hdr->gso_type);
> +#endif
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> +					len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &tx_vr,
> +					&copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(&copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0[1].iov_len = MAX_NET_PKT_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ", mic->name,
> +					__func__, __LINE__, strerror(errno));
> +				mpsslog("cnt %d sum %d\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		/*
> +		 * Check if there is data to be read from virtio net and
> +		 * write to TUN if there is.
> +		 */
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> +					MAX_NET_PKT_SIZE
> +					+ sizeof(struct virtio_net_hdr));
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &rx_vr,
> +					&copy);
> +				if (!err) {
> +#ifdef DEBUG
> +					struct virtio_net_hdr *hdr
> +						= (struct virtio_net_hdr *)
> +							vnet_hdr[1];
> +
> +					mpsslog("%s %s %d hdr->flags 0x%x, ",
> +						mic->name, __func__, __LINE__,
> +						hdr->flags);
> +					mpsslog("out_len %d gso_type 0x%x\n",
> +						copy.out_len,
> +						hdr->gso_type);
> +#endif
> +					/* Set the correct output iov_len */
> +					iov1[1].iov_len = copy.out_len -
> +						sizeof(struct virtio_net_hdr);
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(net_poll[NET_FD_TUN].fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("Tun write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, &copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +done:
> +	pthread_exit(NULL);
> +}
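
For readers following the iovec handling: each direction uses a two-element iovec, header first and packet second, and the RX path recovers the packet length as out_len minus the header size. The same scatter/gather pattern can be sketched in isolation with a pipe. HDR_LEN and BUF_LEN below are stand-ins I made up for sizeof(struct virtio_net_hdr) and MAX_NET_PKT_SIZE:

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define HDR_LEN 4	/* stand-in for sizeof(struct virtio_net_hdr) */
#define BUF_LEN 64	/* stand-in for MAX_NET_PKT_SIZE */

/*
 * Push one header+payload message through a pipe with writev() and
 * scatter it back into two buffers with readv(). Returns the payload
 * length that came back, or -1 on failure.
 */
static ssize_t
iovec_roundtrip(const char *payload, size_t payload_len)
{
	char hdr_out[HDR_LEN] = { 1, 2, 3, 4 };
	char hdr_in[HDR_LEN], buf_in[BUF_LEN];
	struct iovec out[2] = {
		{ .iov_base = hdr_out, .iov_len = sizeof(hdr_out) },
		{ .iov_base = (void *)payload, .iov_len = payload_len },
	};
	struct iovec in[2] = {
		{ .iov_base = hdr_in, .iov_len = sizeof(hdr_in) },
		{ .iov_base = buf_in, .iov_len = sizeof(buf_in) },
	};
	int fds[2];
	ssize_t len;

	assert(payload_len <= BUF_LEN);
	if (pipe(fds) < 0)
		return -1;
	len = writev(fds[1], out, 2);
	if (len == (ssize_t)(HDR_LEN + payload_len))
		len = readv(fds[0], in, 2);
	else
		len = -1;
	close(fds[0]);
	close(fds[1]);
	if (len < HDR_LEN || memcmp(hdr_in, hdr_out, HDR_LEN) != 0 ||
	    memcmp(buf_in, payload, payload_len) != 0)
		return -1;
	/* As in virtio_net(): payload length is total minus the header. */
	return len - HDR_LEN;
}
```

Calling iovec_roundtrip("hello", 5) hands back 5, the same subtraction the RX path does before writev() to TUN.
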
> +
> +/* virtio_console */
> +#define VIRTIO_CONSOLE_FD 0
> +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> +#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
> +#define MAX_BUFFER_SIZE PAGE_SIZE
> +
> +static void *
> +virtio_console(void *arg)
> +{
> +	static __u8 vcons_buf[2][PAGE_SIZE];
> +	struct iovec vcons_iov[2] = {
> +		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> +		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> +	};
> +	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	int err;
> +	struct pollfd console_poll[MAX_CONSOLE_FD];
> +	int pty_fd;
> +	char *pts_name;
> +	ssize_t len;
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +
> +	pty_fd = posix_openpt(O_RDWR);
> +	if (pty_fd < 0) {
> +		mpsslog("can't open a pseudoterminal master device: %s\n",
> +			strerror(errno));
> +		goto _return;
> +	}
> +	pts_name = ptsname(pty_fd);
> +	if (pts_name == NULL) {
> +		mpsslog("can't get pts name\n");
> +		goto _close_pty;
> +	}
> +	printf("%s console message goes to %s\n", mic->name, pts_name);
> +	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> +	err = grantpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't grant access: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	err = unlockpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't unlock a pseudoterminal: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	console_poll[MONITOR_FD].fd = pty_fd;
> +	console_poll[MONITOR_FD].events = POLLIN;
> +
> +	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> +	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> +		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> +		virtcons_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto _close_pty;
> +	}
> +
> +	copy.iovcnt = 1;
> +	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> +
> +	for (;;) {
> +		console_poll[MONITOR_FD].revents = 0;
> +		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> +		err = poll(console_poll, MAX_CONSOLE_FD, -1);
> +		if (err < 0) {
> +			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> +				strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic,
> +				mic->mic_console.virtio_console_fd,
> +				VIRTIO_ID_CONSOLE);
> +
> +		if (console_poll[MONITOR_FD].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(pty_fd, copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> +					&copy, len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&tx_vr, &copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0->iov_len = PAGE_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ",
> +					mic->name, __func__, __LINE__,
> +					strerror(errno));
> +				mpsslog("cnt %d sum %d\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> +					&copy, PAGE_SIZE);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&rx_vr, &copy);
> +				if (!err) {
> +					/* Set the correct output iov_len */
> +					iov1->iov_len = copy.out_len;
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(pty_fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("Tun write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (console_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +_close_pty:
> +	close(pty_fd);
> +_return:
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> +{
> +	char path[PATH_MAX];
> +	int fd, err;
> +
> +	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> +	fd = open(path, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open %s %s\n", path, strerror(errno));
> +		return;
> +	}
> +
> +	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> +	if (err < 0) {
> +		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> +		close(fd);
> +		return;
> +	}
> +	switch (dd->type) {
> +	case VIRTIO_ID_NET:
> +		mic->mic_net.virtio_net_fd = fd;
> +		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_CONSOLE:
> +		mic->mic_console.virtio_console_fd = fd;
> +		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_BLOCK:
> +		mic->mic_virtblk.virtio_block_fd = fd;
> +		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> +		break;
> +	}
> +}
> +
> +static bool
> +set_backend_file(struct mic_info *mic)
> +{
> +	FILE *config;
> +	char buff[PATH_MAX], *line, *evv, *p;
> +
> +	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> +	config = fopen(buff, "r");
> +	if (config == NULL)
> +		return false;
> +	do {  /* look for "virtblk_backend=XXXX" */
> +		line = fgets(buff, PATH_MAX, config);
> +		if (line == NULL)
> +			break;
> +		if (*line == '#')
> +			continue;
> +		p = strchr(line, '\n');
> +		if (p)
> +			*p = '\0';
> +	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> +	fclose(config);
> +	if (line == NULL)
> +		return false;
> +	evv = strchr(line, '=');
> +	if (evv == NULL)
> +		return false;
> +	mic->mic_virtblk.backend_file = malloc(strlen(evv));
> +	if (mic->mic_virtblk.backend_file == NULL) {
> +		mpsslog("can't allocate memory\n", mic->name, mic->id);
> +		return false;
> +	}
> +	strcpy(mic->mic_virtblk.backend_file, evv + 1);
> +	return true;
> +}
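
Two small things in set_backend_file(): the "continue" inside do/while jumps to the loop condition, so comment lines are still compared against the key, and the "can't allocate memory" mpsslog() call passes two arguments that the format string never consumes. The matching itself might be easier to follow as a helper along these lines. This is only a sketch; config_value() is a name I made up:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if line is "<key>=<value>" (optionally ending
 * in '\n'), return a malloc()ed copy of <value>; otherwise NULL.
 * Comment lines starting with '#' never match.
 */
static char *
config_value(const char *line, const char *key)
{
	size_t klen, vlen;
	char *value;

	assert(line != NULL && key != NULL);
	klen = strlen(key);
	if (line[0] == '#' || strncmp(line, key, klen) != 0 ||
	    line[klen] != '=')
		return NULL;
	line += klen + 1;		/* skip "key=" */
	vlen = strcspn(line, "\n");	/* value ends at newline or NUL */
	value = malloc(vlen + 1);
	if (value == NULL)
		return NULL;
	memcpy(value, line, vlen);
	value[vlen] = '\0';
	return value;
}
```

Requiring '=' right after the key also keeps a longer key with the same prefix from matching by accident.
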
> +
> +#define SECTOR_SIZE 512
> +static bool
> +set_backend_size(struct mic_info *mic)
> +{
> +	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> +		SEEK_END);
> +	if (mic->mic_virtblk.backend_size < 0) {
> +		mpsslog("%s: can't seek: %s\n",
> +			mic->name, mic->mic_virtblk.backend_file);
> +		return false;
> +	}
> +	virtblk_dev_page.blk_config.capacity =
> +		mic->mic_virtblk.backend_size / SECTOR_SIZE;
> +	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> +		virtblk_dev_page.blk_config.capacity++;
> +
> +	virtblk_dev_page.blk_config.capacity =
> +		htole64(virtblk_dev_page.blk_config.capacity);
> +
> +	return true;
> +}
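
The divide-then-increment above is a round-up division. As a sketch, it can be written in one expression (sectors_for_size() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Number of 512-byte sectors needed to hold size bytes, rounded up. */
static uint64_t
sectors_for_size(uint64_t size)
{
	assert(size <= UINT64_MAX - (SECTOR_SIZE - 1));
	return (size + SECTOR_SIZE - 1) / SECTOR_SIZE;
}
```

Note that rounding up advertises a final partial sector to the card while the mmap() of the backend covers only backend_size bytes. Whether rounding down would be safer is a question for the authors.
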
> +
> +static bool
> +open_backend(struct mic_info *mic)
> +{
> +	if (!set_backend_file(mic))
> +		goto _error_exit;
> +	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> +	if (mic->mic_virtblk.backend < 0) {
> +		mpsslog("%s: can't open: %s\n", mic->name,
> +			mic->mic_virtblk.backend_file);
> +		goto _error_free;
> +	}
> +	if (!set_backend_size(mic))
> +		goto _error_close;
> +	mic->mic_virtblk.backend_addr = mmap(NULL,
> +		mic->mic_virtblk.backend_size,
> +		PROT_READ|PROT_WRITE, MAP_SHARED,
> +		mic->mic_virtblk.backend, 0L);
> +	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> +		mpsslog("%s: can't map: %s %s\n",
> +			mic->name, mic->mic_virtblk.backend_file,
> +			strerror(errno));
> +		goto _error_close;
> +	}
> +	return true;
> +
> + _error_close:
> +	close(mic->mic_virtblk.backend);
> + _error_free:
> +	free(mic->mic_virtblk.backend_file);
> + _error_exit:
> +	return false;
> +}
> +
> +static void
> +close_backend(struct mic_info *mic)
> +{
> +	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> +	close(mic->mic_virtblk.backend);
> +	free(mic->mic_virtblk.backend_file);
> +}
> +
> +static bool
> +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> +{
> +	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> +		mpsslog("%s: blk_config is not 8 byte aligned.\n",
> +			mic->name);
> +		return false;
> +	}
> +	add_virtio_device(mic, &virtblk_dev_page.dd);
> +	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> +		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		return false;
> +	}
> +	return true;
> +}
> +
> +static void
> +stop_virtblk(struct mic_info *mic)
> +{
> +	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> +	close(mic->mic_virtblk.virtio_block_fd);
> +}
> +
> +static __u8
> +header_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> +		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n",
> +				__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> +		mpsslog("%s() %d: alone\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> +		mpsslog("%s() %d: not read\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
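
This returns -EIO through a __u8, so the value gets truncated (on Linux, where EIO is 5, the driver would see 251). Further down in virtio_block() the result is also overwritten by "status = 0" before anyone reads it, and "ret < 0 && status != 0" can never store ret because status is 0 at that point. One possible shape for the mapping, assuming the intent is to report the standard virtio block status bytes; the helper name and local defines are mine:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values mirror VIRTIO_BLK_S_* in <linux/virtio_blk.h>. */
#define BLK_S_OK	0
#define BLK_S_IOERR	1
#define BLK_S_UNSUPP	2

/*
 * Hypothetical helper: map a 0/-errno result onto the one-byte status
 * that a virtio block request carries back to the driver.
 */
static uint8_t
blk_status_from_errno(int err)
{
	assert(err <= 0);
	if (err == 0)
		return BLK_S_OK;
	if (err == -ENOTSUP)
		return BLK_S_UNSUPP;
	return BLK_S_IOERR;
}
```

The check functions could then keep returning int, with the conversion done once just before write_status().
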
> +
> +static int
> +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_len = sizeof(*hdr);
> +	iovec.iov_base = hdr;
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static int
> +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> +{
> +	struct mic_copy_desc copy;
> +
> +	copy.iov = iovec;
> +	copy.iovcnt = iovcnt;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static __u8
> +status_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(__u8)) {
> +		mpsslog("%s() %d: length is not sizeof(status)\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int
> +write_status(int fd, __u8 *status)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_base = status;
> +	iovec.iov_len = sizeof(*status);
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = true; /* Update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static void *
> +virtio_block(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *) arg;
> +	int ret;
> +	struct pollfd block_poll;
> +	struct mic_vring vring;
> +	__u16 avail_idx;
> +	__u32 desc_idx;
> +	struct vring_desc *desc;
> +	struct iovec *iovec, *piov;
> +	__u8 status;
> +	__u32 buffer_desc_idx;
> +	struct virtio_blk_outhdr hdr;
> +	void *fos;
> +
> +	for (;;) {  /* forever */
> +		if (!open_backend(mic)) { /* No virtblk */
> +			for (mic->mic_virtblk.signaled = 0;
> +				!mic->mic_virtblk.signaled;)
> +				sleep(1);
> +			continue;
> +		}
> +
> +		/* backend file is specified. */
> +		if (!start_virtblk(mic, &vring))
> +			goto _close_backend;
> +		iovec = malloc(sizeof(*iovec) *
> +			le32toh(virtblk_dev_page.blk_config.seg_max));
> +		if (!iovec) {
> +			mpsslog("%s: can't alloc iovec: %s\n",
> +				mic->name, strerror(ENOMEM));
> +			goto _stop_virtblk;
> +		}
> +
> +		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> +		block_poll.events = POLLIN;
> +		for (mic->mic_virtblk.signaled = 0;
> +		     !mic->mic_virtblk.signaled;) {
> +			block_poll.revents = 0;
> +					/* timeout in 1 sec to see signaled */
> +			ret = poll(&block_poll, 1, 1000);
> +			if (ret < 0) {
> +				mpsslog("%s %d: poll failed: %s\n",
> +					__func__, __LINE__,
> +					strerror(errno));
> +				continue;
> +			}
> +
> +			if (!(block_poll.revents & POLLIN)) {
> +#ifdef DEBUG
> +				mpsslog("%s %d: block_poll.revents=0x%x\n",
> +					__func__, __LINE__, block_poll.revents);
> +				sleep(1);
> +#endif
> +				continue;
> +			}
> +
> +			/* POLLIN */
> +			while (vring.info->avail_idx !=
> +				le16toh(vring.vr.avail->idx)) {
> +				/* read header element */
> +				avail_idx =
> +					vring.info->avail_idx &
> +					(vring.vr.num - 1);
> +				desc_idx = le16toh(
> +					vring.vr.avail->ring[avail_idx]);
> +				desc = &vring.vr.desc[desc_idx];
> +#ifdef DEBUG
> +				mpsslog("%s() %d: avail_idx=%d ",
> +					__func__, __LINE__,
> +					vring.info->avail_idx);
> +				mpsslog("vring.vr.num=%d desc=%p\n",
> +					vring.vr.num, desc);
> +#endif
> +				status = header_error_check(desc);
> +				ret = read_header(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&hdr, desc_idx);
> +				if (ret < 0) {
> +					mpsslog("%s() %d %s: ret=%d %s\n",
> +						__func__, __LINE__,
> +						mic->name, ret,
> +						strerror(errno));
> +					break;
> +				}
> +				/* buffer element */
> +				piov = iovec;
> +				status = 0;
> +				fos = mic->mic_virtblk.backend_addr +
> +					(hdr.sector * SECTOR_SIZE);
> +				buffer_desc_idx = desc_idx =
> +					next_desc(desc);
> +				for (desc = &vring.vr.desc[buffer_desc_idx];
> +				     desc->flags & VRING_DESC_F_NEXT;
> +				     desc_idx = next_desc(desc),
> +					     desc = &vring.vr.desc[desc_idx]) {
> +					piov->iov_len = desc->len;
> +					piov->iov_base = fos;
> +					piov++;
> +					fos += desc->len;
> +				}
> +				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> +				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> +					VIRTIO_BLK_T_GET_ID)) {
> +					/*
> +					  VIRTIO_BLK_T_IN - does not do
> +					  anything. Probably for documenting.
> +					  VIRTIO_BLK_T_SCSI_CMD - for
> +					  virtio_scsi.
> +					  VIRTIO_BLK_T_FLUSH - turned off in
> +					  config space.
> +					  VIRTIO_BLK_T_BARRIER - defined but not
> +					  used in anywhere.
> +					*/
> +					mpsslog("%s() %d: type %x ",
> +						__func__, __LINE__,
> +						hdr.type);
> +					mpsslog("is not supported\n");
> +					status = -ENOTSUP;
> +
> +				} else {
> +					ret = transfer_blocks(
> +					mic->mic_virtblk.virtio_block_fd,
> +						iovec,
> +						piov - iovec);
> +					if (ret < 0 &&
> +						status != 0)
> +						status = ret;
> +				}
> +				/* write status and update used pointer */
> +				if (status != 0)
> +					status = status_error_check(desc);
> +				ret = write_status(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&status);
> +#ifdef DEBUG
> +				mpsslog("%s() %d: write status=%d on desc=%p\n",
> +					__func__, __LINE__,
> +					status, desc);
> +#endif
> +			}
> +		}
> +		free(iovec);
> +_stop_virtblk:
> +		stop_virtblk(mic);
> +_close_backend:
> +		close_backend(mic);
> +	}  /* forever */
> +
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +reset(struct mic_info *mic)
> +{
> +#define RESET_TIMEOUT 120
> +	int i = RESET_TIMEOUT;
> +	setsysfs(mic->name, "state", "reset");
> +	while (i) {
> +		char *state;
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		if ((!strcmp(state, "offline"))) {
> +			free(state);
> +			break;
> +		}
> +		free(state);
> +retry:
> +		sleep(1);
> +		i--;
> +	}
> +}
> +
> +static int
> +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> +{
> +	if (!strcmp(shutdown_status, "nop"))
> +		return MIC_NOP;
> +	if (!strcmp(shutdown_status, "crashed"))
> +		return MIC_CRASHED;
> +	if (!strcmp(shutdown_status, "halted"))
> +		return MIC_HALTED;
> +	if (!strcmp(shutdown_status, "poweroff"))
> +		return MIC_POWER_OFF;
> +	if (!strcmp(shutdown_status, "restart"))
> +		return MIC_RESTART;
> +	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> +	/* Invalid state */
> +	assert(0);
> +};
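
With NDEBUG defined, assert(0) compiles away and this function falls off the end without returning a value; the same applies to get_mic_state() below. There is also a stray ';' after the closing brace. A table-driven lookup with an explicit "invalid" result is one alternative. The enum here is a stand-in with illustrative values, since the real MIC_* definitions are not in this hunk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the MIC_* shutdown codes; values are illustrative. */
enum shutdown_status {
	ST_INVALID = -1,
	ST_NOP,
	ST_CRASHED,
	ST_HALTED,
	ST_POWER_OFF,
	ST_RESTART,
};

static const struct {
	const char *name;
	enum shutdown_status status;
} status_table[] = {
	{ "nop", ST_NOP },
	{ "crashed", ST_CRASHED },
	{ "halted", ST_HALTED },
	{ "poweroff", ST_POWER_OFF },
	{ "restart", ST_RESTART },
};

/* Returns ST_INVALID for unknown strings instead of relying on assert(). */
static enum shutdown_status
parse_shutdown_status(const char *s)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < sizeof(status_table) / sizeof(status_table[0]); i++)
		if (strcmp(s, status_table[i].name) == 0)
			return status_table[i].status;
	return ST_INVALID;
}
```

The caller's switch can then treat the invalid value like any other case it does not act on.
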
> +
> +static int get_mic_state(struct mic_info *mic, char *state)
> +{
> +	if (!strcmp(state, "offline"))
> +		return MIC_OFFLINE;
> +	if (!strcmp(state, "online"))
> +		return MIC_ONLINE;
> +	if (!strcmp(state, "shutting_down"))
> +		return MIC_SHUTTING_DOWN;
> +	if (!strcmp(state, "reset_failed"))
> +		return MIC_RESET_FAILED;
> +	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> +	/* Invalid state */
> +	assert(0);
> +};
> +
> +static void mic_handle_shutdown(struct mic_info *mic)
> +{
> +#define SHUTDOWN_TIMEOUT 60
> +	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> +	char *shutdown_status;
> +	while (i) {
> +		shutdown_status = readsysfs(mic->name, "shutdown_status");
> +		if (!shutdown_status)
> +			continue;
> +		mpsslog("%s: %s %d shutdown_status %s\n",
> +			mic->name, __func__, __LINE__, shutdown_status);
> +		switch (get_mic_shutdown_status(mic, shutdown_status)) {
> +		case MIC_RESTART:
> +			mic->restart = 1;
> +		case MIC_HALTED:
> +		case MIC_POWER_OFF:
> +		case MIC_CRASHED:
> +			goto reset;
> +		default:
> +			break;
> +		}
> +		free(shutdown_status);
> +		sleep(1);
> +		i--;
> +	}
> +reset:
> +	ret = kill(mic->pid, SIGTERM);
> +	mpsslog("%s: %s %d kill pid %d ret %d\n",
> +		mic->name, __func__, __LINE__,
> +		mic->pid, ret);
> +	if (!ret) {
> +		ret = waitpid(mic->pid, &stat,
> +			WIFSIGNALED(stat));
> +		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> +			mic->name, __func__, __LINE__,
> +			ret, mic->pid);
> +	}
> +	if (ret == mic->pid)
> +		reset(mic);
> +}
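
A few things in mic_handle_shutdown(). The third argument of waitpid() is an options mask (0, WNOHANG, ...); WIFSIGNALED(stat) is a test on a returned status, and as far as I can tell it only works here because it evaluates to 0 for stat == 0 with glibc. Also, the "continue" on a NULL readsysfs() result skips both sleep() and i--, so a persistent failure busy-loops forever, and shutdown_status is leaked on the "goto reset" path. A sketch of the kill-and-reap step, with hypothetical helper names:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: send SIGTERM to a child and reap it.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 if kill() or waitpid() failed.
 */
static int
terminate_and_reap(pid_t pid)
{
	int stat = 0;

	assert(pid > 0);
	if (kill(pid, SIGTERM) < 0)
		return -1;
	/* options is a flag mask such as 0 or WNOHANG, not a status test */
	if (waitpid(pid, &stat, 0) != pid)
		return -1;
	return WIFSIGNALED(stat) ? WTERMSIG(stat) : 0;
}

/* Fork a child that only sleeps, for demonstration purposes. */
static pid_t
spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}
```

With a child from spawn_sleeper(), terminate_and_reap() reports SIGTERM, which is the condition the code above seems to be after before calling reset().
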
> +
> +static void *
> +mic_config(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char *state = NULL;
> +	char pathname[PATH_MAX];
> +	int fd, ret;
> +	struct pollfd ufds[1];
> +	char value[4096];
> +
> +	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> +		MICSYSFSDIR, mic->name, "state");
> +
> +	fd = open(pathname, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: opening file %s failed %s\n",
> +			mic->name, pathname, strerror(errno));
> +		goto error;
> +	}
> +
> +	do {
> +		ret = read(fd, value, sizeof(value));
> +		if (ret < 0) {
> +			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> +				mic->name, pathname, strerror(errno));
> +			goto close_error1;
> +		}
> +retry:
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		switch (get_mic_state(mic, state)) {
> +		case MIC_SHUTTING_DOWN:
> +			mic_handle_shutdown(mic);
> +			goto close_error;
> +		default:
> +			break;
> +		}
> +		free(state);
> +
> +		ufds[0].fd = fd;
> +		ufds[0].events = POLLERR | POLLPRI;
> +		ret = poll(ufds, 1, -1);
> +		if (ret < 0) {
> +			mpsslog("%s: poll failed %s\n",
> +				mic->name, strerror(errno));
> +			goto close_error1;
> +		}
> +	} while (1);
> +close_error:
> +	free(state);
> +close_error1:
> +	close(fd);
> +error:
> +	init_mic(mic);
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +set_cmdline(struct mic_info *mic)
> +{
> +	char buffer[PATH_MAX];
> +	int len;
> +
> +	len = snprintf(buffer, PATH_MAX,
> +		"clocksource=tsc highres=off nohz=off ");
> +	len += snprintf(buffer + len, PATH_MAX,
> +		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
> +	len += snprintf(buffer + len, PATH_MAX,
> +		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> +		mic->id);
> +
> +	setsysfs(mic->name, "cmdline", buffer);
> +	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> +	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> +	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> +}
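
Each snprintf() above passes PATH_MAX as the size even though the destination is buffer + len, so the bound is too large by len. The strings are short enough that nothing overflows today, but the limit should be the space that is actually left. A minimal sketch of an append helper; append_fmt() is a hypothetical name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at buf + *len without
 * writing past buf + size. Returns 0 on success, -1 if the output
 * was truncated or vsnprintf() failed.
 */
static int
append_fmt(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len != NULL && *len < size);
	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return -1;
	if ((size_t)n >= size - *len) {
		*len = size - 1;	/* truncated: buffer is full */
		return -1;
	}
	*len += (size_t)n;
	return 0;
}
```

set_cmdline() could then chain three calls and log a warning if any of them reports truncation.
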
> +
> +static void
> +set_log_buf_info(struct mic_info *mic)
> +{
> +	int fd;
> +	off_t len;
> +	char system_map[] = "/lib/firmware/mic/System.map";
> +	char *map, *temp, log_buf[17] = {'\0'};
> +
> +	fd = open(system_map, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: Opening System.map failed: %d\n",
> +			mic->name, errno);
> +		return;
> +	}
> +	len = lseek(fd, 0, SEEK_END);
> +	if (len < 0) {
> +		mpsslog("%s: Reading System.map size failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> +	if (map == MAP_FAILED) {
> +		mpsslog("%s: mmap of System.map failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	temp = strstr(map, "__log_buf");
> +	if (!temp) {
> +		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_addr", log_buf);
> +	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> +	temp = strstr(map, "log_buf_len");
> +	if (!temp) {
> +		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_len", log_buf);
> +	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> +	munmap(map, len);
> +	close(fd);
> +}
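
The System.map parsing looks fragile. strstr() expects a NUL-terminated string, which the mapping is not guaranteed to be when the file size is an exact multiple of the page size. It also matches substrings, so any symbol containing "log_buf_len" would be picked up, and "temp - 19" is not checked against the start of the mapping. A bounded, line-oriented lookup is sketched below, assuming the usual "<16 hex digits> <type> <name>" line layout; the helper name is mine:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN 16	/* hex digits in a 64-bit System.map address */

/*
 * Hypothetical helper: find the line "<addr> <type> <sym>" in a
 * System.map image of len bytes (not necessarily NUL-terminated) and
 * copy the 16-digit address into out. Returns 0 on success, -1 if
 * the symbol is absent.
 */
static int
find_symbol_addr(const char *map, size_t len, const char *sym,
		 char out[ADDR_LEN + 1])
{
	const size_t prefix = ADDR_LEN + 3;	/* "<addr> <type> " */
	const char *p, *end;
	size_t slen;

	assert(map != NULL && sym != NULL && out != NULL);
	slen = strlen(sym);
	p = map;
	end = map + len;
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *eol = nl ? nl : end;

		/* Match the whole name, not just a substring of it. */
		if ((size_t)(eol - p) == prefix + slen &&
		    memcmp(p + prefix, sym, slen) == 0) {
			memcpy(out, p, ADDR_LEN);
			out[ADDR_LEN] = '\0';
			return 0;
		}
		p = nl ? nl + 1 : end;
	}
	return -1;
}
```

Both lookups in set_log_buf_info() could share it, which would also collapse the duplicated munmap()/close() error paths.
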
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static void
> +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		mic->mic_virtblk.signaled = 1/* true */;
> +}
> +
> +static void
> +init_mic(struct mic_info *mic)
> +{
> +	struct sigaction ignore = {
> +		.sa_flags = 0,
> +		.sa_handler = SIG_IGN
> +	};
> +	struct sigaction act = {
> +		.sa_flags = SA_SIGINFO,
> +		.sa_sigaction = change_virtblk_backend,
> +	};
> +	char buffer[PATH_MAX];
> +	int err;
> +
> +		/* ignore SIGUSR1 for both process */
> +	sigaction(SIGUSR1, &ignore, NULL);
> +
> +	mic->pid = fork();
> +	switch (mic->pid) {
> +	case 0:
> +		set_log_buf_info(mic);
> +		set_cmdline(mic);
> +		add_virtio_device(mic, &virtcons_dev_page.dd);
> +		add_virtio_device(mic, &virtnet_dev_page.dd);
> +		err = pthread_create(&mic->mic_console.console_thread, NULL,
> +			virtio_console, mic);
> +		if (err)
> +			mpsslog("%s virtcons pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		/*
> +		 * TODO: Debug why not adding this sleep results in the tap
> +		 * interface not coming up during certain runs sporadically.
> +		 */

Indeed.

> +		usleep(1000);
> +		err = pthread_create(&mic->mic_net.net_thread, NULL,
> +			virtio_net, mic);
> +		if (err)
> +			mpsslog("%s virtnet pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> +			virtio_block, mic);
> +		if (err)
> +			mpsslog("%s virtblk pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		sigemptyset(&act.sa_mask);
> +		err = sigaction(SIGUSR1, &act, NULL);

I'm confused here: who is expected to send this SIGUSR1?

> +		if (err)
> +			mpsslog("%s sigaction SIGUSR1 failed %s\n",
> +			mic->name, strerror(errno));
> +		while (1)
> +			sleep(60);
> +	case -1:
> +		mpsslog("fork failed MIC name %s id %d errno %d\n",
> +			mic->name, mic->id, errno);
> +		break;
> +	default:
> +		if (mic->restart) {
> +			snprintf(buffer, PATH_MAX,
> +				"boot:linux:mic/uos.img:mic/mic%d.image",
> +				mic->id);
> +			setsysfs(mic->name, "state", buffer);
> +			mpsslog("%s restarting mic %d\n",
> +				mic->name, mic->restart);
> +			mic->restart = 0;
> +		}
> +		pthread_create(&mic->config_thread, NULL, mic_config, mic);
> +	}
> +}
> +
> +static void
> +start_daemon(void)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		init_mic(mic);
> +
> +	while (1)
> +		sleep(60);
> +}
> +
> +static int
> +init_mic_list(void)
> +{
> +	struct mic_info *mic = &mic_list;
> +	struct dirent *file;
> +	DIR *dp;
> +	int cnt = 0;
> +
> +	dp = opendir(MICSYSFSDIR);
> +	if (!dp)
> +		return 0;
> +
> +	while ((file = readdir(dp)) != NULL) {
> +		if (!strncmp(file->d_name, "mic", 3)) {
> +			mic->next = malloc(sizeof(struct mic_info));
> +			if (mic->next) {
> +				mic = mic->next;
> +				mic->next = NULL;
> +				memset(mic, 0, sizeof(struct mic_info));
> +				mic->id = atoi(&file->d_name[3]);
> +				mic->name = malloc(strlen(file->d_name) + 16);
> +				if (mic->name)
> +					strcpy(mic->name, file->d_name);
> +				mpsslog("MIC name %s id %d\n", mic->name,
> +					mic->id);
> +				cnt++;
> +			}
> +		}
> +	}
> +
> +	closedir(dp);
> +	return cnt;
> +}
> +
> +void
> +mpsslog(char *format, ...)
> +{
> +	va_list args;
> +	char buffer[4096];
> +	time_t t;
> +	char *ts;
> +
> +	if (logfp == NULL)
> +		return;
> +
> +	va_start(args, format);
> +	vsprintf(buffer, format, args);
> +	va_end(args);
> +
> +	time(&t);
> +	ts = ctime(&t);
> +	ts[strlen(ts) - 1] = '\0';
> +	fprintf(logfp, "%s: %s", ts, buffer);
> +
> +	fflush(logfp);
> +}
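
mpsslog() formats into a 4096-byte stack buffer with vsprintf(), so one long message overflows it, and ctime() returns a shared static buffer while this function is called from several threads. A sketch of the formatting step with bounded calls follows. The function name is hypothetical and the strftime() format is my approximation of what ctime() prints:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical variant of the formatting step: build
 * "<timestamp>: <message>" in out without overflowing it. Returns the
 * number of bytes stored, excluding the terminating NUL.
 */
static size_t
format_log_line(char *out, size_t size, time_t t, const char *fmt, ...)
{
	struct tm tm;
	va_list ap;
	size_t len;
	int n;

	assert(out != NULL && size > 0);
	out[0] = '\0';
	/* localtime_r()/strftime() avoid ctime()'s shared static buffer. */
	if (localtime_r(&t, &tm) == NULL)
		return 0;
	len = strftime(out, size, "%a %b %e %H:%M:%S %Y: ", &tm);
	if (len == 0) {
		out[0] = '\0';	/* timestamp did not fit */
		return 0;
	}
	va_start(ap, fmt);
	n = vsnprintf(out + len, size - len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return len;
	return (size_t)n < size - len ? len + (size_t)n : size - 1;
}
```

The result can be handed to fputs() as before; a message that is too long is truncated rather than written past the buffer.
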
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int cnt;
> +
> +	myname = argv[0];
> +
> +	logfp = fopen(LOGFILE_NAME, "a+");
> +	if (!logfp) {
> +		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> +		exit(1);
> +	}
> +
> +	mpsslog("MIC Daemon start\n");
> +
> +	cnt = init_mic_list();
> +	if (cnt == 0) {
> +		mpsslog("MIC module not loaded\n");
> +		exit(2);
> +	}
> +	mpsslog("MIC found %d devices\n", cnt);
> +
> +	start_daemon();
> +
> +	exit(0);
> +}
> diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> new file mode 100644
> index 0000000..b6dee38
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.h
> @@ -0,0 +1,100 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +#ifndef _MPSSD_H_
> +#define _MPSSD_H_
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <dirent.h>
> +#include <libgen.h>
> +#include <pthread.h>
> +#include <stdarg.h>
> +#include <time.h>
> +#include <errno.h>
> +#include <sys/dir.h>
> +#include <sys/ioctl.h>
> +#include <sys/poll.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/utsname.h>
> +#include <sys/wait.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <netdb.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <limits.h>
> +#include <syslog.h>
> +#include <getopt.h>
> +#include <net/if.h>
> +#include <linux/if_tun.h>
> +#include <linux/if_tun.h>
> +#include <linux/virtio_ids.h>
> +
> +#define MICSYSFSDIR "/sys/class/mic"
> +#define LOGFILE_NAME "/var/log/mpssd"
> +#define PAGE_SIZE 4096
> +
> +struct mic_console_info {
> +	pthread_t       console_thread;
> +	int		virtio_console_fd;
> +	void		*console_dp;
> +};
> +
> +struct mic_net_info {
> +	pthread_t       net_thread;
> +	int		virtio_net_fd;
> +	int		tap_fd;
> +	void		*net_dp;
> +};
> +
> +struct mic_virtblk_info {
> +	pthread_t       block_thread;
> +	int		virtio_block_fd;
> +	void		*block_dp;
> +	volatile sig_atomic_t	signaled;
> +	char		*backend_file;
> +	int		backend;
> +	void		*backend_addr;
> +	long		backend_size;
> +};
> +
> +struct mic_info {
> +	int		id;
> +	char		*name;
> +	pthread_t       config_thread;
> +	pid_t		pid;
> +	struct mic_console_info	mic_console;
> +	struct mic_net_info	mic_net;
> +	struct mic_virtblk_info	mic_virtblk;
> +	int		restart;
> +	struct mic_info *next;
> +};
> +
> +void mpsslog(char *format, ...);
> +char *readsysfs(char *dir, char *entry);
> +int setsysfs(char *dir, char *entry, char *value);
> +#endif
> diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> new file mode 100644
> index 0000000..3244dcf
> --- /dev/null
> +++ b/Documentation/mic/mpssd/sysfs.c
> @@ -0,0 +1,103 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#include "mpssd.h"
> +
> +#define PAGE_SIZE 4096
> +
> +char *
> +readsysfs(char *dir, char *entry)
> +{
> +	char filename[PATH_MAX];
> +	char value[PAGE_SIZE];
> +	char *string = NULL;
> +	int fd;
> +	int len;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX,
> +			"%s/%s/%s", MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return NULL;
> +	}
> +
> +	len = read(fd, value, sizeof(value));
> +	if (len < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		goto readsys_ret;
> +	}
> +
> +	value[len] = '\0';

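Why are you careful to add this '\0' here but not in setsysfs() below,
where oldvalue is passed to strcmp() without being terminated?

If you do keep it, I'd fail on len == sizeof(value) as well: in that case
value[len] writes one byte past the end of the buffer.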
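
Something along these lines is what I have in mind. This is only a rough
sketch: read_string() is a made-up name, not something in the patch, and it
assumes a single read() returns the whole attribute. A read that fills the
whole buffer is reported as an error instead of being terminated out of
bounds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Read the file at 'path' into 'buf' (capacity 'size') and NUL-terminate it.
 * Returns the number of bytes read, or -1 with errno set.  If the data
 * fills the whole buffer there is no room for the terminator, so that
 * case fails with EOVERFLOW instead of writing buf[size].
 */
ssize_t read_string(const char *path, char *buf, size_t size)
{
	ssize_t len;
	int saved_errno;
	int fd;

	assert(buf != NULL && size > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	len = read(fd, buf, size);
	saved_errno = errno;	/* close() may clobber errno */
	close(fd);

	if (len < 0) {
		errno = saved_errno;
		return -1;
	}
	if ((size_t)len == size) {
		errno = EOVERFLOW;
		return -1;
	}

	buf[len] = '\0';
	return len;
}
```

readsysfs() could then call read_string(filename, value, sizeof(value)),
and setsysfs() could do the same with oldvalue before the strcmp(), so
both paths get the terminator and the same bounds check.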

> +
> +	string = malloc(strlen(value) + 1);
> +	if (string)
> +		strcpy(string, value);
> +
> +readsys_ret:
> +	close(fd);
> +	return string;
> +}
> +
> +int
> +setsysfs(char *dir, char *entry, char *value)
> +{
> +	char filename[PATH_MAX];
> +	char oldvalue[PAGE_SIZE];
> +	int fd;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX, "%s/%s/%s",
> +			MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return errno;
> +	}
> +
> +	if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		close(fd);
> +		return errno;
> +	}
> +
> +	if (strcmp(value, oldvalue)) {
> +		if (write(fd, value, strlen(value)) < 0) {
> +			mpsslog("Failed to write new sysfs entry '%s': %s\n",
> +				filename, strerror(errno));
> +			close(fd);
> +			return errno;
> +		}
> +	}
> +
> +	close(fd);
> +	return 0;
> +}
> -- 
> 1.8.2.1