Message-ID: <20120605104954.15442.62695.stgit@ltc189.sdl.hitachi.co.jp>
Date:	Tue, 05 Jun 2012 19:49:54 +0900
From:	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@...achi.com>
To:	linux-kernel@...r.kernel.org, Cam Macdonell <cam@...ualberta.ca>
Cc:	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Borislav Petkov <borislav.petkov@....com>,
	Grant Likely <grant.likely@...retlab.ca>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Joerg Roedel <joerg.roedel@....com>,
	Linus Walleij <linus.walleij@...aro.org>,
	MyungJoo Ham <myungjoo.ham@...sung.com>,
	Ohad Ben-Cohen <ohad@...ery.com>,
	Rusty Russell <rusty@...tcorp.com.au>, qemu-devel@...gnu.org,
	systemtap@...rceware.org, yrl.pp-manager.tt@...achi.com
Subject: [RFC PATCH 0/2] ivring: Add IVRing driver

Hi All,

The following patch set provides a new communication path, "IVRing", that lets
a host collect kernel log or tracing data from guests without using the network
in a virtualization environment. The network is generally used to collect log or
tracing data after the data has been written out as a file. However, since I/O
resources such as network and block devices are shared with other guests, they
should not also be consumed for logging or tracing. Moreover, sending the data
over the network puts a high load on applications in the guest because it passes
through many network stack layers. A communication method that collects the data
without using these I/O resources is therefore needed.

There are two requirements for collecting kernel log or tracing data by a host:
 (1) Minimize the impact on user applications in a guest
     - do not use I/O resources
 (2) Record into a ring-style buffer
     - keep recording log or trace data continuously
To meet these requirements, a ring-buffer called IVRing is implemented as a
device driver for guest OSs on top of the Inter-VM shared memory (IVShmem)
device. IVShmem, implemented in QEMU, is a virtual PCI RAM device backed by
POSIX shared memory on the host. It was originally designed as a virtual device
for low-overhead communication between two guests; here, IVShmem is instead used
as a communication path between a guest and a host for collecting data. IVRing
is the buffer for logging or tracing data in a guest, and IVRing-reader, which
opens the same shared memory on the host, reads the data without copying memory
between guest and host. Thus, both requirements for collecting kernel log or
tracing data are met.
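
To make the design concrete, such a ring shared over IVShmem can be thought of
as a header holding read/write positions followed by a data area, roughly as
sketched below. The field and struct names are only illustrative and do not
necessarily match the actual layout; the real structure is defined in
drivers/ivshmem/ivring.h in this series.

	/* Illustrative sketch only -- not the actual layout in ivring.h.
	 * The header sits at the start of the IVShmem region; the guest
	 * writer advances write_pos and the host reader advances read_pos,
	 * both wrapping at buf_size. */
	#include <stdint.h>

	struct ivring_hdr_sketch {
		uint64_t write_pos;   /* next offset the guest writer fills   */
		uint64_t read_pos;    /* next offset the host reader consumes */
		uint64_t buf_size;    /* size of the data area in bytes       */
		char     data[];      /* ring data; offsets wrap at buf_size  */
	};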

We will talk about IVRing at LinuxCon Japan 2012:
	https://events.linuxfoundation.org/events/linuxcon-japan
	Title: Low-Overhead Ring-Buffer of Kernel Tracing &
	       Tracing Across Host OS and Guest OS
	Speakers: Yoshihiro Yunomae and Akihiro Nagai
You can download our slides about IVRing from the schedule page.

***Evaluation***
To evaluate how a host collects tracing data from a guest, the performance of
using IVRing is compared with that of using the network.

<environment>
The overview of this evaluation is as follows:
 (a) A guest on KVM is prepared.
     - One physical CPU is dedicated to the guest as a virtual CPU (VCPU).

 (b) The guest starts to write tracing data to a SystemTap buffer.
     - The SystemTap probe points are all tracepoints in the sched, timer,
       and kmem subsystems.

 (c) The tracing data are either recorded to IVRing, which shares memory with
     the host, or sent to the host via the network.
     - Three patterns, IVRing, NFS, and SSH, are measured.
       Each method is explained later.

 (d) While the trace data are being written, Dhrystone 2 from UnixBench is
     executed as a benchmark tool in the guest.
     - Dhrystone 2 reports system performance as a score based on repeated
       integer arithmetic.
     - Since a higher score means better system performance, a score lower
       than that of the bare environment indicates that some operation is
       disturbing the integer arithmetic. We therefore define the overhead of
       transporting trace data as:
		OVERHEAD = (1 - SCORE_OF_A_METHOD/BARE_SCORE) * 100.

The performance of each method is compared as follows:
 [1] IVRing
     - A SystemTap script in a guest records trace data to IVRing.
     - An IVRing-reader on the host reads the data.
 [2] NFS
     - A directory in the guest is shared with one on the host via NFS.
     - A SystemTap script in a guest records trace data to a file
       in the directory.
 [3] SSH
     - A SystemTap script in the guest outputs trace data to the host over
       SSH via standard output (an illustrative command follows this list).
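
For illustration only (this is not necessarily the exact invocation used in the
measurement; "guest" and "trace_script.stp" are placeholder names), the SSH
pattern can be run from the host as:

	ssh guest 'stap trace_script.stp' > /tmp/trace.log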

Other information is as follows:
 - host
   kernel: 3.3.1-5 (Fedora16)
   CPU: Intel Xeon x5660@...0GHz(6core)
   Memory: 50GB

 - guest (only one guest is booted)
   kernel: 3.4.0+ (Fedora16)
   CPU: 1VCPU(dedicated)
   Memory: 2GB

<result>
The scores of the three patterns, relative to the bare environment, are as follows:
	                Scores      overhead against [0] Bare
	 [0] Bare      29043600                -
	 [1] IVRing    28565398              1.6[%]
	 [2] NFS       22000508             24.3[%]
	 [3] SSH       10246792             64.7[%]
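
For example, applying the overhead formula above to the IVRing score gives
(rounded to one decimal place):
	OVERHEAD = (1 - 28565398/29043600) * 100 = 1.6[%]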
The overhead of IVRing is much lower than that of the methods using the
network. This is because the IVRing method only records trace data to a
ring-buffer, whereas the other methods read trace data from a SystemTap buffer
into userland and send it to the host over the network. Therefore, using IVRing
minimizes the overhead of transporting trace data from a guest to a host.

***How to use***
Here is a brief explanation of how to use IVRing and IVRing-reader.

1. Prepare any distribution that includes a qemu-kvm binary of version 0.13.0 or later.
 IVShmem has been in qemu-kvm mainline since version 0.13.0.
 The latest Fedora or Ubuntu releases are suitable.

2. Boot a guest that has the IVRing driver installed, adding a device option.
 The following device option is needed:
	-device ivshmem,size=<shm_size in MB>,shm=<shm_obj>
shm_obj, the shared memory object path, is used later to share the memory region
with the reader on the host. For example, a device option looks like this:
	-device ivshmem,size=2,shm=/ivshmem
 IVShmem also supports an interrupt mode using ivshmem_server, and this IVRing
driver can use it to send doorbell notifications to the reader as an
experimental feature. This feature will be used in the near future.
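
As a concrete example (the memory size, VCPU count, and disk image name below
are only illustrative), a guest could be booted with:

	qemu-kvm -m 2048 -smp 1 -hda guest.img \
	         -device ivshmem,size=2,shm=/ivshmem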

3. Run IVRing-reader on the host.
 To share the memory region with IVShmem, the -s option must specify the same
shm_obj as in the second step, as below:
	./ivring_reader -m 2 -f /tmp/log.txt -S 10 -N 2 -s /ivshmem
Each option is described in detail in the 2nd patch.
IVRing-reader then starts to read data from IVRing, but the ring-buffer is
still empty:
	shared object size: 2097152 (bytes)
	Ring header is already initialized
	reader -1, writer 0, pos 20074a9f
	ivring_init_hdr: 0x7f128417d000
	Receive an interrupt 2
	Try to read buffer.
	Receive an interrupt 2
	no data
	__ivring_read ret=0
	Try to read buffer.
	no data
	__ivring_read ret=0
	Try to read buffer.
	...
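
For reference, the host side can access the ring simply by mapping the same
POSIX shared memory object that QEMU created. The sketch below only illustrates
that idea and is not the actual reader implementation; the real tool is
tools/ivshmem/ivring_reader.c in this series.

	/* Illustration only: map the shared memory object backing the
	 * ivshmem device from a host process.  Link with -lrt if needed. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		/* "/ivshmem" must match the shm= option given to QEMU. */
		int fd = shm_open("/ivshmem", O_RDWR, 0);
		if (fd < 0) {
			perror("shm_open");
			return 1;
		}

		struct stat st;
		if (fstat(fd, &st) < 0) {
			perror("fstat");
			return 1;
		}

		/* Map the whole region; the ring header and data live here. */
		void *shm = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		if (shm == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		printf("shared object size: %ld (bytes)\n", (long)st.st_size);

		munmap(shm, st.st_size);
		close(fd);
		return 0;
	}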

4. Start recording logging or tracing data on the guest.
 The IVRing driver provides an API for kernel programming:
	ivring_write(int ID, void *buf, size_t size).

It can be used for kernel logging as follows:

	int len;
	char buf[1024];

	/* format a message into buf and write it to ring 0 */
	len = sprintf(buf, "hogehoge\n");
	ivring_write(0, buf, len);

When SystemTap is used as a tracer, a sample script is as follows:

	%{
	extern int ivring_write(int id, void *buf, size_t size);
	%}

	function ivring_print(str:string) %{
		ivring_write(0, THIS->str, strlen(THIS->str));
	%}

	probe kernel.trace("sched*") {
		ivring_print(sprintf("%u: %s(%s)\n", gettimeofday_us(), pn(), $$parms))
	}
		
The script is executed as follows (guru mode, -g, is required for the embedded C):
	stap -vg ivring_writer_sample.stp

 When data is successfully recorded to IVRing, the reader outputs the following:
	Try to read buffer.
	__ivring_read ret=4096
	__ivring_read ret=4096
	__ivring_read ret=313
	Try to read buffer.
	__ivring_read ret=4096
	__ivring_read ret=4096
	__ivring_read ret=632
	Try to read buffer.

***Future Work***
The features below will be implemented as future work:
 1. Implement notification from a guest to a host
 2. Implement a user interface on the guest
 3. Make IVRing usable from existing in-kernel tracing systems
 4. Make IVRing usable in SMP environments
    (a lockless ring-buffer like ftrace's, one ring-buffer per CPU)
 5. Design for live migration

Thank you,

---

Yoshihiro YUNOMAE (2):
      ivring: Add a ring-buffer reader tool
      ivring: Add a ring-buffer driver on IVShmem


 drivers/Kconfig               |    1 
 drivers/Makefile              |    1 
 drivers/ivshmem/Kconfig       |    9 +
 drivers/ivshmem/Makefile      |    5 
 drivers/ivshmem/ivring.c      |  551 +++++++++++++++++++++++++++++++++++++++++
 drivers/ivshmem/ivring.h      |   77 ++++++
 tools/Makefile                |    1 
 tools/ivshmem/Makefile        |   19 +
 tools/ivshmem/ivring_reader.c |  516 ++++++++++++++++++++++++++++++++++++++
 tools/ivshmem/ivring_reader.h |   15 +
 tools/ivshmem/pr_msg.c        |  125 +++++++++
 tools/ivshmem/pr_msg.h        |   19 +
 12 files changed, 1339 insertions(+), 0 deletions(-)
 create mode 100644 drivers/ivshmem/Kconfig
 create mode 100644 drivers/ivshmem/Makefile
 create mode 100644 drivers/ivshmem/ivring.c
 create mode 100644 drivers/ivshmem/ivring.h
 create mode 100644 tools/ivshmem/Makefile
 create mode 100644 tools/ivshmem/ivring_reader.c
 create mode 100644 tools/ivshmem/ivring_reader.h
 create mode 100644 tools/ivshmem/pr_msg.c
 create mode 100644 tools/ivshmem/pr_msg.h

-- 
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@...achi.com