Message-ID: <Pine.LNX.4.64.0703081342000.21734@williams.orchestra.cse.unsw.EDU.AU>
Date:	Thu, 8 Mar 2007 14:01:47 +1100 (EST)
From:	Kandan Venkataraman <kven709@....unsw.EDU.AU>
To:	linux-kernel@...r.kernel.org
cc:	akpm@...ux-foundation.org
Subject: [PATCH] Loop device - Tracking page writes made to a loop device
 through mmap


All comments have been taken care of.

Description:

A file_operations structure variable called loop_fops is initialised with 
the default block device file operations (def_blk_fops).
The mmap operation is overridden with a new function called loop_file_mmap.

A vm_operations structure variable called loop_file_vm_ops is initialised 
with the default operations for a disk file.
The page_mkwrite operation in this variable is initialised to a new 
function called loop_track_pgwrites.

In the function lo_open, the file operations pointer of the device file is 
initialised with the address of loop_fops.

The function loop_file_mmap simply calls generic_file_mmap and then 
initialises the vm_ops of the vma with the address of loop_file_vm_ops.

The function loop_track_pgwrites stores the page offset of the page that 
is being written to in a red-black tree within the loop device.
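
The tracking behaviour described above can be illustrated in userspace.
This is only a sketch, not the code from the attached patch: it uses the
POSIX tsearch() tree (implemented as a red-black tree in glibc) in place
of the kernel rbtree, and the names pgwrite_set, pgwrite_track,
pgwrite_get and pgwrite_clear are hypothetical. It shows the property
that matters here: writing to the same page many times records the page
offset only once.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the per-device tracking state. */
struct pgwrite_set {
	void *root;	/* tsearch() tree root, NULL when empty */
	size_t count;	/* number of distinct page offsets stored */
};

static int cmp_pgoff(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Record a write to page pgoff.
 * Returns 1 if newly added, 0 if already tracked, -1 on allocation failure. */
static int pgwrite_track(struct pgwrite_set *set, unsigned long pgoff)
{
	unsigned long *key = malloc(sizeof(*key));
	void *node;

	assert(set != NULL);
	if (key == NULL)
		return -1;
	*key = pgoff;
	node = tsearch(key, &set->root, cmp_pgoff);
	if (node == NULL) {
		free(key);
		return -1;
	}
	if (*(unsigned long **)node != key) {
		free(key);	/* an equal key was already in the tree */
		return 0;
	}
	set->count++;
	return 1;
}

/* twalk() takes no user argument, so the cursor lives at file scope. */
static unsigned long *walk_out;
static size_t walk_max, walk_num;

static void collect(const void *nodep, VISIT which, int depth)
{
	(void)depth;
	/* postorder and leaf visits together give ascending key order */
	if ((which == postorder || which == leaf) && walk_num < walk_max)
		walk_out[walk_num++] = **(unsigned long *const *)nodep;
}

/* Copy up to max tracked offsets into out, in ascending order. */
static size_t pgwrite_get(const struct pgwrite_set *set,
			  unsigned long *out, size_t max)
{
	walk_out = out;
	walk_max = max;
	walk_num = 0;
	twalk(set->root, collect);
	return walk_num;
}

/* Empty the set by deleting the root node until the tree is gone. */
static void pgwrite_clear(struct pgwrite_set *set)
{
	while (set->root != NULL) {
		unsigned long *key = *(unsigned long **)set->root;

		tdelete(key, &set->root, cmp_pgoff);
		free(key);
	}
	set->count = 0;
}
```

In this sketch pgwrite_get and pgwrite_clear play the roles that the
LOOP_GET_PGWRITES and LOOP_CLR_PGWRITES ioctls play for the loop device.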

A flag lo_track_pgwrite has been added to the structs loop_device and 
loop_info64 to turn on/off tracking of page writes.

Two new ioctls have been added.

The ioctl cmd LOOP_GET_PGWRITES retrieves the page offsets of pages that 
have been written to.
The ioctl cmd LOOP_CLR_PGWRITES empties the red-black tree.

This functionality would allow us to have a read-only version and a 
writable version of memory by doing the following:
Associate a normal file as backing storage for the loop device and mmap 
the loop device. Call this mmapped address space area1.
Mmap a normal file of identical size. Call this mmapped address space 
area2.

Changes made to area1 can be periodically copied to area2 using the ioctl 
cmds (retrieve the dirty page offsets and copy the dirty pages from area1 
to area2). This facility would provide a quick way of updating the 
read-only version.
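
The copy step can be sketched as follows. This is an illustration under
stated assumptions rather than code from the patch: the helper name
copy_dirty_pages is hypothetical, and the offsets are assumed to be page
indices (units of one page) of the kind LOOP_GET_PGWRITES would return.
If they turn out to be byte offsets, the multiplication by page_size
should be dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy every page listed in pgoff[] from area1 (the writable mapping)
 * to area2 (the read-only copy). Both areas are npages pages long.
 * Offsets outside the mapping are skipped.
 * Returns the number of pages actually copied.
 */
static size_t copy_dirty_pages(void *area2, const void *area1,
			       size_t npages, size_t page_size,
			       const unsigned long *pgoff, size_t num)
{
	const unsigned char *src = area1;
	unsigned char *dst = area2;
	size_t copied = 0;
	size_t i;

	assert(area1 != NULL && area2 != NULL && page_size > 0);

	for (i = 0; i < num; i++) {
		size_t byte_off;

		if (pgoff[i] >= npages)
			continue;	/* not inside the mapped range */
		byte_off = (size_t)pgoff[i] * page_size;
		memcpy(dst + byte_off, src + byte_off, page_size);
		copied++;
	}
	return copied;
}
```

After a successful copy the caller would clear the tracked offsets (the
job of LOOP_CLR_PGWRITES) so that the next round only sees new writes.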

Motivation for new ioctls:

Imagine a business server application which processes messages from 
clients as they come in (say over a TCP connection).
Some of those messages may be transactions, i.e. they cause data changes 
in the application.
The rest of the messages may be queries, i.e. they get information from 
the application.
The application can consist of two processes. One process will handle the 
transactions.
The other process will handle the queries. Each process will have its own 
copy of the business data.
The process handling transactions can mmap to the loop device for its copy 
of the memory. The loop device must have a normal file for its backing 
storage.
The process handling queries can mmap to another normal file for its copy 
of the memory.  Both these memories have identical data at the beginning.
Queries and transactions can now be handled simultaneously by the 
respective processes.
The query process can update its memory periodically by obtaining the 
changes that have happened to the loop device.
By using the ioctl call to retrieve the dirty page offsets, only the dirty 
pages need to be copied over to the query process's copy of memory. We can 
in fact have multiple processes handling queries and sharing the same 
memory.
During this copy, the transaction process will hold off processing 
transactions until the update is complete.

This would be very useful for high-speed in-memory transaction systems, 
where the query load can be passed off to other processes. An example of 
such a system would be a stock trading system, where clients buy and sell 
stock (equity, options etc).
At the same time, a lot of clients would be downloading market data, and 
this can be done independently of the transactions.

This new facility will provide a way of tracking changes made to business 
data, independent of the application domain.


Test program:

Before you run the test program, please create the backing storage file 
for the loop device as follows:

dd if=/dev/zero of=/root/file bs=4K count=10

Set bs to the page size of your machine. On my machine it was 4K.
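
If you would rather not hard-code the page size, the backing file can
also be created from C. The sketch below is an alternative to the dd
command, not part of the patch: the helper name create_backing_file is
hypothetical, and it uses ftruncate, which produces a sparse file that
reads back as zeros instead of writing the zeros out as dd does.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) path so that it is exactly `pages` pages long,
 * using the page size reported by the running system.
 * Returns the file size in bytes, or -1 on error.
 */
static off_t create_backing_file(const char *path, long pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t size;
	int fd;

	assert(path != NULL && pages > 0);

	if (page_size <= 0)
		return -1;
	size = (off_t)pages * page_size;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return size;
}
```

For the test program below this would be called as
create_backing_file("/root/file", 10).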


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <assert.h>
#include <signal.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

int main()
{
  	int maxPages = 10;
  	char* start = 0;
  	int fd;
  	int dfd;
  	int *array = 0;
  	int pageSize;
  	int elemsPerPage;
  	struct loop_info64 info;
  	struct loop_pgoff_array pgarray;
  	pgarray.max = maxPages;

  	pgarray.pgoff = calloc(maxPages, sizeof(long));

  	if (pgarray.pgoff == NULL) {
  		fprintf(stderr, "can't create pgarray\n");
  		exit(1);
  	}
  	pageSize = getpagesize();

  	elemsPerPage = pageSize/sizeof(int);

  	/* open the loop device file */
  	if ((fd = open("/dev/loop0", O_RDWR)) < 0) {
  		fprintf(stderr, "can't open loop device /dev/loop0\n");
  		goto out5;
  	}
  	/* open the disk file to set as backing storage */
  	if ((dfd = open("/root/file", O_RDWR)) < 0) {
  		fprintf(stderr, "can't open backing file /root/file\n");
  		goto out4;
  	}
  	if (ioctl(fd, LOOP_SET_FD, dfd) < 0) {
  		perror("ioctl: LOOP_SET_FD");
  		goto out3;
  	}
  	if ((start = mmap(0, maxPages * pageSize, PROT_READ | PROT_WRITE,
 			  MAP_SHARED, fd, 0)) == MAP_FAILED) {
  		perror("mmap error");
  		goto out2;
  	}
  	if (ioctl(fd, LOOP_GET_STATUS64, &info) < 0) {
  		perror("ioctl: LOOP_GET_STATUS64");
  		goto out1;
  	}
  	info.lo_track_pgwrite = 1;

  	if (ioctl(fd, LOOP_SET_STATUS64, &info) < 0) {
  		perror("ioctl: LOOP_SET_STATUS64");
  		goto out1;
  	}
  	if (ioctl(fd, LOOP_CLR_PGWRITES, 0) < 0) {
  		perror("ioctl: LOOP_CLR_PGWRITES");
  		goto out1;
  	}
  	array = (int *)start;

  	array[0] = 5;

  	fprintf(stderr, "value = %d\n", array[0]);

  	array[1] = 9;

  	fprintf(stderr, "value = %d\n", array[1]);

  	array[elemsPerPage] = 14;

  	fprintf(stderr, "value = %d\n", array[elemsPerPage]);

  	array[3*elemsPerPage+60] = 35;

  	fprintf(stderr, "value = %d\n", array[3*elemsPerPage+60]);

  	if (ioctl(fd, LOOP_GET_PGWRITES, &pgarray) < 0) {
  		perror("ioctl: LOOP_GET_PGWRITES");
  		goto out1;
  	}
  	int i;
  	for (i= 0; i < pgarray.num; i++)
  		fprintf(stderr, "offset %ld\n", pgarray.pgoff[i]);

out1:
  	munmap(start, maxPages * pageSize);
out2:
  	ioctl(fd, LOOP_CLR_FD, 0);
out3:
  	close(dfd);
out4:
  	close(fd);
out5:
  	free(pgarray.pgoff);	/* release the offset buffer */
  	return 0;
}




Signed-off-by: Kandan Venkataraman kandan.venkataraman@...group.com



View attachment "patch" of type "TEXT/PLAIN" (10588 bytes)
