linux-kernel - Re: [PCI Driver]Physical address being returned as 0 in mmap

Message-ID: <CABBBjTJkisyWsRBn28Lj5vCne5qbKP0qSVvM0ytXF_Y8O29_3A@mail.gmail.com>
Date:	Sun, 3 Jul 2011 03:02:07 +0200
From:	newton mailinglist <newtonmailinglist@...il.com>
To:	Kyle Moffett <kyle@...fetthome.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [PCI Driver]Physical address being returned as 0 in mmap

OK, yeah, I think I am getting the hang of it now.

Well, let me explain some more about what I am doing. This is going to
be a bit long, as I need to describe my work in order to explain the
issue I am facing. My research involves accelerating software
functions of a C program in hardware (thus the FPGA). Our platform is
unique because it allows the FPGA to have direct access to the main
memory of its host computer.

The way I accelerate a function is I make the user mark it using a
#pragma. Then using a compiler pass I remove this function's code and
replace it by code to send the function parameters to the fpga,
execute it there and read the values back.

Say we have a program like the following:

__attribute__((user("replace")))
int foo(int a) {
  int b;
  ...
  ...
  return b;
}

int main(void) {
  return foo(0);
}

Then, when the program is compiled, the function body is replaced by
the following code:

int foo(int a) {
  int b;
  htex_write(a, 0);
  htex_execute();
  b = htex_read(1);
  return b;
}

int main(void){
  htex_open();
  return foo(0);
}

htex_write() invokes the device driver's write function (see the
attached molen_htx.c if you are interested). This causes the parameter
a to be written to a memory-mapped I/O region, from where it goes to
the FPGA. htex_execute() issues an ioctl call, which sets a control
register in the FPGA to signal it to start executing (again using
memory-mapped I/O). At this point the process is put to sleep, so the
C program is stalled at htex_execute() until it is woken up.

The htex_*() functions are just wrappers for invoking device driver
operations (the device driver file is also attached). Once done, the
FPGA interrupts the OS and the driver handles the interrupt, waking
up the sleeping process. Then htex_read() reads back the value and
puts it in the variable b. The software function can now return. The
reads and writes to the FPGA actually cause data to be written to and
read from BRAM inside the FPGA (we call these exchange registers, but
that is just a mundane hardware detail).

program -> wrapper functions -> device driver -> fpga
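
To make the wrapper layer above concrete, here is a minimal user-space
sketch. It is not the code from molen_htx.c. The function names with
the _fd suffix, the HTEX_IOEXECUTE request number, and the layout (one
32-bit exchange register per offset, addressed with pread()/pwrite())
are all assumptions made for illustration, and the driver would have
to honour the file offset for this to work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ioctl request number; the real value belongs to the driver. */
#define HTEX_IOEXECUTE _IO('h', 1)

/* Write one 32-bit parameter into exchange register `index` (assumed layout). */
int htex_write_fd(int fd, uint32_t value, unsigned index)
{
    off_t off = (off_t)index * (off_t)sizeof value;
    ssize_t n = pwrite(fd, &value, sizeof value, off);
    return n == (ssize_t)sizeof value ? 0 : -1;
}

/* Read one 32-bit value back from exchange register `index`. */
int htex_read_fd(int fd, unsigned index, uint32_t *out)
{
    assert(out != NULL);
    off_t off = (off_t)index * (off_t)sizeof *out;
    ssize_t n = pread(fd, out, sizeof *out, off);
    return n == (ssize_t)sizeof *out ? 0 : -1;
}

/* Start the accelerator; in the scheme described above the driver would
 * put the caller to sleep here until the completion interrupt arrives. */
int htex_execute_fd(int fd)
{
    return ioctl(fd, HTEX_IOEXECUTE);
}
```

With wrappers of this shape, the replaced foo() would call
htex_write_fd(fd, a, 0), then htex_execute_fd(fd), then
htex_read_fd(fd, 1, &b).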

So as you can see, I have quite a bit of working code set up to read
and write to the FPGA from software (compiler passes, device driver
code, etc.).
Recently I have been trying to send some configuration data to the FPGA.

This involves calling the htex_set() function defined in molen_htx.c.
The function opens the FPGA configuration file (the bitstream) and
memory-maps it. It then sends the address of the memory-mapped region
to the FPGA via an ioctl call (ioctl calls are handled in htex_ioctl()
in the driver).

Once the FPGA receives this address, it sends an interrupt to the OS
asking it to translate the address to a physical address, which the
FPGA can then send to the DMA unit. This virtual-to-physical address
translation is done in the driver. The address translation interrupt
is different from an execution completion interrupt, and the interrupt
handler has code to distinguish these two types of interrupt properly.
The actual address translation happens in translate_address(), defined
in the device driver file. It involves pinning the requested user page
in memory using get_user_pages() and then getting the physical address
using pci_map_page(). This works because the calling process is asleep
when this happens.
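
Because the translation described above works one page at a time, it
can help to see how a user buffer breaks down into pages. The sketch
below is plain user-space arithmetic, not driver code; the helper
names and the 4096-byte page size used in the usage note are
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len).
 * page_size must be a power of two. */
size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (len == 0)
        return 0;
    uintptr_t first = addr / page_size;
    uintptr_t last = (addr + len - 1) / page_size;
    return (size_t)(last - first) + 1;
}

/* Start address of the page containing addr. */
uintptr_t page_base(uintptr_t addr, size_t page_size)
{
    return addr & ~((uintptr_t)page_size - 1);
}

/* Offset of addr within its page; a translated address is the
 * translated page base plus this offset. */
size_t page_offset(uintptr_t addr, size_t page_size)
{
    return (size_t)(addr & ((uintptr_t)page_size - 1));
}
```

For example, with 4096-byte pages a two-byte access starting at offset
4095 touches two pages, so it would need two separate translations.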

Note that even while executing a C function which accesses a large
amount of data from memory, the FPGA only gets the virtual address of
the array (in the exchange registers), and it requests a translation
to the physical address, which is sent to the DMA unit. So I decided
to use the same process for sending the configuration data as well.

The entire process works at the moment, but the only issue is that the
C program is stalled while configuration data is being sent, and I
don't want that.

So to improve the system, what I did was not put the process to sleep
but let it continue. Thus I want to continue receiving address
translation interrupts from the FPGA and send physical addresses
to it while the C program runs.

But the address translation immediately failed when I tried that. I
thought it was due to trying to access the pages of a running process
and pin them using get_user_pages(), as the driver fails at this exact
call (you can see the translate_address() function in the driver file
to see what I mean).

All I need to do is send about 1.1 MB of data to the device, and I
don't really need the user process to synchronize, because I want to
copy all the data to a buffer from where the device will read it. It's
not streaming data. It's a one-time-only send.

OK, so to finally come to the point: I decided to implement the mmap()
call in the driver and copy the user data to kernel space. Then I
would let the process run. Doing this allows me to translate addresses
which are in kernel space to physical addresses without having to pin
user pages in memory, which will cause problems if the calling process
is running at the time.

This is how I modified htex_set() to copy all the data into kernel
space (I copy the data using memcpy(), so you can see why I was
talking about memcpy() before):

void molen_set(const char* filename)
{
	int fd = open(filename, O_RDWR);
	int length = 0;
	char *arg, *bs_map;
	struct stat buf;

	if (fd < 0)
	{
		perror("molen_set : Filename Open");
		return;
	}
	fstat(fd, &buf);
	length = buf.st_size;
	if(length == 0)
	{
		fprintf(stderr, "\nmolen_set : Empty input file\n");
		close(fd);	//Do not leak the descriptor on the error path
		return;
	}
	
	//Map bitstream file to memory to read it in via page faults (faster than explicit file I/O)
	bs_map = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_PRIVATE, fd, 0);
	printf("molen_set : Bitstream mapped to = %p\n", bs_map);
	printf("molen_set : Bitstream size = %d\n", length);
	
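	//Get user space access to empty kernel memory of length bytes.
	//MAP_SHARED is needed here: stores through a MAP_PRIVATE mapping are
	//copy-on-write and would never reach the driver's buffer.
	arg = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_SHARED, htex_handle, 0);
	printf("molen_set : Kernel buffer user space address = %p\n", arg);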
	
	//Copy bs data to kernel space buffer : this is when the page faults actually occur ?
	memset(arg, 0, length);
	memcpy(arg, bs_map, length);


	printf("molen_set : ioctl called\n");
	ioctl(htex_handle, HTEX_IOSET, arg);
	printf("molen_set : ioctl returned\n");

	munmap(arg, length);
	munmap(bs_map, length);
}
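
As a point of comparison, here is a hedged sketch of the same copy
step with the error checks filled in. The function name
copy_file_to_shared_mapping() is made up for illustration, and the
destination is any descriptor whose mmap() handler backs the mapping
with real memory (the driver handle in the setup above, a regular file
when trying it out). The destination is mapped MAP_SHARED, since
stores through a MAP_PRIVATE mapping are copy-on-write and stay
private to the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the whole of src_fd into a shared mapping of dst_fd.
 * Returns the number of bytes copied, or -1 on any failure. */
long copy_file_to_shared_mapping(int src_fd, int dst_fd)
{
    struct stat st;

    assert(src_fd >= 0 && dst_fd >= 0);
    if (fstat(src_fd, &st) != 0 || st.st_size <= 0)
        return -1;
    size_t len = (size_t)st.st_size;

    /* Read-only private mapping of the source (the bitstream file). */
    void *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, src_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    /* Shared mapping of the destination so the stores are visible
     * to whoever owns the backing memory. */
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dst_fd, 0);
    if (dst == MAP_FAILED) {
        munmap(src, len);
        return -1;
    }

    memcpy(dst, src, len);   /* page faults on both mappings happen here */

    munmap(dst, len);
    munmap(src, len);
    return (long)len;
}
```

Whether a given driver accepts a mapping of this size and offset
depends entirely on its mmap() handler, so treat the call on a device
handle as an assumption to verify.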

The mmap() call to the device driver lands in the function htex_mmap()
defined in htex_driver.c, which is where I want to do the DMA setup,
as I am already using htex_read() and htex_write() for function
parameters, as explained before.

So from our discussion so far, I think what might work is to read in
the user file, copy it to a DMA buffer, and then send the address of
this buffer to the FPGA. To copy the data I was using memcpy() inside
the modified htex_set() (see above). I can perhaps continue to copy
the user data to the DMA buffer using memcpy().

Hopefully, once the data is copied, I can let the C program proceed,
and the DMA API will give me a physical address which I can send to
the FPGA so it can read the buffer from memory.
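
For comparison, the quoted reply below suggests using write() rather
than mmap() plus memcpy() for the bulk transfer. On the user side that
route could look roughly like the following sketch. The helper name
write_all() is an assumption, and it only shows the user-space loop;
what the driver's write handler does with the data is not covered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, try again */
            return -1;
        }
        if (n == 0)
            return -1;           /* no progress, give up */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would read or map the bitstream file and pass the buffer and
its length to write_all() with the device handle as fd.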

Thanks,
Abhi


On Sat, Jul 2, 2011 at 6:24 PM, Kyle Moffett <kyle@...fetthome.net> wrote:
> Please reply inline, top-posting is unwelcome on LKML.
>
> On Sat, Jul 2, 2011 at 11:27, newton mailinglist
> <newtonmailinglist@...il.com> wrote:
>> ok thanks Kyle, I will try the DMA API now.
>>
>>  I was wondering, when my driver gets the mmap() call then has the
>>  kernel already allocated memory and put its details in the vma
>>  parameter or do I need to allocate memory myself ?
>
> Hmm, I'm not exactly clear what you are asking here... I would need
> to see your driver code to be able to give you better advice, but let
> me give it a shot anyways:
>
> The "vma" is a vm_area_struct which describes the user side of the
> memory-map, and in your "mmap()" call you essentially populate that
> "vma" with information about the physical CPU memory addresses.
> Typically that means memory allocated with dma_alloc_coherent(),
> and I think the DMA API docs have some example code for this.
>
>
>> I am already using the driver read and write functions to read/write
>> some other parameters to the FPGA, so I will setup the DMA in mmap()
>
> The reason I suggest read()/write() is because you are talking about
> using mmap() and then doing memcpy(), whereas if you implement
> your read()/write() handlers properly then you might not need to do
> a memcpy() at all.
>
> If you're just using read()/write() for parameter data, I would switch
> those to an ioctl() call and use read()/write() for your bulk data transfer.
> Even if you elect to use mmap() the way you have described, you
> will still need to implement some ioctl() calls to allow your userspace
> program to trigger the appropriate dma_sync_*() API calls on the
> memory when communicating with the hardware.
>
> Ideally, for streaming large amounts of data out of a device, you
> would call read() from the device onto a large multiple-page buffer
> in your userspace program.  The userspace addresses get passed
> down into your kernel driver, and you use the appropriate APIs to
> get the page structs for those addresses and directly DMA-map
> those pages.
>
> That means that in the best case your device will read directly into
> user memory.  If the user memory is not accessible to your device
> then the kernel will use the "swiotlb" driver to bounce-buffer it for
> you.
>
> Cheers,
> Kyle Moffett
>

View attachment "molen_htx.c" of type "text/x-csrc" (2598 bytes)

View attachment "htex_driver.c" of type "text/x-csrc" (25422 bytes)