Message-ID: <50bd6e83-17fb-4e8c-f8b1-e28c98f4f758@kalray.eu>
Date:	Tue, 26 Apr 2016 15:31:17 +0200
From:	Nicolas Morey-Chaisemartin <nmorey@...ray.eu>
To:	linux-kernel@...r.kernel.org
Subject: Re: [Question] Missing data after DMA read transfer

Ping. I could really use some help/feedback on this.

Thanks in advance


Nicolas


On 04/25/2016 at 08:18 AM, Nicolas Morey-Chaisemartin wrote:
> On 04/20/2016 at 04:56 PM, Nicolas Morey-Chaisemartin wrote:
>> Hi everyone,
>>
>> Short version:
>> I'm having an issue with direct DMA transfer from a device to host memory.
>> It seems some of the data is not transferring to the appropriate page.
>>
>> Some more details:
>> I'm debugging a home-made PCI driver for our board (Kalray), attached to an x86_64 host running CentOS 7 (3.10.0-327.el7.x86_64).
>>
>> In the current case, a userland application transfers data back and forth through read/write operations on a file.
>> On the kernel side, this triggers DMA transfers over PCI to/from our board memory.
>>
>> We followed what pretty much all docs said about direct I/O to user buffers:
>>
>> 1) get_user_pages() (in the current case, it's at most 16 pages at once)
>> 2) convert to a scatterlist
>> 3) pci_map_sg
>> 4) possibly coalesce sg entries (Intel IOMMU is enabled, so it's usually possible)
>> 5) A lot of DMA engine handling code, using the dmaengine layer and virt-dma
>> 6) wait for transfer complete; in the meantime, go back to (1) to schedule more work, if any
>> 7) pci_unmap_sg
>> 8) for read (card2host) transfer, set_page_dirty_lock
>> 9) page_cache_release
>>
>> In 99.9999% of cases it works perfectly.
>> However, I have one userland application where a few pages are not written by a read (card2host) transfer.
>> The buffer is memset to a different value beforehand, so I can check that nothing has overwritten those pages.
>>
>> I know (from a PCI protocol analyser) that the data left our board for the "right" address (the one set in the sg by pci_map_sg).
>> I tried reading the data between the pci_unmap_sg and the set_page_dirty_lock, using
>>         uint32_t *addr = page_address(trans->pages[0]);
>>         dev_warn(&pdata->pdev->dev, "val = %x\n", *addr);
>> and it has the expected value.
>> But if I try to copy_from_user (using the address coming from userland, the one passed to get_user_pages), the data has not been written and I see the memset value.
>>
>> I managed to build a test case that fails every time, but never at the same offset within the buffer.
>> The failure is always in the middle of the buffer (never at the start or the end) and spans a few pages (the count varies between runs).
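
The userland check described above (fill the buffer with a known value, run the read, then look for pages that still hold that value) could be sketched roughly as follows. This is an illustration only, not the code used in the original test: the fill value `SENTINEL`, the 4096-byte `PAGE_SZ` and both function names are assumptions, and the buffer is assumed to start on a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ  4096u  /* assumed page size on the x86_64 host */
#define SENTINEL 0xA5   /* hypothetical fill value, memset before the read */

/* Return 1 if every byte of the page still holds the sentinel value. */
static int page_untouched(const unsigned char *page)
{
    size_t i;

    for (i = 0; i < PAGE_SZ; i++)
        if (page[i] != SENTINEL)
            return 0;
    return 1;
}

/*
 * Scan npages pages starting at buf, print each run of pages that were
 * never written, and return the total number of untouched pages.
 */
static size_t report_untouched_pages(const unsigned char *buf, size_t npages)
{
    size_t i, count = 0, run_start = 0;
    int in_run = 0;

    for (i = 0; i < npages; i++) {
        if (page_untouched(buf + i * PAGE_SZ)) {
            count++;
            if (!in_run) {
                run_start = i;
                in_run = 1;
            }
        } else if (in_run) {
            printf("pages %zu..%zu not written\n", run_start, i - 1);
            in_run = 0;
        }
    }
    if (in_run)
        printf("pages %zu..%zu not written\n", run_start, npages - 1);
    return count;
}
```

A caller would memset the buffer to `SENTINEL`, issue the read() on the device file, then call `report_untouched_pages()`. A page whose real payload happens to consist entirely of the sentinel byte would be reported as a false positive, so the fill value should be one the device is unlikely to send.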
>>
>>
>> Am I missing something? Could it be possible that I'm not writing to the right page?
>> If you need more information, feel free to ask.
>>
>>
>> Thanks in advance
>>
>> Nicolas
>>
> As suggested, I tried to run the app without IOMMU and with DMA_API_DEBUG enabled.
>
> intel_iommu=off changed nothing and the app still fails
> DMA_API_DEBUG showed no warning or error.
>
> I'm open to other tests that could add useful information for debugging this.
>
>
> I also tried something. I'm not sure what exactly I am looking at but it looks suspicious to me:
>
> When running with intel_iommu=on, I retrieved the page pointer corresponding to the user virtual address by looking at the MM/VMA structs
> and compared it to the one I got earlier from get_user_pages.
> It appears that these pointers regularly do not match. And for the pages which are "not transferred", they never do.
>
> If this is to be expected, why are the pages different? The buffer was memset before the call to the PCI driver, so all the physical pages should already be resolved (no COW or things like this), and I thought the point of get_user_pages was to pin pages so they cannot be moved/swapped until they are put back?
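
One way to cross-check the same observation from userland, without touching the driver, is to read the physical frame number (PFN) behind a buffer address from /proc/self/pagemap before and after the transfer; a PFN that changes would suggest the page was replaced underneath the mapping. The sketch below is an assumption-laden illustration, not something from the original test: the helper names are hypothetical, and newer kernels hide the PFN from unprivileged processes, so it may need to run as root.

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)        /* page is present in RAM */
#define PM_SWAPPED  (1ULL << 62)        /* page is in swap */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54 hold the PFN */

/* Return the PFN stored in a pagemap entry, or 0 if the page is not present. */
static uint64_t pagemap_entry_pfn(uint64_t entry)
{
    if (!(entry & PM_PRESENT))
        return 0;
    return entry & PM_PFN_MASK;
}

/*
 * Read the 64-bit pagemap entry covering vaddr in the calling process.
 * Returns 0 on success, -1 on any failure.
 */
static int read_pagemap_entry(const void *vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size)
             * (off_t)sizeof(*entry);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, entry, sizeof(*entry), offset);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Calling `read_pagemap_entry()` on each page of the buffer after the memset, and again after the read() returns, then comparing the results of `pagemap_entry_pfn()`, would show which pages changed physical frame in between.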
>
>
> Nicolas
