lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b283e3ba-6e75-bf86-a29f-100200df9f77@wwwdotorg.org>
Date:   Wed, 2 Jan 2019 09:29:30 -0700
From:   Stephen Warren <swarren@...dotorg.org>
To:     Tariq Toukan <tariqt@...lanox.com>
Cc:     Jason Gunthorpe <jgg@...pe.ca>, xavier.huwei@...wei.com,
        netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
        Doug Ledford <dledford@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Stephen Warren <swarren@...dia.com>
Subject: Re: [PATCH V3] net/mlx4: Get rid of page operation after
 dma_alloc_coherent

On 12/20/18 10:43 AM, Jason Gunthorpe wrote:
> On Wed, Dec 19, 2018 at 11:20:31AM -0700, Stephen Warren wrote:
>> From: Stephen Warren <swarren@...dia.com>
>>
>> This patch solves a crash at the time of mlx4 driver unload or system
>> shutdown. The crash occurs because dma_alloc_coherent() returns one
>> value in mlx4_alloc_icm_coherent(), but a different value is passed to
>> dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because
>> when allocated, that pointer is passed to sg_set_buf() to record it,
>> then when freed it is re-calculated by calling
>> lowmem_page_address(sg_page()) which returns a different value. Solve
>> this by recording the value that dma_alloc_coherent() returns, and
>> passing this to dma_free_coherent().
>>
>> This patch is roughly equivalent to commit 378efe798ecf ("RDMA/hns: Get
>> rid of page operation after dma_alloc_coherent"). That patch was
>> described as:
>>
>>> In general, dma_alloc_coherent() returns a CPU virtual address and
>>> a DMA address, and we have no guarantee that the underlying memory
>>> even has an associated struct page at all.
>>>
>>> This patch gets rid of the page operation after dma_alloc_coherent,
>>> and records the VA returned form dma_alloc_coherent in the struct
>>> of hem in hns RoCE driver.
>>
>> However, this patch reworks the code to store all information about
>> coherent chunks separately from the sg list, since using sg lists for
>> them doesn't make sense. Hence, the structure of this patch is quite
>> different compared to the hns patch.
>>
>> Based-on-code-from: Christoph Hellwig <hch@....de>
>> Signed-off-by: Stephen Warren <swarren@...dia.com>
>> ---
>> v3:
>> - Rework chunk data structure to store all data for coherent allocations
>>    separately from the sg list. Code from Christoph Hellwig with fixes by
>>    me. Notes:
>>    - chunk->coherent is an int not a bool since checkpatch complains about
>>      using bool in structs; see https://lkml.org/lkml/2017/11/21/384.
> 
> :( bool is much more readable and there is no performance concern in
> this struct. I think checkpatch is overzealous here.
> 
>>    - chunk->coherent is used rather than chunk->table->coherent since the
>>      table pointer isn't available when creating chunks. This duplicates
>>      data, but simplifies the patch.
>> v2:
>> - Rework mlx4_table_find() to explicitly calculate the returned address
>>    differently depending on wheter the table was allocated using
>>    dma_alloc_coherent() or alloc_pages(), which in turn allows the
>>    changes to mlx4_alloc_icm_pages() to be dropped.
>> - Drop changes to mlx4_alloc/free_icm_pages. This path uses
>>    pci_map_sg() which can re-write the sg list which in turn would cause
>>    chunk->mem[] (the sg list) and chunk->buf[] to become inconsistent.
>> - Enhance commit description.
>>
>> Note: I've tested this patch in a downstream 4.14 based kernel (using
>> ibping, ib_read_bw, and ib_write_bw), but can't test it in mainline
>> since my system isn't supported there yet. I have compile-tested it in
>> mainline at least, for ARM64.
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 92 ++++++++++++++----------
>>   drivers/net/ethernet/mellanox/mlx4/icm.h | 22 +++++-
>>   2 files changed, 75 insertions(+), 39 deletions(-)
> 
> I think this needs an ack from the driver maintainers, and I gather
> they are all on break for the next two weeks or so.
> 
> Lets revisit in January?
> 
> But at first glance it looks OK to me, though I would tidy the commit
> message somewhat, your new leading paragraph fully explains the
> problem, no need for the hns quote - and we can now see that the hns
> should't have used a sgl either...

Tariq, I assume you're back from vacation now/soon. What do you think of 
this patch, aside from the bool-vs-int issue that I plan to fix up soon. 
Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ