lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 27 Sep 2023 16:25:28 -0400
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: Jason Gunthorpe <jgg@...pe.ca>, Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Wenjia Zhang <wenjia@...ux.ibm.com>,
        Robin Murphy <robin.murphy@....com>, Gerd Bayer <gbayer@...ux.ibm.com>,
        Julian Ruess <julianr@...ux.ibm.com>,
        Pierre Morel <pmorel@...ux.ibm.com>,
        Alexandra Winter
 <wintera@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev
 <agordeev@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>,
        Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
        Hector Martin <marcan@...can.st>, Sven Peter <sven@...npeter.dev>,
        Alyssa Rosenzweig <alyssa@...enzweig.io>,
        David Woodhouse <dwmw2@...radead.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>, Andy Gross <agross@...nel.org>,
        Bjorn Andersson <andersson@...nel.org>,
        Konrad Dybcio <konrad.dybcio@...aro.org>,
        Yong Wu <yong.wu@...iatek.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Orson Zhai <orsonzhai@...il.com>,
        Baolin Wang
 <baolin.wang@...ux.alibaba.com>,
        Chunyan Zhang <zhang.lyra@...il.com>, Chen-Yu Tsai <wens@...e.org>,
        Jernej Skrabec <jernej.skrabec@...il.com>,
        Samuel Holland <samuel@...lland.org>,
        Thierry Reding <thierry.reding@...il.com>,
        Krishna Reddy
 <vdumpa@...dia.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Jonathan Corbet <corbet@....net>, linux-s390@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        iommu@...ts.linux.dev, asahi@...ts.linux.dev,
        linux-arm-kernel@...ts.infradead.org, linux-arm-msm@...r.kernel.org,
        linux-mediatek@...ts.infradead.org, linux-sunxi@...ts.linux.dev,
        linux-tegra@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH v12 0/6] iommu/dma: s390 DMA API conversion and optimized
 IOTLB flushing

On 9/27/23 11:40 AM, Jason Gunthorpe wrote:
> On Wed, Sep 27, 2023 at 05:24:20PM +0200, Niklas Schnelle wrote:
> 
>> Ok, another update. On trying it out again this problem actually also
>> occurs when applying this v12 on top of v6.6-rc3 too. Also I guess
>> unlike my prior thinking it probably doesn't occur with
>> iommu.forcedac=1 since that still allows IOVAs below 4 GiB and we might
>> be the only ones who don't support those. From my point of view this
>> sounds like a mlx5_core issue they really should call
>> dma_set_mask_and_coherent() before their first call to
>> dma_alloc_coherent() not after. So I guess I'll send a v13 of this
>> series rebased on iommu/core and with an additional mlx5 patch and then
>> let's hope we can get that merged in a way that doesn't leave us with
>> broken ConnectX VFs for too long.
> 
> Yes, OK. It definitely sounds wrong that mlx5 is doing dma allocations before
> setting it's dma_set_mask_and_coherent(). Please link to this thread
> and we can get Leon or Saeed to ack it for Joerg.
> 

Hi Niklas,

I bisected the start of this issue to the following commit (only noticeable on s390 when you apply this subject series on top):

06cd555f73caec515a14d42ef052221fa2587ff9 ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")

Which went in during the merge window.  Please include with your fix and/or report to the mlx5 maintainers.  Looks like the changes in this patch match what you and Jason describe; it splits up mlx5_cmd_init() and moves part of the call earlier.  The net result is we first call mlx5_mdev_init>mlx5_cmd_init->alloc_cmd_page->dma_alloc_coherent and then sometime later call mlx5_pci_init->set_dma_caps->dma_set_mask_and_coherent. 

Prior to this patch, we would not drive mlx5_cmd_init (and thus that first dma_alloc_coherent) until mlx5_init_one which happens _after_ mlx5_pci_init->set_dma_caps->dma_set_mask_and_coherent.

Thanks,
Matt


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ