lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210414144452.GC1370958@nvidia.com>
Date:   Wed, 14 Apr 2021 11:44:52 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Tom Talpey <tom@...pey.com>
Cc:     Haakon Bugge <haakon.bugge@...cle.com>,
        David Laight <David.Laight@...lab.com>,
        Chuck Lever III <chuck.lever@...cle.com>,
        Christoph Hellwig <hch@....de>,
        Leon Romanovsky <leon@...nel.org>,
        Doug Ledford <dledford@...hat.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Adit Ranadive <aditr@...are.com>,
        Anna Schumaker <anna.schumaker@...app.com>,
        Ariel Elior <aelior@...vell.com>,
        Avihai Horon <avihaih@...dia.com>,
        Bart Van Assche <bvanassche@....org>,
        Bernard Metzler <bmt@...ich.ibm.com>,
        "David S. Miller" <davem@...emloft.net>,
        Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
        Devesh Sharma <devesh.sharma@...adcom.com>,
        Faisal Latif <faisal.latif@...el.com>,
        Jack Wang <jinpu.wang@...os.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Bruce Fields <bfields@...ldses.org>, Jens Axboe <axboe@...com>,
        Karsten Graul <kgraul@...ux.ibm.com>,
        Keith Busch <kbusch@...nel.org>, Lijun Ou <oulijun@...wei.com>,
        CIFS <linux-cifs@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        OFED mailing list <linux-rdma@...r.kernel.org>,
        "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
        Max Gurtovoy <maxg@...lanox.com>,
        Max Gurtovoy <mgurtovoy@...dia.com>,
        "Md. Haris Iqbal" <haris.iqbal@...os.com>,
        Michael Guralnik <michaelgur@...dia.com>,
        Michal Kalderon <mkalderon@...vell.com>,
        Mike Marciniszyn <mike.marciniszyn@...nelisnetworks.com>,
        Naresh Kumar PBS <nareshkumar.pbs@...adcom.com>,
        Linux-Net <netdev@...r.kernel.org>,
        Potnuri Bharat Teja <bharat@...lsio.com>,
        "rds-devel@....oracle.com" <rds-devel@....oracle.com>,
        Sagi Grimberg <sagi@...mberg.me>,
        "samba-technical@...ts.samba.org" <samba-technical@...ts.samba.org>,
        Santosh Shilimkar <santosh.shilimkar@...cle.com>,
        Selvin Xavier <selvin.xavier@...adcom.com>,
        Shiraz Saleem <shiraz.saleem@...el.com>,
        Somnath Kotur <somnath.kotur@...adcom.com>,
        Sriharsha Basavapatna <sriharsha.basavapatna@...adcom.com>,
        Steve French <sfrench@...ba.org>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        VMware PV-Drivers <pv-drivers@...are.com>,
        Weihang Li <liweihang@...wei.com>,
        Yishai Hadas <yishaih@...dia.com>,
        Zhu Yanjun <zyjzyj2000@...il.com>
Subject: Re: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

On Wed, Apr 14, 2021 at 10:16:28AM -0400, Tom Talpey wrote:
> On 4/12/2021 6:48 PM, Jason Gunthorpe wrote:
> > On Mon, Apr 12, 2021 at 04:20:47PM -0400, Tom Talpey wrote:
> > 
> > > So the issue is only in testing all the providers and platforms,
> > > to be sure this new behavior isn't tickling anything that went
> > > unnoticed all along, because no RDMA provider ever issued RO.
> > 
> > The mlx5 ethernet driver has run in RO mode for a long time, and it
> > operates in basically the same way as RDMA. The issues with Haswell
> > have been worked out there already.
> > 
> > The only open question is if the ULPs have errors in their
> > implementation, which I don't think we can find out until we apply
> > this series and people start running their tests aggressively.
> 
> I agree that the core RO support should go in. But turning it on
> by default for a ULP should be the decision of each ULP maintainer.
> It's a huge risk to shift all the storage drivers overnight. How
> do you propose to ensure the aggressive testing happens?

Realistically we do test most of the RDMA storage ULPs at NVIDIA over
mlx5 which is the only HW that will enable this for now.

I disagree it is a "huge risk".

Additional wider testing is welcomed and can happen over the 16 week
release cycle for a kernel. I would aim to get the relaxed ordering
changed merged to linux-next a week or so after the merge window.

Further testing happens before these changes would get picked up in a
distro on something like MLNX_OFED.

I don't think we need to make the patch design worse or over think the
submission process for something that, so far, hasn't discovered any
issues and alread has a proven track record in other ULPs.

Any storage ULP that has a problem here is mis-using verbs and the DMA
API and thus has an existing data-corruption bug that they are simply
lucky to have not yet discovered.

> One thing that worries me is the patch02 on-by-default for the dma_lkey.
> There's no way for a ULP to prevent IB_ACCESS_RELAXED_ORDERING
> from being set in __ib_alloc_pd().

The ULPs are being forced into relaxed_ordering. They don't get to
turn it off one by one. The v2 will be more explicit about this as
there will be no ULP patches, just the verbs core code being updated.

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ