lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230515104847.6dfdf31b@meshulam.tesarici.cz>
Date:   Mon, 15 May 2023 10:48:47 +0200
From:   Petr Tesařík <petr@...arici.cz>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Petr Tesarik <petrtesarik@...weicloud.com>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Christoph Hellwig <hch@....de>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Borislav Petkov <bp@...e.de>,
        Randy Dunlap <rdunlap@...radead.org>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Kim Phillips <kim.phillips@....com>,
        "Steven Rostedt (Google)" <rostedt@...dmis.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Hans de Goede <hdegoede@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
        "open list:DMA MAPPING HELPERS" <iommu@...ts.linux.dev>,
        Roberto Sassu <roberto.sassu@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are
 dynamically allocated buffers

Hi Catalin,

On Sun, 14 May 2023 19:54:27 +0100
Catalin Marinas <catalin.marinas@....com> wrote:

> On Tue, May 09, 2023 at 11:18:19AM +0200, Petr Tesarik wrote:
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index d1d2b8557b30..e340e0f06dce 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -516,6 +516,9 @@ struct device_physical_location {
> >   * @dma_io_tlb_dyn_slots:
> >   *		Dynamically allocated bounce buffers for this device.
> >   *		Not for driver use.
> > + * @dma_io_tlb_have_dyn:
> > + *		Does this device have any dynamically allocated bounce
> > + *		buffers? Not for driver use.
> >   * @archdata:	For arch-specific additions.
> >   * @of_node:	Associated device tree node.
> >   * @fwnode:	Associated device node supplied by platform firmware.
> > @@ -623,6 +626,7 @@ struct device {
> >  	struct io_tlb_mem *dma_io_tlb_mem;
> >  	spinlock_t dma_io_tlb_dyn_lock;
> >  	struct list_head dma_io_tlb_dyn_slots;
> > +	bool dma_io_tlb_have_dyn;
> >  #endif
> >  	/* arch specific additions */
> >  	struct dev_archdata	archdata;
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index daa2064f2ede..8cbb0bebb0bc 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -152,7 +152,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> >  
> >  	return mem &&
> >  		(is_swiotlb_fixed(mem, paddr) ||
> > -		 (mem->allow_dyn && is_swiotlb_dyn(dev, paddr)));
> > +		 /* Pairs with smp_store_release() in swiotlb_dyn_map()
> > +		  * and swiotlb_dyn_unmap().
> > +		  */
> > +		 (smp_load_acquire(&dev->dma_io_tlb_have_dyn) &&
> > +		  is_swiotlb_dyn(dev, paddr)));
> >  }
> >  
> >  static inline bool is_swiotlb_force_bounce(struct device *dev)
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 81eab1c72c50..e8be3ee50f18 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -642,6 +642,9 @@ static phys_addr_t swiotlb_dyn_map(struct device *dev, phys_addr_t orig_addr,
> >  
> >  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
> >  	list_add(&slot->node, &dev->dma_io_tlb_dyn_slots);
> > +	if (!dev->dma_io_tlb_have_dyn)
> > +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> > +		smp_store_release(&dev->dma_io_tlb_have_dyn, true);
> >  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);  
> 
> I'm not sure this works. What this seems to do is that if the caller of
> is_swiotlb_buffer() sees the flag set, it's guaranteed that something
> was added to the dma_io_tlb_dyn_slots list. But the reverse is not
> necessarily true. IOW, if something was added to the list, there is a
> brief window where the dma_io_tlb_have_dyn flag is still false. In the
> general case, I doubt any ordering between list_add() and the flag
> setting changes anything since neither of them may be visible to another
> CPU.

Thank you for the review! This patch probably needs a bit more
explanation.

The goal is to avoid taking a spin lock in the mkost common case that
the dynamic list is empty. The only required invariant is:

  When the flag is clear, it is safe to skip the list.

It's not a bug to walk an empty list, it's merely less efficient. Such
race window would be acceptable. OTOH that's not your concern if I
understand you correctly.

> What you need is for a 'paddr' added to the dynamic list to be correctly
> identified by another CPU as dynamic swiotlb. That other CPU doesn't
> check random addresses but only those returned by the DMA API.

Yes, that's correct.

> Such
> values would be normally passed through a memory location (e.g. driver
> local structures) and that's what you want to order against.

This would certainly work, but I'm not sure I need it. My only goal is
that when the flag is set, the new value is observed by all CPUs on the
next call to is_swiotlb_buffer().

> What I mean is that a 'dev->blah = paddr' needs to be ordered _after_
> your flag setting. So you need the either the 'blah = paddr' assignment
> to have release semantics or the flag setting to be an
> smp_store_acquire() (but we don't have such thing). You'd have to use an
> explicit smp_wmb() barrier after the flag setting (it can be outside the
> lock). The spin_unlock() is not sufficient since it only has release
> semantics.

Understood. The spinlock is supposed to order changes to the list, not
to the flag.

> I also don't think the ordering between list_add() and flag
> setting matters since the smp_wmb() would ensure that both are visible
> when the 'paddr' value made it to the other CPU.

If the flag makes it before the list, then the other CPU will walk the
list only after acquiring dma_io_tlb_dyn_lock, and that's what the list
is ordered against.

I don't think there is any concern if the list makes it before the
flag, as long as the new value of the flag is observed on the next call
to is_swiotlb_buffer (on any CPU).

> Similarly on the is_swiotlb_buffer() side, you want the flag reading to
> be ordered after the 'blah = paddr' is observed. Here the
> smp_load_acquire() is sufficient.
> 
> >  	return page_to_phys(slot->page);
> > @@ -668,6 +671,9 @@ static void swiotlb_dyn_unmap(struct device *dev, phys_addr_t tlb_addr,
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&dev->dma_io_tlb_dyn_lock, flags);
> > +	if (list_is_singular(&dev->dma_io_tlb_dyn_slots))
> > +		/* Pairs with smp_load_acquire() in is_swiotlb_buffer() */
> > +		smp_store_release(&dev->dma_io_tlb_have_dyn, false);
> >  	slot = lookup_dyn_slot_locked(dev, tlb_addr);
> >  	list_del(&slot->node);
> >  	spin_unlock_irqrestore(&dev->dma_io_tlb_dyn_lock, flags);  
> 
> As with the map case, I don't think the ordering between list_del() and
> the flag setting matters. If you unmap the last dynamic buffer, the
> worst that can happen is that an is_swiotlb_buffer() call attempts a
> read of the list but the flag will eventually become visible. There
> shouldn't be another caller trying to unmap the same paddr (in well
> behaved drivers).
> 
> Now, thinking about the list_head access and the flag ordering, since it
> doesn't matter, you might as well not bother with the flag at all and
> rely on list_add() and list_empty() ordering vs the hypothetical 'blah'
> access. Both of these use READ/WRITE_ONCE() for setting
> dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
> list_add() and an smp_rmb() before a list_empty() check in
> is_swiotlb_buffer(), no dma_iotlb_have_dyn variable.

Wait, let me check that I understand you right. Do you suggest that I
convert dma_io_tlb_dyn_slots to a lockless list and get rid of the
spinlock?

I'm sure it can be done for list_add() and list_del(). I'll have
to think about list_move().

> That's my reasoning but to I'll have be absolutely sure, you can pass that through
> some formal modelling.

Good idea. Especially if I try to get rid of the lock.

Petr T

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ