Message-ID: <7d1ce1b5-edba-b017-3131-37405f1b0c24@caviumnetworks.com>
Date: Wed, 6 Dec 2017 14:51:41 +0530
From: George Cherian <gcherian@...iumnetworks.com>
To: "Michael S. Tsirkin" <mst@...hat.com>, linux-kernel@...r.kernel.org
Cc: George Cherian <george.cherian@...ium.com>,
Jason Wang <jasowang@...hat.com>, davem@...emloft.net,
edumazet@...gle.com, netdev@...r.kernel.org,
virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH] ptr_ring: add barriers
Hi Michael,
On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote:
> Users of ptr_ring expect that it's safe to give the
> data structure a pointer and have it be available
> to consumers, but that actually requires an smp_wmb
> or a stronger barrier.
This is not exactly the situation we are seeing.
Let me try to explain what happens.
The affected platform is ARM64.
1) tun_net_xmit calls skb_array_produce, which pushes the skb onto the
ptr_ring; this push is protected by the producer_lock.
2) Before that call, tun_net_xmit calls skb_orphan, which invokes
skb->destructor and then sets skb->destructor and skb->sk to NULL.
2.a) These two writes are getting reordered past the push in step 1.
3) At the same time, the receive side (tun_ring_recv), which runs on
another core, calls skb_array_consume, which pulls the skb from the
ptr_ring; this pull is protected by the consumer lock.
4) The consumer eventually calls skb->destructor (sock_wfree) with stale values.
Also note that this issue reproduces over a long run and doesn't
happen immediately after launching multiple VMs (in fact, this
particular test case launches 56 VMs which run iperf back and forth).
>
> In absence of such barriers and on architectures that reorder writes,
> consumer might read an uninitialized value from an skb pointer stored
> in the skb array. This was observed causing crashes.
>
> To fix, add memory barriers. The barrier we use is a wmb, the
> assumption being that producers do not need to read the value so we do
> not need to order these reads.
It is not the case that the producer is reading the value; rather, the
consumer is reading a stale value. So we need a strict rmb in place.
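To make the ordering requirement concrete, here is a minimal userspace
sketch of the produce/consume pairing using C11 atomics. It is not the
kernel code: demo_skb, demo_ring, demo_produce, demo_consume and
demo_roundtrip are hypothetical names invented for illustration. The
release store plays the role of smp_wmb() before the queue store, and
the acquire load stands in for the consumer-side barrier (it is stronger
than smp_read_barrier_depends()). It assumes one producer and one
consumer, so the locks are left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: the two fields skb_orphan() clears. */
struct demo_skb {
	void (*destructor)(struct demo_skb *);
	void *sk;
};

#define DEMO_RING_SIZE 4

/* Hypothetical single-producer/single-consumer ring; NULL marks an empty slot. */
struct demo_ring {
	_Atomic(struct demo_skb *) queue[DEMO_RING_SIZE];
	size_t producer;	/* touched only by the producer */
	size_t consumer;	/* touched only by the consumer */
};

static void demo_destructor(struct demo_skb *skb)
{
	(void)skb;
}

/* Returns 0 on success, -1 if the next slot is still occupied. */
static int demo_produce(struct demo_ring *r, struct demo_skb *skb)
{
	if (atomic_load_explicit(&r->queue[r->producer], memory_order_relaxed))
		return -1;

	/* Like skb_orphan(): plain stores that must be visible before the publish. */
	skb->destructor = NULL;
	skb->sk = NULL;

	/* Release store: the two stores above cannot be reordered after it. */
	atomic_store_explicit(&r->queue[r->producer], skb, memory_order_release);
	r->producer = (r->producer + 1) % DEMO_RING_SIZE;
	return 0;
}

/* Returns the oldest queued skb, or NULL if the ring is empty. */
static struct demo_skb *demo_consume(struct demo_ring *r)
{
	/* Acquire load: pairs with the release store in demo_produce(). */
	struct demo_skb *skb = atomic_load_explicit(&r->queue[r->consumer],
						    memory_order_acquire);
	if (!skb)
		return NULL;

	/* Hand the slot back to the producer. */
	atomic_store_explicit(&r->queue[r->consumer], NULL, memory_order_release);
	r->consumer = (r->consumer + 1) % DEMO_RING_SIZE;
	return skb;
}

/* Usage: returns 1 if the consumer observes the cleared fields, 0 otherwise. */
static int demo_roundtrip(void)
{
	static struct demo_ring ring;	/* zero-initialized: all slots empty */
	struct demo_skb skb = { demo_destructor, &ring };
	struct demo_skb *got;

	if (demo_produce(&ring, &skb) != 0)
		return 0;
	got = demo_consume(&ring);
	if (got != &skb)
		return 0;
	return got->destructor == NULL && got->sk == NULL;
}
```

With both orderings relaxed, a weakly ordered CPU would be allowed to
make the pointer visible before the two NULL stores, which is the stale
destructor case described above.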
>
> Reported-by: George Cherian <george.cherian@...ium.com>
> Suggested-by: Jason Wang <jasowang@...hat.com>
> Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> ---
>
> George, could you pls report whether this patch fixes
> the issue for you?
>
> This seems to be needed in stable as well.
>
>
>
>
> include/linux/ptr_ring.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 37b4bb2..6866df4 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
>
> /* Note: callers invoking this in a loop must use a compiler barrier,
> * for example cpu_relax(). Callers must hold producer_lock.
> + * Callers are responsible for making sure pointer that is being queued
> + * points to a valid data.
> */
> static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> {
> if (unlikely(!r->size) || r->queue[r->producer])
> return -ENOSPC;
>
> + /* Make sure the pointer we are storing points to a valid data. */
> + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> + smp_wmb();
> +
> r->queue[r->producer++] = ptr;
> if (unlikely(r->producer >= r->size))
> r->producer = 0;
> @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> if (ptr)
> __ptr_ring_discard_one(r);
>
> + /* Make sure anyone accessing data through the pointer is up to date. */
> + /* Pairs with smp_wmb in __ptr_ring_produce. */
> + smp_read_barrier_depends();
> return ptr;
> }
>
>