Message-ID: <1475157674.4676.52.camel@redhat.com>
Date: Thu, 29 Sep 2016 16:01:14 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
James Morris <jmorris@...ei.org>,
Trond Myklebust <trond.myklebust@...marydata.com>,
Alexander Duyck <aduyck@...antis.com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Tom Herbert <tom@...bertland.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Edward Cree <ecree@...arflare.com>, linux-nfs@...r.kernel.org
Subject: Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers
On Thu, 2016-09-29 at 06:24 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> > On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > >
> > > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > > +{
> > > > +	struct udp_sock *up = udp_sk(sk);
> > > > +	int fwd, amt;
> > > > +
> > > > +	if (partial && !udp_under_memory_pressure(sk))
> > > > +		return;
> > > > +
> > > > +	/* we can have concurrent release; if we catch any conflict
> > > > +	 * we let only one of them do the work
> > > > +	 */
> > > > +	if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > > +		return;
> > > > +
> > > > +	fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > > +	if (fwd < SK_MEM_QUANTUM + partial) {
> > > > +		atomic_inc(&up->can_reclaim);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > > +	atomic_sub(amt, &up->mem_allocated);
> > > > +	atomic_inc(&up->can_reclaim);
> > > > +
> > > > +	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > > +	sk->sk_forward_alloc = fwd - amt;
> > > > +}
> > >
> > >
> > > This is racy... all these atomics make me nervous...
> >
> > Ah, perhaps I got it: if we have concurrent memory scheduling, we
> > could end up with a value of mem_allocated below the real need.
> >
> > That mismatch will not drift: at worst we can end up with mem_allocated
> > being a single SK_MEM_QUANTUM below what is strictly needed.
> >
> > A possible alternative could be:
> >
> > static void udp_rmem_release(struct sock *sk, int partial)
> > {
> > 	struct udp_sock *up = udp_sk(sk);
> > 	int fwd, amt, alloc_old, alloc;
> >
> > 	if (partial && !udp_under_memory_pressure(sk))
> > 		return;
> >
> > 	alloc = atomic_read(&up->mem_allocated);
> > 	fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
> > 	if (fwd < SK_MEM_QUANTUM + partial)
> > 		return;
> >
> > 	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > 	alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
> > 	/* if a concurrent update is detected, just do nothing; if said update
> > 	 * is due to another memory release, that release takes care of
> > 	 * reclaiming the memory for us, too.
> > 	 * Otherwise we will be able to release on a later dequeue, since
> > 	 * we will eventually stop colliding with the writer once it has
> > 	 * consumed all the forward-allocated memory
> > 	 */
> > 	if (alloc_old != alloc)
> > 		return;
> >
> > 	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > 	sk->sk_forward_alloc = fwd - amt;
>
> Can still be done from multiple cpus.
>
> Add some ndelay() or udelay() before it to simulate the fact that the
> current cpu could be interrupted by an NMI handler (perf, for example)...
> or a hard IRQ handler...
>
> Then make sure your tests involve 16 concurrent cpus dealing with one
> udp socket...
Thank you again for reviewing this.
I'm working on this sort of test right now.
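Roughly along these lines on the user-space side (just a sketch: the thread
counts, port and packet size below are arbitrary), combined with a udelay()
injected into udp_rmem_release() to widen the race window, as you suggest:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define SENDERS		12	/* arbitrary, roughly the number of test cpus */
#define RECEIVERS	4
#define PKT_LEN		1200

static struct sockaddr_in dst;
static int rx_fd;

/* flood the receiving socket so that the enqueue path keeps running */
static void *sender(void *arg)
{
	char buf[PKT_LEN] = { 0 };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	(void)arg;
	while (fd >= 0)
		sendto(fd, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
	return NULL;
}

/* drain the same socket, so udp_rmem_release() runs concurrently */
static void *receiver(void *arg)
{
	char buf[PKT_LEN];

	(void)arg;
	for (;;)
		recv(rx_fd, buf, sizeof(buf), 0);
	return NULL;
}

int main(void)
{
	pthread_t th;
	int i;

	dst.sin_family = AF_INET;
	dst.sin_port = htons(9999);		/* arbitrary port */
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	rx_fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx_fd < 0 ||
	    bind(rx_fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		perror("socket/bind");
		return 1;
	}

	for (i = 0; i < SENDERS; i++)
		pthread_create(&th, NULL, sender, NULL);
	for (i = 0; i < RECEIVERS; i++)
		pthread_create(&th, NULL, receiver, NULL);

	pause();
	return 0;
}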
>
> > }
> >
> > which is even lazier in reclaiming, but should never underestimate the
> > needed forward allocation, and under pressure should eventually free the
> > required memory.
>
>
> If this code is rarely used, why don't you simply use a real spinlock,
> so that we do not have to worry about all this ?
>
> A spinlock acquisition/release is a _single_ locked operation.
> Faster than the 3 atomics you have in the last version.
> spinlock code (ticket or MCS) avoids starvation.
If possible, I'd like to avoid adding a lock, so as not to introduce any
new source of contention.
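For comparison, the locked variant I understand you are suggesting would
look more or less like this (just an untested sketch; the 'mem_lock'
spinlock is not in the current patches, it would have to be added to
struct udp_sock):

static void udp_rmem_release(struct sock *sk, int partial)
{
	struct udp_sock *up = udp_sk(sk);
	int fwd, amt;

	if (partial && !udp_under_memory_pressure(sk))
		return;

	/* serialize the forward-allocation bookkeeping: mem_allocated and
	 * sk_forward_alloc are updated consistently under the lock
	 */
	spin_lock_bh(&up->mem_lock);
	fwd = atomic_read(&up->mem_allocated) -
	      atomic_read(&sk->sk_rmem_alloc);
	if (fwd < SK_MEM_QUANTUM + partial) {
		spin_unlock_bh(&up->mem_lock);
		return;
	}

	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
	atomic_sub(amt, &up->mem_allocated);
	sk->sk_forward_alloc = fwd - amt;
	spin_unlock_bh(&up->mem_lock);

	/* the global per-protocol counter is still updated with its own
	 * atomic, outside the socket-level lock
	 */
	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
}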
> Then, you can safely update multiple fields in the socket.
>
> And you get nice lockdep support as a bonus.
>
> cmpxchg() is fine when a single field needs an exclusion. But there you
> have multiple fields to update at once:
>
> sk_memory_allocated_add() and sk_memory_allocated_sub() can work using
> atomic_long_add_return() and atomic_long_sub() because their caller owns
> the socket lock and can safely update sk->sk_forward_alloc without
> additional locking, but UDP won't have this luxury after your patches.
When we reach __sk_mem_reduce_allocated() we are sure we can free the
specified amount of memory, so we only need to ensure consistent
sk_prot->memory_allocated updates. The existing atomic operation suffices
for that.
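If I recall correctly, the relevant part there is just a single atomic
update of the per-protocol counter, roughly (paraphrasing, not the exact
helper body):

	/* global, per-protocol accounting: a lone atomic op, so concurrent
	 * callers cannot corrupt the counter, only reorder their updates
	 */
	atomic_long_sub(amount, sk->sk_prot->memory_allocated);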
Paolo