Message-ID: <49B4B909.7050002@cosmosbay.com>
Date: Mon, 09 Mar 2009 07:36:57 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: kchang@...enacr.com, netdev@...r.kernel.org,
cl@...ux-foundation.org, bmb@...enacr.com
Subject: Re: Multicast packet loss
David Miller wrote:
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Sun, 08 Mar 2009 17:46:13 +0100
>
>> + if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) {
>> + if (in_softirq()) {
>> + if (!softirq_del(&sk->sk_del, sock_readable_defer))
>> + goto unlock;
>> + return;
>> + }
>
> This is interesting.
>
> I think you should make softirq_del() more flexible. Make it the
> socket's job to make sure it doesn't try to defer different
> functions, and put the onus on locking there as well.
>
> The cmpxchg() and all of this checking is just wasted work.
>
> I'd really like to get rid of that callback lock too, then we'd
> really be in business. :-)
First, thanks for your review David.

I chose cmpxchg() because I needed some form of exclusion here.
I first added a spinlock inside "struct softirq_del", then I realized
I could use cmpxchg() instead and keep the structure small. As the
synchronization is only needed at queueing time, we could pass
the address of a spinlock XXX to the softirq_del() call.

Also, when an event was queued for later invocation, I needed to keep
a reference on "struct socket" to make sure it doesn't disappear before
the invocation. Not all sockets are RCU guarded (we added RCU only for
some protocols: TCP, UDP, ...). So I found keeping a read_lock
on the callback was the easiest thing to do. I now realize we might
overflow preempt_count, so special care is needed.
About your first point, maybe we should make softirq_del() (poor name, I confess)
take only one argument (a pointer to struct softirq_del), and initialize
the function pointer at socket init time. That would ensure "struct softirq_del"
is associated with one callback only. The cmpxchg() test would then have to be
done on the "next" field (or use the spinlock XXX).
I am not sure the output path needs such tricks, since threads rarely
block on output: we don't trigger 400,000 wakeups per second there, do we?
Another point: I did a tbench test and got 2517 MB/s with the patch,
instead of 2538 MB/s (using Linus' 2.6 git tree); that's a ~0.8% regression
for this workload.