Date:	Mon, 26 Apr 2010 21:35:52 +0800
From:	Changli Gao <xiaosuo@...il.com>
To:	hadi@...erus.ca
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Rick Jones <rick.jones2@...com>,
	David Miller <davem@...emloft.net>, therbert@...gle.com,
	netdev@...r.kernel.org, robert@...julf.net, andi@...stfloor.org
Subject: Re: rps perfomance WAS(Re: rps: question

On Mon, Apr 26, 2010 at 7:35 PM, jamal <hadi@...erus.ca> wrote:
> On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:
>
>> I read the code again, and found that we don't use spin_lock_irqsave();
>> we use local_irq_save() and spin_lock() instead, so
>> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
>> related to the backlog. The lock may be sk_receive_queue.lock.
>
> Possible.
> I am wondering if there's a way we can precisely nail down where that is
> happening? Is lockstat of any use?
> Fixing _raw_spin_lock_irqsave and friends is the lowest-hanging fruit.
>

Maybe lockstat can help in this case.
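To make the distinction concrete, here is a rough sketch of the two enqueue
paths being discussed. This is simplified from my reading of the code, and
the function and field names may not match the tree exactly:

/* Sketch only: simplified enqueue paths, not the exact kernel code. */
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>

static void backlog_style_enqueue(struct softnet_data *sd, struct sk_buff *skb)
{
	unsigned long flags;

	/* irq state is saved by hand, so only a plain spin_lock() is
	 * taken: this path does not account to _raw_spin_lock_irqsave. */
	local_irq_save(flags);
	spin_lock(&sd->input_pkt_queue.lock);
	__skb_queue_tail(&sd->input_pkt_queue, skb);
	spin_unlock(&sd->input_pkt_queue.lock);
	local_irq_restore(flags);
}

static void socket_style_enqueue(struct sock *sk, struct sk_buff *skb)
{
	/* skb_queue_tail() takes sk_receive_queue.lock with
	 * spin_lock_irqsave() internally, so this path is the likely
	 * source of the _raw_spin_lock_irqsave samples in the profile. */
	skb_queue_tail(&sk->sk_receive_queue, skb);
}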

> So looking at your patch now, I see it is likely there was an improvement
> made for the non-rps case (moving some irq enable/disable calls out of the loop).
> i.e. my results may not be crazy after adding your patch and seeing an
> improvement for the non-rps case.
> However, whatever your patch did, it did not help the rps case:
> call_function_single_interrupt() comes out higher in the profile,
> and the # of IPIs seems to have gone up (although I did not measure this, I
> can see the interrupts/second went up by almost 50-60%).

Did you apply the patch from Eric? It would reduce the number of
local_irq_disable() calls but increase the number of IPIs.
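For reference, my understanding of where those IPIs come from, sketched very
roughly (the names are from memory and may not match the tree): when packets
are steered to a remote cpu, the flush path kicks that cpu with a
single-function IPI so it raises NET_RX_SOFTIRQ locally, and that IPI is what
shows up as call_function_single_interrupt() in your profile:

/* Rough sketch of the rps kick, not the exact kernel code. */
static void sketch_rps_kick(const struct cpumask *mask)
{
	int cpu;

	/* One IPI per remote cpu that had packets queued to its backlog
	 * during this softirq run. */
	for_each_cpu(cpu, mask) {
		struct softnet_data *sd = &per_cpu(softnet_data, cpu);

		/* This lands as call_function_single_interrupt() on the
		 * target cpu. */
		__smp_call_function_single(cpu, &sd->csd, 0);
	}
}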

>
>> Jamal, did you use a single socket to serve all the clients?
>
> Socket per detected cpu.

Please ignore that question; I made a mistake there.

>
>> BTW: completion_queue and output_queue in softnet_data are both LIFO
>> queues. For completion_queue, FIFO would be better, as the last used skb is
>> more likely to be in cache and should be reused first. Since slab always
>> caches the most recently freed memory at the head, we'd better free the
>> skbs in FIFO order. For output_queue, FIFO is good for fairness among qdiscs.
>
> I think it will depend on how many of those skbs are sitting in the
> completion queue, cache warmth etc. LIFO is always safest; you have a
> higher probability of finding a cached skb in front.
>

We call kfree_skb() to release skbs to the slab allocator, which stores them
in LIFO order. If the completion queue is also a LIFO queue, the most recently
queued (cache-hot) skb is at the front of the queue and is released to the
slab allocator first. The next time we call alloc_skb(), the memory of the skb
from the end of the completion queue (the cold one) is returned instead of the
hot one.
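If it helps, here is a toy userspace model of that argument (the queue size,
values and names are made up purely for illustration): with a LIFO completion
queue the next allocation gets the coldest buffer back, while with FIFO it
gets the hottest one:

/* Toy model: a "completion queue" drained into a LIFO slab free list.
 * skb 3 is the most recently queued, cache-hot one. */
#include <stdio.h>

#define NSKB 4

int main(void)
{
	int completion_queue[NSKB] = { 0, 1, 2, 3 };
	int slab_free_list[NSKB];
	int top = 0;
	int i;

	/* Drain the completion queue LIFO-style (hot skb first), pushing
	 * each buffer onto the slab's LIFO free list. */
	for (i = NSKB - 1; i >= 0; i--)
		slab_free_list[top++] = completion_queue[i];
	printf("LIFO drain: next alloc returns skb %d (cold)\n",
	       slab_free_list[top - 1]);

	/* Draining FIFO-style instead leaves the hot skb on top. */
	top = 0;
	for (i = 0; i < NSKB; i++)
		slab_free_list[top++] = completion_queue[i];
	printf("FIFO drain: next alloc returns skb %d (hot)\n",
	       slab_free_list[top - 1]);

	return 0;
}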

However, as Eric said, new drivers don't rely on the completion queue, so it
isn't a real problem, especially in your test case.


-- 
Regards,
Changli Gao(xiaosuo@...il.com)
