Date:	Mon, 26 Apr 2010 21:35:52 +0800
From:	Changli Gao <xiaosuo@...il.com>
To:	hadi@...erus.ca
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Rick Jones <rick.jones2@...com>,
	David Miller <davem@...emloft.net>, therbert@...gle.com,
	netdev@...r.kernel.org, robert@...julf.net, andi@...stfloor.org
Subject: Re: rps perfomance WAS(Re: rps: question

On Mon, Apr 26, 2010 at 7:35 PM, jamal <hadi@...erus.ca> wrote:
> On Sun, 2010-04-25 at 10:31 +0800, Changli Gao wrote:
>
>> I read the code again, and found that we don't use spin_lock_irqsave();
>> we use local_irq_save() and spin_lock() instead, so
>> _raw_spin_lock_irqsave() and _raw_spin_lock_irqrestore() should not be
>> related to the backlog. The lock may be sk_receive_queue.lock.
>
> Possible.
> I am wondering if there's a way we can precisely nail down where that is
> happening? Is lockstat of any use?
> Fixing _raw_spin_lock_irqsave and friends is the lowest-hanging fruit.
>

Maybe lockstat can help in this case.
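To make the distinction concrete, here is a rough sketch of the two enqueue
paths being discussed. This is simplified from my reading of the code, and
the function and field names may not match the tree exactly:

/* Sketch only: simplified enqueue paths, not the exact kernel code. */
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/sock.h>

static void backlog_style_enqueue(struct softnet_data *sd, struct sk_buff *skb)
{
	unsigned long flags;

	/* irq state is saved by hand, so only a plain spin_lock() is
	 * taken: this path does not account to _raw_spin_lock_irqsave. */
	local_irq_save(flags);
	spin_lock(&sd->input_pkt_queue.lock);
	__skb_queue_tail(&sd->input_pkt_queue, skb);
	spin_unlock(&sd->input_pkt_queue.lock);
	local_irq_restore(flags);
}

static void socket_style_enqueue(struct sock *sk, struct sk_buff *skb)
{
	/* skb_queue_tail() takes sk_receive_queue.lock with
	 * spin_lock_irqsave() internally, so this path is the likely
	 * source of the _raw_spin_lock_irqsave samples in the profile. */
	skb_queue_tail(&sk->sk_receive_queue, skb);
}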

> So looking at your patch now, I see it is likely there was an improvement
> made for the non-rps case (moving some irq enable/disable calls out of the loop).
> i.e. my results may not be crazy after adding your patch and seeing an
> improvement for the non-rps case.
> However, whatever your patch did, it did not help the rps case:
> call_function_single_interrupt() comes out higher in the profile,
> and the # of IPIs seems to have gone up (although I did not measure this, I
> can see the interrupts/second went up by almost 50-60%).

Did you apply the patch from Eric? It would reduce the number of
local_irq_disable() calls but increase the number of IPIs.
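For reference, my understanding of where those IPIs come from, sketched very
roughly (the names are from memory and may not match the tree): when packets
are steered to a remote cpu, the flush path kicks that cpu with a
single-function IPI so it raises NET_RX_SOFTIRQ locally, and that IPI is what
shows up as call_function_single_interrupt() in your profile:

/* Rough sketch of the rps kick, not the exact kernel code. */
static void sketch_rps_kick(const struct cpumask *mask)
{
	int cpu;

	/* One IPI per remote cpu that had packets queued to its backlog
	 * during this softirq run. */
	for_each_cpu(cpu, mask) {
		struct softnet_data *sd = &per_cpu(softnet_data, cpu);

		/* This lands as call_function_single_interrupt() on the
		 * target cpu. */
		__smp_call_function_single(cpu, &sd->csd, 0);
	}
}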

>
>> Jamal, did you use a single socket to serve all the clients?
>
> Socket per detected cpu.

Please ignore that question; I made a mistake there.

>
>> BTW: completion_queue and output_queue in softnet_data are both LIFO
>> queues. For completion_queue, FIFO would be better, as the last used skb is
>> more likely to be in cache and should be reused first. Since slab always
>> caches the most recently freed memory at the head, we'd better free the
>> skbs in FIFO order. For output_queue, FIFO is good for fairness among qdiscs.
>
> I think it will depend on how many of those skbs are sitting in the
> completion queue, cache warmth etc. LIFO is always safest; you have a
> higher probability of finding a cached skb in front.
>

We call kfree_skb() to release skbs to the slab allocator, which stores them
in LIFO order. If the completion queue is also a LIFO queue, the most recently
queued (cache-hot) skb is at the front of the queue and is released to the
slab allocator first. The next time we call alloc_skb(), the memory of the skb
from the end of the completion queue (the cold one) is returned instead of the
hot one.
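If it helps, here is a toy userspace model of that argument (the queue size,
values and names are made up purely for illustration): with a LIFO completion
queue the next allocation gets the coldest buffer back, while with FIFO it
gets the hottest one:

/* Toy model: a "completion queue" drained into a LIFO slab free list.
 * skb 3 is the most recently queued, cache-hot one. */
#include <stdio.h>

#define NSKB 4

int main(void)
{
	int completion_queue[NSKB] = { 0, 1, 2, 3 };
	int slab_free_list[NSKB];
	int top = 0;
	int i;

	/* Drain the completion queue LIFO-style (hot skb first), pushing
	 * each buffer onto the slab's LIFO free list. */
	for (i = NSKB - 1; i >= 0; i--)
		slab_free_list[top++] = completion_queue[i];
	printf("LIFO drain: next alloc returns skb %d (cold)\n",
	       slab_free_list[top - 1]);

	/* Draining FIFO-style instead leaves the hot skb on top. */
	top = 0;
	for (i = 0; i < NSKB; i++)
		slab_free_list[top++] = completion_queue[i];
	printf("FIFO drain: next alloc returns skb %d (hot)\n",
	       slab_free_list[top - 1]);

	return 0;
}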

However, as Eric said, new drivers don't rely on the completion queue, so it
isn't a real problem, especially in your test case.


-- 
Regards,
Changli Gao(xiaosuo@...il.com)
