Date:	Wed, 4 Feb 2009 20:23:46 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Evgeniy Polyakov <zbr@...emap.net>
Cc:	David Miller <davem@...emloft.net>, jarkao2@...il.com,
	herbert@...dor.apana.org.au, w@....eu, dada1@...mosbay.com,
	ben@...s.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH v2] tcp: splice as many packets as possible at once

On Wednesday 04 February 2009 19:08:51 Evgeniy Polyakov wrote:
> On Tue, Feb 03, 2009 at 04:46:09PM -0800, David Miller (davem@...emloft.net)
> wrote:
> > > NTA tried to solve this by not allowing data allocated on one CPU to
> > > be freed on a different CPU, contrary to what SLAB does. Modulo cache
> > > coherency improvements,
> >
> > This could kill performance on NUMA systems if we are not careful.
> >
> > If we ever consider NTA seriously, these issues would need to
> > be performance tested.
>
> Quite the contrary, I think. Memory is allocated and freed on the same CPU,
> which means it stays in the memory domain closest to the CPU in question.
>
> I did not test NUMA, though, but NTA performance on an ordinary CPU (it is
> 2.5 years old already :) was noticeably good.

I had a quick look at NTA... I didn't understand much of it yet, but
the remote freeing scheme is kind of like what I did for slqb. The
freeing CPU queues objects back to the CPU that allocated them, which
eventually checks the queue and frees them itself.
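
To make the scheme concrete, here is a minimal userspace sketch of the
idea (not the actual SLQB or NTA code; the structure and function names
such as cpu_cache, remote_free and drain_remote() are invented for
illustration, and a plain mutex stands in for whatever lighter-weight
synchronisation the real allocators use):

/*
 * Sketch of remote freeing: a CPU that frees an object it did not
 * allocate pushes it onto the owning CPU's "remote free" list; the
 * owner drains that list the next time it allocates.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct object {
	struct object *next;
	int owner_cpu;			/* CPU whose cache this object came from */
};

struct cpu_cache {
	struct object *freelist;	/* touched only by the owning CPU   */
	struct object *remote_free;	/* objects freed by other CPUs      */
	pthread_mutex_t remote_lock;	/* protects remote_free only        */
};

#define NR_CPUS 2
static struct cpu_cache caches[NR_CPUS];

/* Owner moves everything other CPUs handed back onto its private freelist. */
static void drain_remote(struct cpu_cache *c)
{
	pthread_mutex_lock(&c->remote_lock);
	while (c->remote_free) {
		struct object *obj = c->remote_free;
		c->remote_free = obj->next;
		obj->next = c->freelist;
		c->freelist = obj;
	}
	pthread_mutex_unlock(&c->remote_lock);
}

static struct object *obj_alloc(int cpu)
{
	struct cpu_cache *c = &caches[cpu];
	struct object *obj;

	if (!c->freelist)
		drain_remote(c);		/* eventually check the queue */
	obj = c->freelist;
	if (obj) {
		c->freelist = obj->next;
	} else {
		obj = malloc(sizeof(*obj));	/* fall back to the backing store */
		obj->owner_cpu = cpu;
	}
	return obj;
}

static void obj_free(struct object *obj, int cpu)
{
	struct cpu_cache *owner = &caches[obj->owner_cpu];

	if (obj->owner_cpu == cpu) {
		/* Local free: no lock, no cross-CPU traffic. */
		obj->next = owner->freelist;
		owner->freelist = obj;
	} else {
		/* Remote free: queue the object back to its allocating CPU. */
		pthread_mutex_lock(&owner->remote_lock);
		obj->next = owner->remote_free;
		owner->remote_free = obj;
		pthread_mutex_unlock(&owner->remote_lock);
	}
}

int main(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		pthread_mutex_init(&caches[i].remote_lock, NULL);

	struct object *o = obj_alloc(0);	/* "CPU 0" allocates       */
	obj_free(o, 1);				/* "CPU 1" frees remotely  */
	o = obj_alloc(0);			/* CPU 0 drains and reuses */
	printf("reused object owned by cpu %d\n", o->owner_cpu);
	free(o);
	return 0;
}

The point of the split lists is that the common local alloc/free path never
takes the lock and never touches cache lines owned by another CPU.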

I don't know how much of a cache coherency gain you get from this -- in
most slab allocations, I think the object tends to be cache-hot on the
CPU that frees it. I'm doing it mainly to try to avoid locking... I guess
that brings a cache coherency benefit in itself.

If NTA does significantly better than the slab allocator, I would be quite
interested. It might be something that we can learn from and use in the
general slab allocator (or maybe it comes from something more
network-specific that NTA does).

