Message-ID: <20141007185940.GE27719@hmsreliant.think-freely.org>
Date: Tue, 7 Oct 2014 14:59:40 -0400
From: Neil Horman <nhorman@...driver.com>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>
Cc: John Fastabend <john.r.fastabend@...el.com>,
Daniel Borkmann <dborkman@...hat.com>,
John Fastabend <john.fastabend@...il.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>,
"John W. Linville" <linville@...driver.com>,
Florian Westphal <fw@...len.de>, gerlitz.or@...il.com,
netdev@...r.kernel.org, john.ronciak@...el.com, amirv@...lanox.com,
eric.dumazet@...il.com, danny.zhou@...el.com,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [net-next PATCH v1 1/3] net: sched: af_packet support for direct ring access
On Tue, Oct 07, 2014 at 01:26:11AM +0200, Hannes Frederic Sowa wrote:
> Hi John,
>
> On Mon, Oct 6, 2014, at 22:37, John Fastabend wrote:
> > > I find the six additional ndo ops a bit worrisome as we are adding more
> > > and more subsystem specific ndoops to this struct. I would like to see
> > > some unification here, but currently cannot make concrete proposals,
> > > sorry.
> >
> > I agree it seems like a bit much. One thought was to split the ndo
> > ops into categories: switch ops, MACVLAN ops, basic ops and, with this
> > patch, userspace queue ops. This sort of goes along with some of the
> > switch offload work, which is going to add a handful more ops as best I
> > can tell.
>
> Thanks for your mail, you answered all of my questions.
>
> Have you looked at <https://code.google.com/p/kernel/wiki/ProjectUnetq>?
> Willem (also in Cc) used sysfs files which get mmapped to represent the
> tx/rx descriptors. The representation was independent of the device and
> IIRC the prototype used a write(fd, "", 1) to signal the kernel that it
> should proceed with tx. I agree, it would be great to be syscall-free
> here.
>
> For the semantics of the descriptors we could also easily generate files
> in sysfs. I was thinking of something like what tracepoints already do to
> describe the data in the ring buffer for each event:
>
> -- >8 --
> # cat /sys/kernel/debug/tracing/events/net/net_dev_queue/format
> name: net_dev_queue
> ID: 1006
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:void * skbaddr; offset:8; size:8; signed:0;
> field:unsigned int len; offset:16; size:4; signed:0;
> field:__data_loc char[] name; offset:20; size:4; signed:1;
>
> print fmt: "dev=%s skbaddr=%p len=%u", __get_str(name), REC->skbaddr, REC->len
> -- >8 --
>
> Maybe the macros from tracing are reusable (TP_STRUCT__entry), although
> e.g. endianness would need to be added. Hopefully there is already a user
> space parser somewhere in the perf sources. An easier-to-parse binary
> representation could be added easily, and maybe even something vDSO-like
> if people care about that.
>
> Maybe this open/mmap per queue also kills some of the ndo_ops?
>
> Bye,
> Hannes
>
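The tracepoint format listing quoted above is regular enough that a small user-space parser is easy to sketch. The Python below is only an illustration under stated assumptions: `parse_format`, `FIELD_RE`, and the dictionary keys are made-up names, not anything taken from perf or the tracing code, and the sample text is a subset of the quoted net_dev_queue output rather than a file read from sysfs.

```python
import re

# One "field:" record: C declaration, then offset, size and signedness.
# Hypothetical helper names; only the record layout comes from the listing.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*"
    r"offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*"
    r"signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Return a list of field descriptions found in a tracepoint format text."""
    # Drop mail quoting ("> ") and rejoin records that were wrapped across lines.
    flat = " ".join(line.lstrip("> \t").strip() for line in text.splitlines())
    fields = []
    for m in FIELD_RE.finditer(flat):
        # The last whitespace-separated token of the declaration is the name.
        ctype, _, name = m.group("decl").strip().rpartition(" ")
        fields.append({
            "type": ctype,
            "name": name,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            "signed": m.group("signed") == "1",
        })
    return fields


SAMPLE = """\
field:unsigned short common_type; offset:0; size:2; signed:0;
field:void * skbaddr; offset:8; size:8; signed:0;
field:unsigned int len; offset:16; size:4; signed:0;
field:__data_loc char[] name; offset:20; size:4; signed:1;
"""

fields = parse_format(SAMPLE)
```

A descriptor layout exported in the same textual form could presumably be consumed the same way, although endianness is not part of this format and would need its own field.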
John-
I don't know if it's of use to you here, but a while ago I was experimenting
with af_packet memory mapping, using the protection bits in the page tables
as a doorbell mechanism. I scrapped the work because the performance bottleneck
for af_packet turned out not to be the syscall trap time. It occurs to me,
though, that it might be useful for you here: with this mechanism, as long as
you keep the transmit ring non-empty, you only incur the cost of a single trap
to start the transmit process. Let me know if you want to see it.
Neil
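
To make the single-trap argument concrete, here is a toy accounting model in Python. It is not the scrapped af_packet work and touches no real page tables: `DoorbellRing`, `count_traps`, and the `protected` flag are hypothetical names, and the flag merely stands in for a write-protected mapping. The model assumes that protection is re-armed whenever the ring drains, which is a guess at how such a scheme might behave.

```python
from collections import deque


class DoorbellRing:
    """Toy model: the ring is 'write-protected' whenever the consumer is idle."""

    def __init__(self):
        self.ring = deque()
        self.protected = True  # stands in for a read-only page-table entry
        self.traps = 0         # faults taken, i.e. doorbell rings

    def enqueue(self, pkt):
        if self.protected:
            # A write to the protected ring faults once; the (modelled) fault
            # handler starts transmission and lifts the protection.
            self.traps += 1
            self.protected = False
        self.ring.append(pkt)

    def transmit_one(self):
        """Send one packet; re-arm the doorbell if the ring is now empty."""
        pkt = self.ring.popleft()
        if not self.ring:
            self.protected = True
        return pkt


def count_traps(n_packets, keep_nonempty):
    """Push n_packets (>= 1) through the ring and return the traps taken."""
    ring = DoorbellRing()
    if keep_nonempty:
        ring.enqueue(0)
        for i in range(1, n_packets):
            ring.enqueue(i)      # refill before the consumer drains the ring
            ring.transmit_one()
        ring.transmit_one()
    else:
        for i in range(n_packets):
            ring.enqueue(i)
            ring.transmit_one()  # ring drains, so the next write traps again
    return ring.traps
```

In this model `count_traps(5, keep_nonempty=True)` takes one trap, while `count_traps(5, keep_nonempty=False)` takes five, one per packet.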