[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081106194032.GB31673@ioremap.net>
Date: Thu, 6 Nov 2008 22:40:32 +0300
From: Evgeniy Polyakov <zbr@...emap.net>
To: "Lovich, Vitali" <vlovich@...lcomm.com>
Cc: Johann Baudy <johaahn@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH] Packet socket: mmapped IO: PACKET_TX_RING
Hi Vitali.
On Thu, Nov 06, 2008 at 10:49:36AM -0800, Lovich, Vitali (vlovich@...lcomm.com) wrote:
> > What if skb was queued in hardware or qdisk and userspace rewrites
> > mapped page placed in fraglist?
> Can you please clarify what you mean? Are you talking about the user changing the status field after the skb was queued? Or are you saying that the user may ask the kernel to resize the ring buffer while the skb is queued?
>
> In the first case, I don't think we don't really care - the user is breaking their side of the contract, but we'll still have a valid address since the page pointer should still be valid.
>
> In the second case, the behaviour is undefined because the hardware would use those page pointers for its scatter-gather which would, best case, cause the skb to fail (kernel oops is more likely I think). We can add this to the documentation, but in the end there's nothing we can really do outside of copying data from the ring buffer into the skb, which defeats the point of doing this.
I'm not sure that it is allowed to fail even for root. System could
sleep and not permit the resize until all skbs are freed though.
> > It is not allowed to store something in cb block which is intended to
> > live while skb is processed on different layers. In this particular
> > case
> > qdisk engine will overwrite skb's cb.
> >
> That's what I figured - however, I was wondering if we even need to go through the qdisc engine. Couldn't we just bypass it and send out packets directly through the card (i.e. dev_hard_start_xmit) similar to how pktgen does this? I haven't looked at the documentation (I'm not actually sure with which to start), but are there any caveats that need to be considered (e.g. must be called outside of an interrupt, can't be interruptible, etc.)
>
> The reason I'm thinking we'd want to bypass the qdisc engine is that the idea is that you would use this method only if you want the most performance possible, so why is packet scheduling even needed? The approach I'm thinking is that if the tx-ring buffer is implemented, we can even get rid of the in-kernel pktgen and port it over to user-space as an example of how to utilize the tx-ring buffer.
I can not say if it is good or bad idea, it is not my project :)
But existing packet socket does not bypass qdisc, so users may expect
the same from the new interface. Overall idea of storing something in cb
and expect it to live there between layers is very subtle: what if
tomorrow we will gen new layer between driver and qdisc? Or if we will
start updating cb in the driver itself when it owns skb?
--
Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists