[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPi7mHqV6Pgk5tSt+fJn_2aEZKZbO=aO78JXgw7MvDaW0musoQ@mail.gmail.com>
Date: Thu, 18 Aug 2011 16:18:47 -0700
From: San Mehat <san@...gle.com>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: davem@...emloft.net, mst@...hat.com, rusty@...tcorp.com.au,
linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org, netdev@...r.kernel.org,
digitaleric@...gle.com, mikew@...gle.com, miche@...gle.com,
maccarro@...gle.com
Subject: Re: [RFC 0/0] Introducing a generic socket offload framework
On Thu, Aug 18, 2011 at 3:57 PM, Alan Cox <alan@...rguk.ukuu.org.uk> wrote:
>> The Berkeley sockets coprocessor is a virtual PCI device which has the ability
>> to offload socket activity from an unmodified application at the BSD sockets
>
> Ok I think there is an important question here. Why is this being
> designed for a specific virtual interface. Unix has always had the notion
> that socket operations can be in part generic and that you can pass a
> properly designed program a socket without any notion of what it is for.
Sorry Alan if I wasn't clear, but I'm not quite sure what you're asking...
If you're asking 'why have you only spec'ed out a virtual interface
for this' then
my answer would be 'but of course you could design this in real hardware and
have a proper driver :)'. If you'd prefer that I call that out
specifically I'm happy to do so.
I have no desire to change the 'genericness' of sockets.. just the
opposite - i wish to
introduce the notion that sockets (can be) completely generic (when
offloaded) as far as
the guest is concerned.
>
>> Lastly, pushing socket processing back into the host allows for host-side
>> control of the network protocols used, which limits the potential congestion
>> problems that can arise when various guests are using their own congestion
>> control algorithms.
>
> Does that not depend which side does the congestion and who parcels out
> buffers ?
It does, and it does.
>
>> Since we wish to allow these paravirtualized sockets to coexist peacefully with
>> the existing Linux socket system, we've chosen to introduce the idea that a
>> socket can at some point transition from being managed by the O/S socket system
>> to a more enlightened 'hardware assisted' socket. The transition is managed by
>> a 'socket coprocessor' component which intercepts and gets first right of
>> refusal on handling certain global socket calls (connect, sendto, bind, etc...).
>> In this initial design, the policy on whether to transition a socket or not is
>> made by the virtual hardware, although we understand that further measurement
>> into operation latency is warranted.
>
> Q: whay happens about in process socket syscalls in another thread ?
> Thats always been the ugly in these cases either by intercepting or by
> swapping file operations on an object.
>
>> * SOCK_HWASSIST
>> Indicates socket operations are handled by hardware
>
> This guest only view means you can't use the abstraction for local
> sockets too.
>
To be honest, the way we're attempting to integrate is in such a way
that you *could*
offload AF_LOCAL sockets... but that world gets a bit too much like
the 'Twilight Zone'
for my current linkings..
>> In order to support a variety of socket address families, addresses are
>> converted from their native socket family to an opaque string. Our initial
>> design formats these strings as URIs. The currently supported conversions are:
>
> That makes a lot of sense to me, because its a well understood
> abstraction and you can offload other stuff to this kind of generic
> socket including things like http protocol acceleration, SSL and so on.
>
> Plus its always been annoying that you can't open a socket, but a URI
> interface solves that...
Indeed.
>
>> * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.
>
> But there is no reason SEQPACKET and RDM couldn't be added I assume?
No reason I can think of - we just did not have a specific requirement
for it at the time.
>
> Ok other questions
>
> Suppose instead you just add an abstracted socket interface of
>
> AF_SOMETHING, PF_URI
Mike Waychison and I were saving the 'PF_URI' discussion for a future
date, but indeed
we're on the same wave-length :). Our initial requirements are for an
'extremely minimal
burden of support' on the userspace environments, so we decided to
open up a separate
discussion on PF_URI
>
> it would be easy to convert programs. It would be easier to write
> properly generic programs. It would be easy write some small helpers that
> are a good deal less insane than the existing inet ones. At that point
> you could turn the problem on its head. Instead of 'borrowing' sockets
> for a fairly specific concept of hw assist you ask the reverse question,
> who can accelerate this URI be it some kind of virtual machine interface,
> something funky like raw data over infiniband, or plain old 'use the
> TCP/IP stack'.
Completely agree.
>
> Your decision making code is going to be interesting but it only has to
> make the decision once in simple cases.
Yup.
>
> And yes there is still the complicated cases such as 'the routing table
> has changed from vitual host to via siberia now what' but I don't believe
> your proposal addresses that either.
Can you be more specific? If you mean solving the 'keeping your tcp connections
open to non virtual endpoints across a migration (or whatever)' then
no it doesn't :)
>
> Alan
>
Thanks man,
-san
--
San Mehat | Staff Software Engineer | san@...gle.com | 415-366-6172
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists