Message-ID: <CA+mtBx-ZW3jVzaXicGi5okS5egMkfi8-SaDX_ewy-k5zRONfmg@mail.gmail.com>
Date: Fri, 20 Feb 2015 15:05:11 -0800
From: Tom Herbert <therbert@...gle.com>
To: Jonathon Reinhart <jonathon.reinhart@...il.com>
Cc: Sunil Kovvuri <sunil.kovvuri@...il.com>,
Linux Netdev List <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: Setting RPS affinities from network driver
On Fri, Feb 20, 2015 at 2:46 PM, Jonathon Reinhart
<jonathon.reinhart@...il.com> wrote:
> On Fri, Feb 20, 2015 at 5:30 PM, Tom Herbert <therbert@...gle.com> wrote:
>> On Thu, Feb 19, 2015 at 6:07 PM, Jonathon Reinhart
>> <jonathon.reinhart@...il.com> wrote:
>>>
>>> I don't know if this is a good idea. It seems like allowing the driver to
>>> set this default configuration opens the door to a whole slew of driver-
>>> specific config customizations.
>>>
>>> As a user, I wouldn't expect one driver to have a different default
>>> value of rps_cpus than another driver. Furthermore, I could imagine a
>>> case where a user expects rps_cpus to default to zero for all of his
>>> NICs, and has other CPU affinity settings applied (think realtime).
>>>
>>> In my opinion, this should be left to userspace. It's not that hard to
>>> add it to your init scripts.
>>>
>>>
>> In the old days I might have agreed with that-- kernel implements
>> mechanism and policy is handled in user space. However, in this brave
>> new world of hardware offloads for more and more kernel networking,
>> I'm not so sure about this any more. It's starting to look like we may
>> want kernel to do more dynamic resource management under some
>> described policies.
>>
>> In davem's keynote at netdev he mentioned that hardware offload should
>> be transparent to things like ip route, so one implication is that at
>> some point the kernel may need to decide which routes are the best to
>> offload per a policy. Also at netdev, Jesse Brandeburg mentioned that
>> Windows had the capability of spinning up RSS queues to handle
>> increased load, and lamented that we couldn't do this easily in Linux.
>> Willem deBruijn made RPS resilient under DOS with RPS flow limit as a
>> step in that direction. We have spent countless hours tuning RPS and
>> interrupt settings per platform, per NIC, and per some major
>> applications-- but the resulting init scripts are very
>> convoluted and static. It would be interesting if we could just tell
>> the kernel to take its best guess and be adaptive. I suppose something
>> like irqbalance is the alternative to do this sort of stuff in user
>> space, but I don't know if that is well deployed or sufficiently
>> reactive under DOS attack.
>
> I agree with your general sentiment, but does it make sense for this to
> be controlled at a per-driver level? The OP indicates that he wants his
> driver to perform well, so he wants to enable RPS for it by default. If the
> RPS policy is to go into the kernel, shouldn't it be done at the global
> scope?
>
Right, that's what I meant by putting it into a library. RPS is not
driver-specific configuration.
> Note that this argument is different from RSS where we're dealing with
> actual hardware queues, so the driver of course has a say in the
> configuration.
>
Assuming that all queues are equal and we have a standard way to
influence the indirection table, even RSS configuration really isn't
driver-specific. We just need to know how many queues are available.
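
To make the "we just need to know how many queues" point concrete, here
is a minimal sketch of filling an RSS indirection table evenly from
nothing but a queue count. The function name and the table size used in
the example are assumptions for illustration; real table sizes are
reported by the device, and this is not any driver's actual code.

```python
def equal_indirection(table_size, n_queues):
    """Spread indirection table slots round-robin over n_queues.

    table_size is a placeholder here; a real value would come from
    the device rather than being chosen by the caller.
    """
    if n_queues < 1:
        raise ValueError("need at least one RX queue")
    # Slot i of the table points at queue i modulo the queue count.
    return [slot % n_queues for slot in range(table_size)]


if __name__ == "__main__":
    # Hypothetical 8-entry table shared by 3 queues.
    print(equal_indirection(8, 3))
```

Nothing in the sketch depends on which driver owns the queues, which is
the sense in which this could live in common code.
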
> Going on with what you alluded to, I think if the kernel were to have a say
> in the RPS configuration, there would need to be some sort of global on/off
> switch.
Yes, but I think it's more than an on/off switch. There could be some
canned configurations like "one RX queue per CPU", "one per NUMA
node", etc. Also, RPS configuration is a part of that. For one queue
per CPU, RPS should probably be off. For one queue per NUMA node, it
seems common for RPS of that queue to be on the other nodes.
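
As a rough sketch of what two such canned configurations could compute,
the snippet below derives a per-queue rps_cpus mask from a NUMA
topology. The policy names, the plan_rps() helper, and the topology
dict are all hypothetical. The "per_numa" branch reads the sentence
above literally (steer to CPUs on the other nodes); swap the CPU set if
same-node steering is what you want. The resulting strings are in the
hex bitmap form that /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
accepts; the sketch only computes them and writes nothing.

```python
def cpu_mask(cpus):
    """Format a set of CPU ids as a hex bitmap, with a comma between
    32-bit groups as the kernel's bitmap parser expects."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    digits = f"{value:x}"
    groups = []
    while digits:
        groups.append(digits[-8:])  # peel off 8 hex digits = 32 bits
        digits = digits[:-8]
    return ",".join(reversed(groups))


def plan_rps(policy, topology):
    """Return {rx_queue_index: rps_cpus mask} for a canned policy.

    topology maps NUMA node id -> list of CPU ids (hypothetical input).
    """
    if policy == "per_cpu":
        # One RX queue per CPU: RPS off, i.e. an all-zero mask.
        n_cpus = sum(len(cpus) for cpus in topology.values())
        return {queue: "0" for queue in range(n_cpus)}
    if policy == "per_numa":
        # One RX queue per node: steer to CPUs on the other nodes
        # (literal reading of the text; an assumption).
        plan = {}
        for queue, node in enumerate(sorted(topology)):
            others = [cpu for other, cpus in topology.items()
                      if other != node for cpu in cpus]
            plan[queue] = cpu_mask(others)
        return plan
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    two_nodes = {0: [0, 1], 1: [2, 3]}  # made-up topology
    print(plan_rps("per_cpu", two_nodes))
    print(plan_rps("per_numa", two_nodes))
```
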
I have no doubt that Sunil didn't bargain for all this in posting his
patches ;-), but these do seem to be the seeds of a broader network
resource manager in the kernel!
Tom