Message-ID: <e1ed4a57-f32c-3fcd-5caf-0861ef7cf0b5@gmail.com>
Date: Wed, 18 Dec 2024 12:25:21 +0000
From: Edward Cree <ecree.xilinx@...il.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH net-next v3 03/12] net: homa: create shared Homa header files
On 18/12/2024 05:46, John Ousterhout wrote:
> (note: these comments arrived after I posted the v4 patch series, so
> fixes will appear in v5)
Yeah, I'm really slow at getting these reviews done, sorry about that.
(This is probably the last one until after the holidays.)
> On Mon, Dec 16, 2024 at 10:36 PM Edward Cree <ecree.xilinx@...il.com> wrote:
>> Should parts of 'struct homa' be per network namespace, rather than
>> global, so that in systems hosting multiple containers each netns can
>> configure Homa for the way it wants to use it?
>
> Possibly. I haven't addressed the issue of customizing the
> configuration very thoroughly yet, but I can imagine it might happen
> at multiple levels (e.g. for a network namespace, a socket, etc.). I'd
> like to defer this a bit if possible.
I think it's reasonable to leave that out of the initial upstreaming,
yeah, as long as you're confident you're not backing yourself into a
corner with either implementation or uAPI that would make that
difficult later.
>>> + /**
>>> + * @locked: Nonzero means that @ready_rpc is locked; only valid
>>> + * if @ready_rpc is non-NULL.
>>> + */
Oh, I think I understand this now; does it mean that "the RPC lock on
ready_rpc is held"? I initially read it as "the field @ready_rpc is
in some sense latched and its value won't change", which didn't make
a lot of sense.
> This is a lock-free mechanism to hand off a complete message to a
> receiver thread (which may be polling, though the polling code has
> been removed from this stripped down patch series). I couldn't find an
> "atomic pointer" structure, which is why the code uses atomic_long_t
> (which I agree is a bit ugly).
As far as I can tell, ready_rpc only ever transitions from NULL to
populated; once non-NULL, the value is never overwritten by a
different pointer. Is that correct? If so, I believe you could use
a (typed) nonatomic pointer and an atomic flag to indicate "there is
an RPC in ready_rpc".
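Untested sketch of what I mean (function names invented, fields only
illustrative; the release/acquire pair is so the receiver sees a fully
initialised RPC once it observes the flag):

#include <linux/atomic.h>

struct homa_interest {
	struct homa_rpc *ready_rpc;	/* written once: NULL -> RPC */
	atomic_t rpc_ready;		/* nonzero => ready_rpc is valid */
};

static void interest_handoff(struct homa_interest *interest,
			     struct homa_rpc *rpc)
{
	interest->ready_rpc = rpc;
	atomic_set_release(&interest->rpc_ready, 1);	/* publish */
}

static struct homa_rpc *interest_check(struct homa_interest *interest)
{
	if (!atomic_read_acquire(&interest->rpc_ready))
		return NULL;
	return interest->ready_rpc;
}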
> I'm not
> sure I understand your comment about not manually sleeping and waking
> threads from within Homa; is there a particular mechanism for this
> that you have in mind?
It's not in this patch, but homa_rpc_handoff() does a wake_up_process()
and homa_wait_for_message() does set_current_state(), neither of which
ought to be necessary.
I think wait-queues do what you want (see `struct wait_queue_head` and
`wait_event_interruptible` in include/linux/wait.h; it's basically a
condvar), or, if you want an unconditional wake-up, completions
(include/linux/completion.h, kernel/sched/completion.c).
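E.g. something like this (same invented names as the sketch above,
swapping the atomic flag for a waitqueue; the condition checked inside
wait_event_interruptible() takes the place of the flag):

#include <linux/wait.h>

struct homa_interest {
	struct homa_rpc *ready_rpc;
	wait_queue_head_t wq;		/* init_waitqueue_head() at setup */
};

/* receiving thread: sleep until an RPC has been handed off */
static int interest_wait(struct homa_interest *interest)
{
	return wait_event_interruptible(interest->wq,
			READ_ONCE(interest->ready_rpc) != NULL);
}

/* handoff path: publish the RPC, then wake any waiter */
static void interest_handoff(struct homa_interest *interest,
			     struct homa_rpc *rpc)
{
	WRITE_ONCE(interest->ready_rpc, rpc);
	wake_up(&interest->wq);
}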
>>> + interest->request_links.next = LIST_POISON1;
>>> + interest->response_links.next = LIST_POISON1;
>>
>> Any particular reason why you're opencoding poisoning, rather than
>> using the list helpers (which distinguish between a list_head that
>> has been inited but never added, so list_empty() returns true, and
>> one which has been list_del()ed and thus poisoned)?
>> It would likely be easier for others to debug any issues that arise
>> in Homa if when they see a list_head in an oops or crashdump they
>> can relate it to the standard lifecycle.
>
> I couldn't find any other way to do this: I want to initialize the
> links to be the same state as if list_del had been called,
If there's a reason why this is necessary, there should be a comment
here explaining why. I *think*, from poking around the rest of the
Homa code, you're using 'next == LIST_POISON1' to signal some kind
of state, but if it's just "this interest is not on a list", then
list_empty() after INIT_LIST_HEAD() or list_del_init() should work
just as well. (Use list_del_init(), rather than plain list_del(),
if the interest is still reachable by some code that may need to
tell whether the interest is on a list; otherwise, using list_del()
gets the poisoning behaviour that will give a recognisable error if
buggy code tries to walk the list anyway.)
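i.e. something like this (sketch only, using the fields quoted above):

#include <linux/list.h>

static void interest_init_links(struct homa_interest *interest)
{
	INIT_LIST_HEAD(&interest->request_links);   /* list_empty() == true */
	INIT_LIST_HEAD(&interest->response_links);
}

static void interest_unlink(struct homa_interest *interest)
{
	/* list_del_init() keeps list_empty() usable as an "am I queued?"
	 * test; plain list_del() would instead poison next/prev, so a
	 * buggy later walk faults on the recognisable LIST_POISON values.
	 */
	list_del_init(&interest->request_links);
	list_del_init(&interest->response_links);
}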
>> And are there any security issues here; ought we to do anything
>> like TCP does with sequence numbers to try to ensure they aren't
>> guessable by an attacker?
>
> There probably are, and the right solutions may well be similar to
> TCP. I'm fairly ignorant on the potential security issues; is there
> someplace where I can learn more?
I don't really know either, sorry. All I can suggest is asking the
TCP folks.
>> I'm not sure exactly how it works but I believe you can annotate
>> the declaration with __rcu to get sparse to enforce this.
>
> I poked around and it appears to me that list_head's don't get
> declared '__rcu' (this designation is intended for pointers, if I'm
> understanding correctly). Instead, the list_head is manipulated with
> rcu functions such as list_for_each_entry_rcu. Let me know if I'm
> missing something?
I thought that declaring a struct (like list_head) __rcu would
propagate through to its members, but I may be wrong about that.
My hope was that sparse could check that all operations on the list
were indeed using the correct _rcu-suffixed functions, but if
that's not the case then feel free to ignore me.
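(For reference, the pattern I understand John to be describing, with
made-up types, is below: the list_head itself carries no __rcu, the
_rcu list helpers do the annotation handling internally, and __rcu is
for bare pointers that readers rcu_dereference().)

#include <linux/rculist.h>
#include <linux/rcupdate.h>

struct foo {
	struct list_head links;		/* no __rcu annotation here */
	int val;
};

static int sum_foos(struct list_head *head)
{
	struct foo *f;
	int total = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(f, head, links)
		total += f->val;
	rcu_read_unlock();
	return total;
}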
>>> + /**
>>> + * @link_bandwidth: The raw bandwidth of the network uplink, in
>>> + * units of 1e06 bits per second. Set externally via sysctl.
>>> + */
>>> + int link_mbps;
>>
>> What happens if a machine has two uplinks and someone wants to
>> use Homa on both of them? I wonder if most of the granting and
>> pacing part of Homa ought to be per-netdev rather than per-host.
>> (Though in an SDN case with a bunch of containers issuing their
>> RPCs through veths you'd want a Homa-aware bridge that could do
>> the SRPT rather than bandwidth sharing, and having everything go
>> through a single Homa stack instance does give you that for free.
>> But then a VM use-case still needs the clever bridge anyway.)
>
> Yes, I'm planning to implement a Homa-specific qdisc that will
> implement packet pacing on a per-netdev basis, and that will eliminate
> the need for this variable.
Sounds good.
> Virtual machines are more complex;
> to do Homa right, pacing (and potentially granting also) must be done
> in a central place that knows about all traffic on a given link.
I guess that means that, to support SR-IOV, at least part of the Homa
transport needs to be implemented in the NIC.
I agree that this problem doesn't need to be solved for the
initial upstreaming.
-ed