Message-ID: <e1ed4a57-f32c-3fcd-5caf-0861ef7cf0b5@gmail.com>
Date: Wed, 18 Dec 2024 12:25:21 +0000
From: Edward Cree <ecree.xilinx@...il.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH net-next v3 03/12] net: homa: create shared Homa header files
On 18/12/2024 05:46, John Ousterhout wrote:
> (note: these comments arrived after I posted the v4 patch series, so
> fixes will appear in v5)
Yeah, I'm really slow at getting these reviews done, sorry about that.
(This is probably the last one until after the holidays.)
> On Mon, Dec 16, 2024 at 10:36 PM Edward Cree <ecree.xilinx@...il.com> wrote:
>> Should parts of 'struct homa' be per network namespace, rather than
>> global, so that in systems hosting multiple containers each netns can
>> configure Homa for the way it wants to use it?
>
> Possibly. I haven't addressed the issue of customizing the
> configuration very thoroughly yet, but I can imagine it might happen
> at multiple levels (e.g. for a network namespace, a socket, etc.). I'd
> like to defer this a bit if possible.
I think it's reasonable to leave that out of the initial upstreaming,
yeah, as long as you're confident you're not backing yourself into a
corner with either implementation or uAPI that would make that
difficult later.
>>> + /**
>>> + * @locked: Nonzero means that @ready_rpc is locked; only valid
>>> + * if @ready_rpc is non-NULL.
>>> + */
Oh, I think I understand this now; does it mean that "the RPC lock on
ready_rpc is held"? I initially read it as "the field @ready_rpc is
in some sense latched and its value won't change", which didn't make
a lot of sense.
> This is a lock-free mechanism to hand off a complete message to a
> receiver thread (which may be polling, though the polling code has
> been removed from this stripped down patch series). I couldn't find an
> "atomic pointer" structure, which is why the code uses atomic_long_t
> (which I agree is a bit ugly).
As far as I can tell, ready_rpc only ever transitions from NULL to
populated; once non-NULL, the value is never overwritten by a
different pointer. Is that correct? If so, I believe you could use
a (typed) nonatomic pointer and an atomic flag to indicate "there is
an RPC in ready_rpc".
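Untested sketch of what I mean (function names invented, fields only
illustrative; the release/acquire pair is so the receiver sees a fully
initialised RPC once it observes the flag):

#include <linux/atomic.h>

struct homa_interest {
	struct homa_rpc *ready_rpc;	/* written once: NULL -> RPC */
	atomic_t rpc_ready;		/* nonzero => ready_rpc is valid */
};

static void interest_handoff(struct homa_interest *interest,
			     struct homa_rpc *rpc)
{
	interest->ready_rpc = rpc;
	atomic_set_release(&interest->rpc_ready, 1);	/* publish */
}

static struct homa_rpc *interest_check(struct homa_interest *interest)
{
	if (!atomic_read_acquire(&interest->rpc_ready))
		return NULL;
	return interest->ready_rpc;
}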
> I'm not
> sure I understand your comment about not manually sleeping and waking
> threads from within Homa; is there a particular mechanism for this
> that you have in mind?
It's not in this patch, but homa_rpc_handoff() does a wake_up_process()
and homa_wait_for_message() does set_current_state(), neither of which
ought to be necessary.
I think wait-queues do what you want (see `struct wait_queue_head` and
`wait_event_interruptible` in include/linux/wait.h; it's basically a
condvar), or, if you want an unconditional wake-up, completions
(include/linux/completion.h, kernel/sched/completion.c).
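E.g. something like this (same invented names as the sketch above,
swapping the atomic flag for a waitqueue; the condition checked inside
wait_event_interruptible() takes the place of the flag):

#include <linux/wait.h>

struct homa_interest {
	struct homa_rpc *ready_rpc;
	wait_queue_head_t wq;		/* init_waitqueue_head() at setup */
};

/* receiving thread: sleep until an RPC has been handed off */
static int interest_wait(struct homa_interest *interest)
{
	return wait_event_interruptible(interest->wq,
			READ_ONCE(interest->ready_rpc) != NULL);
}

/* handoff path: publish the RPC, then wake any waiter */
static void interest_handoff(struct homa_interest *interest,
			     struct homa_rpc *rpc)
{
	WRITE_ONCE(interest->ready_rpc, rpc);
	wake_up(&interest->wq);
}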
>>> + interest->request_links.next = LIST_POISON1;
>>> + interest->response_links.next = LIST_POISON1;
>>
>> Any particular reason why you're opencoding poisoning, rather than
>> using the list helpers (which distinguish between a list_head that
>> has been inited but never added, so list_empty() returns true, and
>> one which has been list_del()ed and thus poisoned)?
>> It would likely be easier for others to debug any issues that arise
>> in Homa if when they see a list_head in an oops or crashdump they
>> can relate it to the standard lifecycle.
>
> I couldn't find any other way to do this: I want to initialize the
> links to be the same state as if list_del had been called,
If there's a reason why this is necessary, there should be a comment
here explaining why. I *think*, from poking around the rest of the
Homa code, you're using 'next == LIST_POISON1' to signal some kind
of state, but if it's just "this interest is not on a list", then
list_empty() after INIT_LIST_HEAD() or list_del_init() should work
just as well. (Use list_del_init(), rather than plain list_del(),
if the interest is still reachable by some code that may need to
tell whether the interest is on a list; otherwise, using list_del()
gets the poisoning behaviour that will give a recognisable error if
buggy code tries to walk the list anyway.)
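i.e. something like this (sketch only, using the fields quoted above):

#include <linux/list.h>

static void interest_init_links(struct homa_interest *interest)
{
	INIT_LIST_HEAD(&interest->request_links);   /* list_empty() == true */
	INIT_LIST_HEAD(&interest->response_links);
}

static void interest_unlink(struct homa_interest *interest)
{
	/* list_del_init() keeps list_empty() usable as an "am I queued?"
	 * test; plain list_del() would instead poison next/prev, so a
	 * buggy later walk faults on the recognisable LIST_POISON values.
	 */
	list_del_init(&interest->request_links);
	list_del_init(&interest->response_links);
}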
>> And are there any security issues here; ought we to do anything
>> like TCP does with sequence numbers to try to ensure they aren't
>> guessable by an attacker?
>
> There probably are, and the right solutions may well be similar to
> TCP. I'm fairly ignorant on the potential security issues; is there
> someplace where I can learn more?
I don't really know either, sorry. All I can suggest is asking the
TCP folks.
>> I'm not sure exactly how it works but I believe you can annotate
>> the declaration with __rcu to get sparse to enforce this.
>
> I poked around and it appears to me that list_head's don't get
> declared '__rcu' (this designation is intended for pointers, if I'm
> understanding correctly). Instead, the list_head is manipulated with
> rcu functions such as list_for_each_entry_rcu. Let me know if I'm
> missing something?
I thought that declaring a struct (like list_head) __rcu would
propagate through to its members, but I may be wrong about that.
My hope was that sparse could check that all operations on the list
were indeed using the correct _rcu-suffixed functions, but if
that's not the case then feel free to ignore me.
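(For reference, the pattern I understand John to be describing, with
made-up types, is below: the list_head itself carries no __rcu, the
_rcu list helpers do the annotation handling internally, and __rcu is
for bare pointers that readers rcu_dereference().)

#include <linux/rculist.h>
#include <linux/rcupdate.h>

struct foo {
	struct list_head links;		/* no __rcu annotation here */
	int val;
};

static int sum_foos(struct list_head *head)
{
	struct foo *f;
	int total = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(f, head, links)
		total += f->val;
	rcu_read_unlock();
	return total;
}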
>>> + /**
>>> + * @link_bandwidth: The raw bandwidth of the network uplink, in
>>> + * units of 1e06 bits per second. Set externally via sysctl.
>>> + */
>>> + int link_mbps;
>>
>> What happens if a machine has two uplinks and someone wants to
>> use Homa on both of them? I wonder if most of the granting and
>> pacing part of Homa ought to be per-netdev rather than per-host.
>> (Though in an SDN case with a bunch of containers issuing their
>> RPCs through veths you'd want a Homa-aware bridge that could do
>> the SRPT rather than bandwidth sharing, and having everything go
>> through a single Homa stack instance does give you that for free.
>> But then a VM use-case still needs the clever bridge anyway.)
>
> Yes, I'm planning to implement a Homa-specific qdisc that will
> implement packet pacing on a per-netdev basis, and that will eliminate
> the need for this variable.
Sounds good.
> Virtual machines are more complex;
> to do Homa right, pacing (and potentially granting also) must be done
> in a central place that knows about all traffic on a given link.
I guess that means that, to support SR-IOV, at least part of the Homa
transport needs to be implemented in the NIC.
I agree that this problem doesn't need to be solved for the
initial upstreaming.
-ed