Message-ID: <CAGXJAmxHDVhxKb3M0--rySAgewmLpmfJkAeRSBNRgZ=cQonDtg@mail.gmail.com>
Date: Mon, 27 Jan 2025 10:03:37 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, horms@...nel.org, 
	kuba@...nel.org
Subject: Re: [PATCH net-next v6 05/12] net: homa: create homa_rpc.h and homa_rpc.c

On Mon, Jan 27, 2025 at 2:02 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 1/27/25 6:22 AM, John Ousterhout wrote:
> > On Thu, Jan 23, 2025 at 6:30 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >> ...
> >> How many RPCs should concurrently exist in a real server? With 1024
> >> buckets there could be a lot of them on each/some list, and a linear
> >> search could be very expensive. And this happens with BH disabled.
> >
> > Server RPCs tend to be short-lived, so my best guess is that the
> > number of concurrent server RPCs will be relatively small (maybe a few
> > hundred?). But this is just a guess: I won't know for sure until I can
> > measure Homa in production use. If the number of concurrent RPCs turns
> > out to be huge then we'll have to find a different solution.
> >
> >>> +
> >>> +     /* Initialize fields that don't require the socket lock. */
> >>> +     srpc = kmalloc(sizeof(*srpc), GFP_ATOMIC);
> >>
> >> You could do the allocation outside the bucket lock, too and avoid the
> >> ATOMIC flag.
> >
> > In many cases this function will return an existing RPC so there won't
> > be any need to allocate; I wouldn't want to pay the allocation
> > overhead in that case. I could conceivably check the offset in the
> > packet and pre-allocate if the offset is zero (in this case it's
> > highly unlikely that there will be an existing RPC).
>
> If you use RCU properly here, you could do a lockless lookup. If such
> a lookup fails, you could still do the allocation outside the lock and
> avoid it in most cases.

I think that might work, but it would suffer from the slow reclamation
problem I mentioned with RCU. It would also create more complexity in
the code (e.g. the allocation might still turn out to be redundant, so
there would need to be additional code to check for that: the lookup
would essentially have to be done twice in the case of creating a new
RPC). I'd rather not incur this complexity until there's evidence that
GFP_ATOMIC is causing problems.
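
For reference, here is roughly what that path would have to look like
(just a sketch to show where the complexity goes; helper names such as
homa_rpc_find_rcu() are invented for illustration, not Homa's actual
functions):

	struct homa_rpc *srpc, *new_rpc;

	rcu_read_lock();
	srpc = homa_rpc_find_rcu(bucket, id);  /* lockless lookup */
	rcu_read_unlock();
	if (srpc)
		return srpc;

	/* Allocate outside the bucket lock; may turn out to be wasted. */
	new_rpc = kmalloc(sizeof(*new_rpc), GFP_ATOMIC);
	if (!new_rpc)
		return NULL;

	spin_lock_bh(&bucket->lock);
	srpc = homa_rpc_find(bucket, id);      /* lookup #2, under lock */
	if (!srpc) {
		/* Initialize new_rpc here, then publish it. */
		hlist_add_head_rcu(&new_rpc->hash_links, &bucket->rpcs);
		srpc = new_rpc;
		new_rpc = NULL;
	}
	spin_unlock_bh(&bucket->lock);
	kfree(new_rpc);                        /* NULL if it was used */
	return srpc;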

> > Homa needs to handle a very high rate of RPCs, so this would result in
> > too much accumulated memory (in particular, skbs don't get reclaimed
> > until the RPC is reclaimed).
>
> For the RPC struct, the above is a fair point, but why do skbs need to
> be freed together with the RPC struct? If you have skbs sitting in,
> e.g., an RX queue, you can flush that queue when the RPC goes out of
> scope, without any additional delay.

Reclaiming the skbs inline would be too expensive; this is the reason
for the reaping mechanism. It's conceivable that the reaping could be
done in two stages: reap skbs ASAP, but wait to reap homa_rpc structs
until RCU gives the OK. However, once again, this would add more
complexity: it's simpler to have a single reaper that handles
everything.
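
In sketch form, the two-stage version would be something like this
(field and function names here are hypothetical, not taken from the
patch):

	/* Stage 1: packet buffers can be released as soon as the RPC is
	 * dead; nothing else references them.
	 */
	static void homa_rpc_reap_skbs(struct homa_rpc *rpc)
	{
		struct sk_buff *skb;

		while ((skb = skb_dequeue(&rpc->msgin_packets)) != NULL)
			kfree_skb(skb);
	}

	/* Stage 2: the homa_rpc struct itself has to wait out an RCU
	 * grace period, since lockless lookups could still be holding a
	 * reference to it.
	 */
	static void homa_rpc_free_rcu(struct rcu_head *head)
	{
		kfree(container_of(head, struct homa_rpc, rcu));
	}

	static void homa_rpc_reap_two_stage(struct homa_rpc *rpc)
	{
		homa_rpc_reap_skbs(rpc);
		call_rcu(&rpc->rcu, homa_rpc_free_rcu);
	}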

> > The caller must have a lock on the homa_rpc anyway, so RCU wouldn't
> > save the overhead of acquiring a lock. The reason for putting the lock
> > in the hash table instead of the homa_rpc is that this makes RPC
> > creation/deletion atomic with respect to lookups. The lock was
> > initially in the homa_rpc, but that led to complex races with hash
> > table insertion/deletion. This is explained in sync.txt, but of course
> > you don't have that (yet).
>
> The per-bucket RPC lock is prone to contention; a per-RPC lock would
> avoid that problem.

There are a lot of buckets (1024); this was done intentionally to
reduce the likelihood of contention between different RPCs trying to
acquire the same bucket lock. I was concerned about the potential for
contention, but I have measured it under heavy (admittedly synthetic)
workloads and found that contention for the bucket locks is not a
significant problem.

Note that the bucket locks would be needed even with RCU usage, in
order to permit concurrent RPC creation in different buckets. Thus
Homa's locking scheme doesn't introduce additional locks; it
eliminates locks that would otherwise be needed on individual RPCs and
uses the bucket locks for two purposes: protecting the bucket's list
and serving as the lock for the RPCs hashed into it.
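
Concretely, the arrangement is roughly this (simplified sketch; field
names are approximate, not verbatim from the patch):

	struct homa_rpc_bucket {
		spinlock_t lock;         /* Protects the chain below *and*
					  * serves as the lock for every
					  * RPC hashed into this bucket. */
		struct hlist_head rpcs;
	};

	/* Lookup, creation, and deletion all run while the bucket lock
	 * is held, so a lookup can never observe a half-created or
	 * half-deleted RPC, and the caller returns already holding the
	 * lock it needs in order to work on the RPC.
	 */
	static struct homa_rpc *homa_rpc_find_locked(
			struct homa_rpc_bucket *bucket, u64 id)
	{
		struct homa_rpc *rpc;

		spin_lock_bh(&bucket->lock);
		hlist_for_each_entry(rpc, &bucket->rpcs, hash_links) {
			if (rpc->id == id)
				return rpc;      /* lock still held */
		}
		return NULL;                     /* lock still held; the
						  * caller may insert a
						  * new RPC */
	}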

> > This approach is unusual, but it has worked out really well. Before
> > implementing this approach I had what seemed like a never-ending
> > stream of synchronization problems over the socket hash tables; each
> > "fix" introduced new problems. Once I implemented this, all the
> > problems went away and the code has been very stable ever since
> > (several years now).
>
> Have you tried running a fuzzer on this code? I bet syzkaller will give
> a lot of interesting results, if you teach it about the homa APIs.

I haven't done that yet; I'll put it on my "to do" list. I do have
synthetic workloads for Homa that are randomly driven, and so far they
seem to have been pretty effective at finding races.

-John-
