[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c7fc244-012c-7760-a62e-7c31242d489a@ovn.org>
Date: Mon, 15 May 2023 20:23:28 +0200
From: Ilya Maximets <i.maximets@....org>
To: Antoine Tenart <atenart@...nel.org>, Dumitru Ceara <dceara@...hat.com>,
Eric Dumazet <edumazet@...gle.com>
Cc: i.maximets@....org, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 4/4] net: skbuff: fix l4_hash comment
On 5/15/23 10:12, Antoine Tenart wrote:
> Quoting Dumitru Ceara (2023-05-11 22:50:32)
>> On 5/11/23 19:54, Ilya Maximets wrote:
>>>>> Note that skb->hash has never been considered as canonical, for obvious reasons.
>>>
>>> I guess, the other point here is that it's not an L4 hash either.
>>>
>>> It's a random number. So, the documentation will still not be
>>> correct even after the change proposed in this patch.
>
> The proposed changed is "indicate hash is from layer 4 and provides a
> uniform distribution over flows", which does not describe *how* the hash
> is computed but *where* it comes from. This matches "random number set
> by TCP" and changes in how hashes are computed won't affect the comment,
> so we'll not end up in the same situation.
I respectfully disagree, "is from layer 4" and "random number" do not
match for me. So, "where it comes from" argument is not applicable.
Random numbers come from random number generator, and not "from layer 4".
Unless by "from layer 4" you mean "from the code that handles layer 4
packet processing". But that seems very confusing to me. And it is
definitely not the first thing that comes to mind while reading the
documentation.
>
>>> One way to not break everything doing that will be to introduce a
>>> new flag, e.g. 'rnd_hash' that will be a hash that is "not related
>>> to packet fields, but provides a uniform distribution over flows".
>>>
>>> skb_get_hash() then may return the current hash if it's any of
>>> l4, rnd or sw. That should preserve the current logic across
>>> the kernel code.
>>> But having a new flag, we could introduce a new helper, for example
>>> skb_get_stable_hash() or skb_get_hash_nonrandom() or something like
>>> that, that will be equal to the current version of skb_get_hash(),
>>> i.e. not take the random hash into account.
>>>
>>> Affected subsystems (OVS, ECMP, SRv6) can be changed to use that
>>> new function. This way these subsystems will get a software hash
>>> based on the real packet fields, if it was originally random.
>>> This will also preserve ability to use hash provided by the HW,
>>> since it is not normally random.
>
> But then the whole point of txrehash would be dismissed, if ECMP and
> others stop using the hash provided by TCP.
I guess ECMP is not a good example as it doesn't use skb hash for locally
generated traffic. However, the argument about defeating the purpose of
rehash doesn't stand either for the same reason. The same next hop will
be chosen for the same flow always, because the hash will be re-generated
from a 5-tuple. But anyway...
In OVS and SRv6 cases we're also talking about load balancing, so unlike
ECMP, final destinations may be different based on the hash. In this case
there is an issue.
> This needs to be a
> conditional setting, to make the skb hash to be stable over time only
> when needed. That way both scenario are supported.
>
>> What I had in mind is not really a stable hash but a "good enough
>> alternative". It's probably "good enough" (at least for OvS/OVN) if the
>> hash used by OvS doesn't change throughout the lifetime of a TCP session.
>
> So what's important is not how the hash is computed but the fact that
> it should be stable over time when requested. Isn't exactly what
> net.core.txrehash=0 does? If there are some bugs they should be fixed.
Not asking to disable re-hash. Asking to provide a way to distinguish
changing random numbers from a more stable hash, so we can avoid
recalculating it on every packet.
Of course, we can change the code to re-calculate every time, but that
sounds wasteful if it's already done by the HW, for example.
> On top of this, if OvS needs to additionally provide a canonical
> 4/5-tuple hash because not only the stability over time is needed but
> also the method is important, it needs to compute its own hash
For the OVS_HASH_ALG_L4, the exact method is not really important.
For OVS_HASH_ALG_SYM_L4, the hash has to symmetric, but this hashing
algorithm is not implemented in the OVS kernel module today.
> As part
> of such potential series ways to cache the result can be explored.
> Numbers would help too. (This can be discussed here, that's fine, but I
> thought it's important to distinguish the two topics).
I agree that topics are fairly different. The rest of the set is probably
fine (I didn't review). I'm just not sure why we need to change the comment
from one incorrect sentence to another similarly incorrect and confusing.
Also making it look like subsystems that use it in a previously documented
meaning are at fault. It's just not fair. I think, the issue should be
fixed first.
On the other note,
I don't think that the rest of the patch set should be held back by that
though. So, maybe dropping this one patch from the set might be an option
for now?
Best regards, Ilya Maximets.
Powered by blists - more mailing lists