Message-ID: <e17b4ac0-2534-45c1-92fc-74305d8ead82@uliege.be>
Date: Tue, 19 Nov 2024 23:30:13 +0100
From: Justin Iurman <justin.iurman@...ege.be>
To: netdev@...r.kernel.org
Cc: davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v5 0/4] Mitigate the two-reallocations issue for
 iptunnels

On 11/19/24 23:21, Justin Iurman wrote:
> v5:
> - address Paolo's comments
> - s/int dst_dev_overhead()/unsigned int dst_dev_overhead()/
> v4:
> - move static inline function to include/net/dst.h
> v3:
> - fix compilation error in seg6_iptunnel
> v2:
> - add missing "static" keywords in seg6_iptunnel
> - use a static-inline function to return the dev overhead (as suggested
>    by Olek, thanks)
> 
> The same pattern is found in ioam6, rpl6, and seg6. Basically, it first
> makes sure there is enough room for inserting a new header:
> 
> (1) err = skb_cow_head(skb, len + skb->mac_len);
> 
> Then, when the insertion (encap or inline) is performed, the input and
> output handlers respectively make sure there is enough room for layer 2:
> 
> (2) err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
> 
> skb_cow_head() does nothing when there is enough room. Otherwise, it
> reallocates more room, which depends on the architecture. Briefly,
> skb_cow_head() calls __skb_cow() which then calls pskb_expand_head() as
> follows:
> 
> pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0, GFP_ATOMIC);
> 
> "delta" represents the number of bytes to be added. This value is
> aligned with NET_SKB_PAD, which is defined as follows:
> 
> NET_SKB_PAD = max(32, L1_CACHE_BYTES)
> 
> ... where L1_CACHE_BYTES also depends on the architecture. In our case
> (x86), it is defined as follows:
> 
> L1_CACHE_BYTES = (1 << CONFIG_X86_L1_CACHE_SHIFT)
> 
> ... where (again, in our case) CONFIG_X86_L1_CACHE_SHIFT equals 6
> (=X86_GENERIC).
> 
> All this to say: when there is not enough room, skb_cow_head() grows
> the headroom by the shortfall rounded up to the next multiple of
> NET_SKB_PAD (a 64-byte multiple in our case).
> 
> Back to the main issue with the pattern: in some cases, two
> reallocations are triggered, resulting in a performance drop (i.e.,
> lines (1) and (2) would both trigger an implicit reallocation). How is
> that possible? It is bad luck: we hit an exact NET_SKB_PAD boundary
> while skb->mac_len (=14) is smaller than LL_RESERVED_SPACE(dst->dev)
> (=16 in our case). For an x86 arch, it happens in the following cases
> (with the default needed_headroom):
> 
> - ioam6:
>   - (inline mode) pre-allocated data trace of 236 or 240 bytes
>   - (encap mode) pre-allocated data trace of 196 or 200 bytes
> - seg6:
>   - (encap mode) for 13, 17, 21, 25, 29, 33, ...(+4)... prefixes
> 
> Let's illustrate the problem, i.e., when we fall on the exact
> NET_SKB_PAD boundary. In the case of ioam6, for the above problematic
> values, the total overhead is 256 bytes for both modes. Based on line
> (1), skb->mac_len (=14) is added, therefore passing 270 bytes to
> skb_cow_head(). At that moment, the headroom has 206 bytes available (in
> our case). Since 270 > 206, skb_cow_head() performs a reallocation and
> the new headroom is now 206 + 64 (NET_SKB_PAD) = 270. Which is exactly
> the room we needed. After the insertion, the headroom has 0 bytes
> available. But line (2) still needs 16 bytes, which, again, triggers
> another reallocation.
> 
> The same logic is applied to seg6 (although it does not happen with the
> inline mode, i.e., -40 bytes). It happens with other L1 cache shifts too
> (the larger the cache shift, the less often it happens). For example,
> with a +32 cache shift (instead of +64), the following number of
> segments would trigger two reallocations: 11, 15, 19, ... With a +128
> cache shift, the following number of segments would trigger two
> reallocations: 17, 25, 33, ... And so on and so forth. Note that it is
> the same for both the "encap" and "l2encap" modes. For the "encap.red"
> and "l2encap.red" modes, it is the same logic but with "segs+1" (e.g.,
> 14, 18, 22, 26, etc for a +64 cache shift). Note also that it may happen
> with rpl6 (based on some calculations), although it did not in our case.
> 
> This series provides a solution to mitigate the aforementioned issue for
> ioam6, seg6, and rpl6. It provides the dst_entry (in the cache) to
> skb_cow_head() **before** the insertion (line (1)). As a result, the
> very first iteration would still trigger two reallocations (i.e., empty
> cache), while next iterations would only trigger a single reallocation.
> 
> Justin Iurman (4):
>    include: net: add static inline dst_dev_overhead() to dst.h
>    net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
>    net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
>    net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
> 
>   include/net/dst.h         |  9 +++++
>   net/ipv6/ioam6_iptunnel.c | 73 ++++++++++++++++-----------------
>   net/ipv6/rpl_iptunnel.c   | 46 +++++++++++----------
>   net/ipv6/seg6_iptunnel.c  | 85 ++++++++++++++++++++++++---------------
>   4 files changed, 123 insertions(+), 90 deletions(-)
> 

Sorry, I just noticed I missed the net-next window. I will resubmit when
it reopens.
