netdev - Re: Alternate location for skb_shared

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CANn89i+iHPssBAkaucdZukV=XxdoVWoYDm4LGEO+E49sXmDi9Q@mail.gmail.com>
Date:   Sat, 15 Aug 2020 10:34:26 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Florian Fainelli <f.fainelli@...il.com>
Cc:     netdev <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Doug Berger <opendmb@...il.com>,
        Justin Chen <justin.chen@...adcom.com>
Subject: Re: Alternate location for skb_shared_info other than the tail?

On Sat, Aug 15, 2020 at 9:57 AM Florian Fainelli <f.fainelli@...il.com> wrote:
>
> Hi all,
>
> We have an Ethernet controller that is capable of putting several
> Ethernet frames within a single 4KB buffer provided by Linux. The
> rationale for this design is to maximize the DRAM efficiency, especially
> for LPDDR4/5 architectures where the cost to keep a page open is higher
> than say DDR3/4.
>
> We have a programmable alignment which allows us to make sure that we
> can align the start of the Ethernet frames on a cache line boundary (and
> also add the 2 byte stuffing to further align the IP header). What we do
> not have however is a programmable tail room to space out the Ethernet
> frames between one another. Worst case, if the end aligns on a 64b
> boundary, the next frame would start on the adjacent byte, leaving no
> space in between.
>
> We were initially thinking about using build_skb() (and variants) and
> point the data argument to the location within that 4KB buffer, however
> this causes a problem that we have the skb_shared_info structure at the
> end of the Ethernet frame, and that area can be overwritten by the
> hardware. Right now we allocate a new sk_buff and copy from the offset
> within the 4KB buffer. The CPU is fast enough and this warms up the data
> cache that this is not a performance problem, but we would prefer to do
> without the copy to limit the amount of allocations in NAPI context.
>
> What is further complicating is that by the time we process one Ethernet
> frame within the 4KB data buffer, we do not necessarily know whether
> another one will come, and what space we would have between the two. If
> we do know, though, we could see if we have enough room for the
> skb_shared_info and at least avoid copying that sk_buff, but this would
> not be such a big win.
>
> We are unfortunately a bit late to fix that hardware design limitation,
> and we did not catch the requirement for putting skb_shared_info at the
> end until too late :/
>
> In premise skb_shared_info could reside somewhere other than at the
> tail, and the entire network stack uses the skb_shinfo() to access the
> area (there are no concerns about "wild" accesses to skb_shared_info),
> so if we could find a way to encode within the sk_buff that an alternate
> location is to be used, we could get our cookie. There are some parts of
> the stack that do however assume that we need to allocate the header
> part plus the sbk_shared_info structure, those assumptions may be harder
> to break.
>
> Do you have any suggestions on how we could specify an alternate
> location for skb_shared_info or any other suggestions on how to avoid
> copies?

It seems you do _not_ want to use build_skb(), but instead allocate a
standard skb
with enough room in skb->head to pull the headers.

build_skb was fine for MTU=1500 and using 2KB pre-allocated buffers,
because we could use the
2048-1536 = 512 bytes as a placeholder for shared_info.

But for non conventional MTU or buffer splitting (as in your case),
there is no point for using this.