netdev - Alternate location for skb_shared

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <e6f779c3-9af0-5200-013b-f4da94c2ae38@gmail.com>
Date:   Sat, 15 Aug 2020 09:57:27 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, edumazet@...gle.com
Cc:     Doug Berger <opendmb@...il.com>,
        Justin Chen <justin.chen@...adcom.com>
Subject: Alternate location for skb_shared_info other than the tail?

Hi all,

We have an Ethernet controller that is capable of putting several 
Ethernet frames within a single 4KB buffer provided by Linux. The 
rationale for this design is to maximize the DRAM efficiency, especially 
for LPDDR4/5 architectures where the cost to keep a page open is higher 
than say DDR3/4.

We have a programmable alignment which allows us to make sure that we 
can align the start of the Ethernet frames on a cache line boundary (and 
also add the 2 byte stuffing to further align the IP header). What we do 
not have however is a programmable tail room to space out the Ethernet 
frames between one another. Worst case, if the end aligns on a 64b 
boundary, the next frame would start on the adjacent byte, leaving no 
space in between.

We were initially thinking about using build_skb() (and variants) and 
point the data argument to the location within that 4KB buffer, however 
this causes a problem that we have the skb_shared_info structure at the 
end of the Ethernet frame, and that area can be overwritten by the 
hardware. Right now we allocate a new sk_buff and copy from the offset 
within the 4KB buffer. The CPU is fast enough and this warms up the data 
cache that this is not a performance problem, but we would prefer to do 
without the copy to limit the amount of allocations in NAPI context.

What is further complicating is that by the time we process one Ethernet 
frame within the 4KB data buffer, we do not necessarily know whether 
another one will come, and what space we would have between the two. If 
we do know, though, we could see if we have enough room for the 
skb_shared_info and at least avoid copying that sk_buff, but this would 
not be such a big win.

We are unfortunately a bit late to fix that hardware design limitation, 
and we did not catch the requirement for putting skb_shared_info at the 
end until too late :/

In premise skb_shared_info could reside somewhere other than at the 
tail, and the entire network stack uses the skb_shinfo() to access the 
area (there are no concerns about "wild" accesses to skb_shared_info), 
so if we could find a way to encode within the sk_buff that an alternate 
location is to be used, we could get our cookie. There are some parts of 
the stack that do however assume that we need to allocate the header 
part plus the sbk_shared_info structure, those assumptions may be harder 
to break.

Do you have any suggestions on how we could specify an alternate 
location for skb_shared_info or any other suggestions on how to avoid 
copies?

Thanks!
-- 
Florian