lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 Feb 2023 16:39:52 +0100
From:   Alexander Lobakin <aleksander.lobakin@...el.com>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>
CC:     <brouer@...hat.com>, <bpf@...r.kernel.org>,
        <netdev@...r.kernel.org>, Stanislav Fomichev <sdf@...gle.com>,
        <martin.lau@...nel.org>, <ast@...nel.org>, <daniel@...earbox.net>,
        <yoong.siang.song@...el.com>, <anthony.l.nguyen@...el.com>,
        <intel-wired-lan@...ts.osuosl.org>, <xdp-hints@...-project.net>
Subject: Re: [PATCH bpf-next V1] igc: enable and fix RX hash usage by netstack

From: Jesper Dangaard Brouer <jbrouer@...hat.com>
Date: Thu, 16 Feb 2023 17:46:53 +0100

> 
> On 14/02/2023 14.21, Alexander Lobakin wrote:
>> From: Jesper Dangaard Brouer <brouer@...hat.com>
>> Date: Fri, 10 Feb 2023 16:07:59 +0100
>>
>>> When function igc_rx_hash() was introduced in v4.20 via commit
>>> 0507ef8a0372
>>> ("igc: Add transmit and receive fastpath and interrupt handlers"), the
>>> hardware wasn't configured to provide RSS hash, thus it made sense to
>>> not
>>> enable net_device NETIF_F_RXHASH feature bit.
>>
>> [...]
>>
>>> @@ -311,6 +311,58 @@ extern char igc_driver_name[];
>>>   #define IGC_MRQC_RSS_FIELD_IPV4_UDP    0x00400000
>>>   #define IGC_MRQC_RSS_FIELD_IPV6_UDP    0x00800000
>>>   +/* RX-desc Write-Back format RSS Type's */
>>> +enum igc_rss_type_num {
>>> +    IGC_RSS_TYPE_NO_HASH        = 0,
>>> +    IGC_RSS_TYPE_HASH_TCP_IPV4    = 1,
>>> +    IGC_RSS_TYPE_HASH_IPV4        = 2,
>>> +    IGC_RSS_TYPE_HASH_TCP_IPV6    = 3,
>>> +    IGC_RSS_TYPE_HASH_IPV6_EX    = 4,
>>> +    IGC_RSS_TYPE_HASH_IPV6        = 5,
>>> +    IGC_RSS_TYPE_HASH_TCP_IPV6_EX    = 6,
>>> +    IGC_RSS_TYPE_HASH_UDP_IPV4    = 7,
>>> +    IGC_RSS_TYPE_HASH_UDP_IPV6    = 8,
>>> +    IGC_RSS_TYPE_HASH_UDP_IPV6_EX    = 9,
>>> +    IGC_RSS_TYPE_MAX        = 10,
>>> +};
>>> +#define IGC_RSS_TYPE_MAX_TABLE        16
>>> +#define IGC_RSS_TYPE_MASK        0xF
>>
>> GENMASK()?
>>
> 
> hmm... GENMASK(3,0) looks more confusing to me. The mask we need here is
> so simple that I prefer not to complicate this with GENMASK.
> 
>>> +
>>> +/* igc_rss_type - Rx descriptor RSS type field */
>>> +static inline u8 igc_rss_type(union igc_adv_rx_desc *rx_desc)
>>
>> Why use types shorter than u32 on the stack?
> 
> Changing to u32 in V2
> 
>> Why this union is not const here, since there are no modifications?
> 
> Sure
> 
>>> +{
>>> +    /* RSS Type 4-bit number: 0-9 (above 9 is reserved) */
>>> +    return rx_desc->wb.lower.lo_dword.hs_rss.pkt_info &
>>> IGC_RSS_TYPE_MASK;
>>
>> The most important I wanted to mention: doesn't this function make the
>> CPU read the uncached field again, while you could just read it once
>> onto the stack and then extract all such data from there?
> 
> I really don't think this is an issues here. The igc_adv_rx_desc is only
> 16 bytes and it should be hot in CPU cache by now.

Rx descriptors are located in the DMA coherent zone (allocated via
dma_alloc_coherent()), I am missing something? Because I was (I am) sure
CPU doesn't cache anything from it (and doesn't reorder reads/writes
from/to). I thought that's the point of coherent zones -- you may talk
to hardware without needing for syncing...

> 
> To avoid the movzx I have changed this to do a u32 read instead.
> 
>>> +}
>>> +
>>> +/* Packet header type identified by hardware (when BIT(11) is zero).
>>> + * Even when UDP ports are not part of RSS hash HW still parse and
>>> mark UDP bits
>>> + */
>>> +enum igc_pkt_type_bits {
>>> +    IGC_PKT_TYPE_HDR_IPV4    =    BIT(0),
>>> +    IGC_PKT_TYPE_HDR_IPV4_WITH_OPT=    BIT(1), /* IPv4 Hdr includes
>>> IP options */
>>> +    IGC_PKT_TYPE_HDR_IPV6    =    BIT(2),
>>> +    IGC_PKT_TYPE_HDR_IPV6_WITH_EXT=    BIT(3), /* IPv6 Hdr includes
>>> extensions */
>>> +    IGC_PKT_TYPE_HDR_L4_TCP    =    BIT(4),
>>> +    IGC_PKT_TYPE_HDR_L4_UDP    =    BIT(5),
>>> +    IGC_PKT_TYPE_HDR_L4_SCTP=    BIT(6),
>>> +    IGC_PKT_TYPE_HDR_NFS    =    BIT(7),
>>> +    /* Above only valid when BIT(11) is zero */
>>> +    IGC_PKT_TYPE_L2        =    BIT(11),
>>> +    IGC_PKT_TYPE_VLAN    =    BIT(12),
>>> +    IGC_PKT_TYPE_MASK    =    0x1FFF, /* 13-bits */
>>
>> Also GENMASK().
> 
> GENMASK would make more sense here.
> 
>>> +};
>>> +
>>> +/* igc_pkt_type - Rx descriptor Packet type field */
>>> +static inline u16 igc_pkt_type(union igc_adv_rx_desc *rx_desc)
>>
>> Also short types and consts.
>>
> 
> Fixed in V2
> 
>>> +{
>>> +    u32 data = le32_to_cpu(rx_desc->wb.lower.lo_dword.data);
>>> +    /* Packet type is 13-bits - as bits (16:4) in lower.lo_dword*/
>>> +    u16 pkt_type = (data >> 4) & IGC_PKT_TYPE_MASK;
>>
>> Perfect candidate for FIELD_GET(). No, even for le32_get_bits().
> 
> I adjusted this, but I could not find a central define for FIELD_GET
> (but many drivers open code this).

<linux/bitfield.h>. It has FIELD_{GET,PREP}() and also builds
{u,__le,__be}{8,16,32}_{encode,get,replace}_bits() via macro (the latter
doesn't get indexed by Elixir, as it doesn't parse functions built via
macros).

> 
>> Also my note above about excessive expensive reads.
>>
>>> +
>>> +    return pkt_type;
>>> +}
>>> +
>>>   /* Interrupt defines */
>>>   #define IGC_START_ITR            648 /* ~6000 ints/sec */
>>>   #define IGC_4K_ITR            980
>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>>> b/drivers/net/ethernet/intel/igc/igc_main.c
>>> index 8b572cd2c350..42a072509d2a 100644
>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>> @@ -1677,14 +1677,40 @@ static void igc_rx_checksum(struct igc_ring
>>> *ring,
>>>              le32_to_cpu(rx_desc->wb.upper.status_error));
>>>   }
>>>   +/* Mapping HW RSS Type to enum pkt_hash_types */
>>> +struct igc_rss_type {
>>> +    u8 hash_type; /* can contain enum pkt_hash_types */
>>
>> Why make a struct for one field? + short type note
>>
>>> +} igc_rss_type_table[IGC_RSS_TYPE_MAX_TABLE] = {
>>> +    [IGC_RSS_TYPE_NO_HASH].hash_type      = PKT_HASH_TYPE_L2,
>>> +    [IGC_RSS_TYPE_HASH_TCP_IPV4].hash_type      = PKT_HASH_TYPE_L4,
>>> +    [IGC_RSS_TYPE_HASH_IPV4].hash_type      = PKT_HASH_TYPE_L3,
>>> +    [IGC_RSS_TYPE_HASH_TCP_IPV6].hash_type      = PKT_HASH_TYPE_L4,
>>> +    [IGC_RSS_TYPE_HASH_IPV6_EX].hash_type      = PKT_HASH_TYPE_L3,
>>> +    [IGC_RSS_TYPE_HASH_IPV6].hash_type      = PKT_HASH_TYPE_L3,
>>> +    [IGC_RSS_TYPE_HASH_TCP_IPV6_EX].hash_type = PKT_HASH_TYPE_L4,
>>> +    [IGC_RSS_TYPE_HASH_UDP_IPV4].hash_type      = PKT_HASH_TYPE_L4,
>>> +    [IGC_RSS_TYPE_HASH_UDP_IPV6].hash_type      = PKT_HASH_TYPE_L4,
>>> +    [IGC_RSS_TYPE_HASH_UDP_IPV6_EX].hash_type = PKT_HASH_TYPE_L4,
>>> +    [10].hash_type = PKT_HASH_TYPE_L2, /* RSS Type above 9
>>> "Reserved" by HW */
>>> +    [11].hash_type = PKT_HASH_TYPE_L2,
>>> +    [12].hash_type = PKT_HASH_TYPE_L2,
>>> +    [13].hash_type = PKT_HASH_TYPE_L2,
>>> +    [14].hash_type = PKT_HASH_TYPE_L2,
>>> +    [15].hash_type = PKT_HASH_TYPE_L2,
>>
>> Why define those empty if you could do a bound check in the code
>> instead? E.g. `if (unlikely(bigger_than_9)) return PKT_HASH_TYPE_L2`.
> 
> Having a branch for this is likely slower.  On godbolt I see that this
> generates suboptimal and larger code.

But you have to verify HW output anyway, right? Or would like to rely on
that on some weird revision it won't spit BIT(69) on you?

> 
> 
>>> +};
>>> +
>>>   static inline void igc_rx_hash(struct igc_ring *ring,
>>>                      union igc_adv_rx_desc *rx_desc,
>>>                      struct sk_buff *skb)
>>>   {
>>> -    if (ring->netdev->features & NETIF_F_RXHASH)
>>> -        skb_set_hash(skb,
>>> -                 le32_to_cpu(rx_desc->wb.lower.hi_dword.rss),
>>> -                 PKT_HASH_TYPE_L3);
>>> +    if (ring->netdev->features & NETIF_F_RXHASH) {
>>
>>     if (!(feature & HASH))
>>         return;
>>
>> and -1 indent level?
> 
> Usually, yes, I also prefer early return style code.
> For one I just followed the existing style.

I'd not recommend "keep the existing style" of Intel drivers -- not
something I'd like to keep as is :D

> 
> Second, I tried to code it up, but it looks ugly in this case, as the
> variable defines need to get moved outside the if statement.
> 
>>> +        u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
>>> +        u8  rss_type = igc_rss_type(rx_desc);
>>> +        enum pkt_hash_types hash_type;
>>> +
>>> +        hash_type = igc_rss_type_table[rss_type].hash_type;
>>> +        skb_set_hash(skb, rss_hash, hash_type);
>>> +    }
>>>   }
>>
>> [...]
>>
>> Thanks,
>> Olek
>>
> 

Thanks,
Olek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ