lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fee07fd-3beb-4201-9575-5ad630386e2f@canonical.com>
Date: Thu, 20 Jun 2024 17:59:42 +0300
From: Ghadi Rahme <ghadi.rahme@...onical.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 net] bnx2x: Fix multiple UBSAN
 array-index-out-of-bounds


On 13/06/2024 17:48, Jakub Kicinski wrote:
> On Wed, 12 Jun 2024 18:44:49 +0300 Ghadi Elie Rahme wrote:
>> Fix UBSAN warnings that occur when using a system with 32 physical
>> cpu cores or more, or when the user defines a number of Ethernet
>> queues greater than or equal to FP_SB_MAX_E1x using the num_queues
>> module parameter.
>>
>> The value of the maximum number of Ethernet queues should be limited
>> to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if
>> enabled to avoid out of bounds reads and writes.
> You're just describing what the code does, not providing extra
> context...

Apologies for the lack of explanation.

Currently there is a read/write out of bounds that occurs on the array
"struct stats_query_entry query" present inside the "bnx2x_fw_stats_req"
struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h".
Looking at the definition of the "struct stats_query_entry query" array:

struct stats_query_entry query[FP_SB_MAX_E1x+
         BNX2X_FIRST_QUEUE_QUERY_IDX];

FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and
has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3
meaning the array has a total size of 19.
Since accesses to "struct stats_query_entry query" are offset-ted by
BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet
queues should not exceed FP_SB_MAX_E1x (16). However one of these queues
is reserved for FCOE and thus the number of Ethernet queues should be set
to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if
it is not.

This is also described in a comment in the source code in
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition
of FP_SB_MAX_E1x. Below is the part of this explanation that it important
for this patch

/*
  * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is
  * control by the number of fast-path status blocks supported by the
  * device (HW/FW). Each fast-path status block (FP-SB) aka non-default
  * status block represents an independent interrupts context that can
  * serve a regular L2 networking queue. However special L2 queues such
  * as the FCoE queue do not require a FP-SB and other components like
  * the CNIC may consume FP-SB reducing the number of possible L2 queues
  *
  * If the maximum number of FP-SB available is X then:
  * a. If CNIC is supported it consumes 1 FP-SB thus the max number of
  *    regular L2 queues is Y=X-1
  * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor)
  * c. If the FCoE L2 queue is supported the actual number of L2 queues
  *    is Y+1
  * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for
  *    slow-path interrupts) or Y+2 if CNIC is supported (one additional
  *    FP interrupt context for the CNIC).
  * e. The number of HW context (CID count) is always X or X+1 if FCoE
  *    L2 queue is supported. The cid for the FCoE L2 queue is always X.
  */

Looking at the commits when the E2 support was added, it was originally
using the E1x parameters [f2e0899f0f27 (bnx2x: Add 57712 support)]. Where
FP_SB_MAX_E2 was set to 16 the same as E1x. Since I do not have access to
the datasheets of these devices I had to guess based on the previous work
done on the driver what would be the safest way to fix this array overflow.
Thus I decided to go with how things were done before, which is to limit
the E2 to using the same number of queues as E1x. This patch accomplishes
that.

However I also had another solution which made more sense to me but I had
no way to tell if it would be safe. The other solution was to increase the
size of the stats_query_entry query array to be large enough to fit the
number of queues supported by E2. This would mean that the new definition
would look like the following:

struct stats_query_entry query[FP_SB_MAX_E2+
         BNX2X_FIRST_QUEUE_QUERY_IDX];

I have tested this approach and it worked fine so I am more comfortable now
changing the patch an sending in a v3 undoing the changes in v2 and simply
increasing the array size. I believe now that using FP_SB_MAX_E1x instead
of FP_SB_MAX_E2 to define the array size might have been an oversight when
updating the driver to take full advantage of the E2 after it was just
limiting itself to the capabilities of an E1x.

>
>> Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")
> Sure this is not more recent, netif_get_num_default_rss_queues()
> used to always return 8.
The value of the number of queues can be defined by the kernel or the
user, which is why I used the commit that I did for the Fixes tag
because it is the job of the clamp to make sure both these values are
in check. Setting the Fixes tag to when netif_get_num_default_rss_queues()
was changed ignores the fact that the user value can be out of bounds.
>> Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@...onical.com>
>> Cc: stable@...r.kernel.org
>>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> index a8e07e51418f..c895dd680cf8 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp)
>>   	if (is_kdump_kernel())
>>   		nq = 1;
>>   
>> -	nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
>> +	int max_nq = FP_SB_MAX_E1x - 1;
> please don't mix declarations and code
>
>> +	if (NO_FCOE(bp))
>> +		max_nq = FP_SB_MAX_E1x;
> you really need to explain somewhere why you're hardcoding E1x
> constants while at a glance the driver also supports E2.
> Also why is BNX2X_MAX_QUEUES() higher than the number of queues?
> Isn't that the bug?
The reason I did not patch BNX2X_MAX_QUEUES() is because the macro is
working as expected by returning the actual number of queues that can be
handled by a NIC using an E2/E1x chip. It was the driver that was not able
to handle the maximum an E2 NIC can take.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ