lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <532A1D17.4010701@chelsio.com>
Date:	Wed, 19 Mar 2014 15:41:27 -0700
From:	Casey Leedom <leedom@...lsio.com>
To:	Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>
CC:	Dimitrios Michailidis <dm@...lsio.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Hariprasad S <hariprasad@...lsio.com>
Subject: Re: [PATCH] cxgb4: fix probe when already with invalid parameters

   The biggest problem is that just because a PF is MASTER it doesn't 
mean that it can do anything it wants to the adapter.  There may be 
other already registered users (PFs) which are happily going about their 
business.  Being MASTER implies responsibilities:

  1. Initialization if the adapter isn't already initialized.

  2. Handling adapter errors and error interrupts.

Casey

On 03/19/14 15:30, Thadeu Lima de Souza Cascardo wrote:
> On Wed, Mar 19, 2014 at 03:25:42PM -0700, Casey Leedom wrote:
>>    I think I know what's happening.  You mentioned that you're doing
>> a "netboot" so it's probably using PF0..3 for the network boot
>> (depending on the port being booted from) and is failing to issue a
>> BYE command.  When cxgb4 comes along it gets MASTER (because
>> presumably the "netboot" said it didn't want to be MASTER) and the
>> chip/firmware is reported as being Already Initialized.  The
>> "netboot" here seems to be pretty crazy though since the above
>> scenario would require it to explicitly refuse to be MASTER and fail
>> to issue a BYE command.
>>
>>    Which "netboot" are you using and what's the version number, etc.?
>>
> That will take me a while to get answered. I'll see if I can find who
> wrote that code, but it's a driver for Open Firmware on IBM Power
> Systems. I am not really sure this is the cause, but that's what was
> reported to me.
>
> By the way, what exactly are the tons of problems with my patch,
> considering the change I proposed, ie, checking that it's the MASTER?
> What scenarios could it get wrong? Do you suggest any further checking?
>
> Cascardo.
>
>> Casey
>>
>>
>> On 03/19/14 15:13, Casey Leedom wrote:
>>> Okay, that firmware is recent enough but Dimitris is right, the
>>> patch has a ton of problems.  This section of code is very tricky
>>> and we've worked hard to get it right.  The firmware also has
>>> special code in it to ~try to catch~ cases where a previous driver
>>> on the same PF failed to issue a BYE command.  When the firmware
>>> sees a new HELLO command come in from a PF and that PF is already
>>> "registered" (via a previous HELLO command), if it's the _only_ PF
>>> "registered" and the new HELLO command provides the Clear
>>> Initialized flag, then the firmware will assume a missing BYE and
>>> automatically "de-initialize" the firmware/chip.  So it's Very
>>> Weird that you're getting a PF is MASTER and Chip/Firmware is
>>> Initialized return.
>>>
>>> Casey
>>>
>>> On 03/19/14 14:57, Thadeu Lima de Souza Cascardo wrote:
>>>> On Wed, Mar 19, 2014 at 09:38:49PM +0000, Dimitrios Michailidis wrote:
>>>>> Thadeu Lima de Souza Cascardo wrote:
>>>>>> Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
>>>>>> support for T4 configuration file"), we have problems when probing the
>>>>>> device, and finding out it's already initialized, but does not have
>>>>>> valid buffer sizes setup.
>>>>>>
>>>>>> This may happen with kexec without shutdown, or bad firmware or
>>>>>> bootloader.
>>>>>>
>>>>>> The usual symptom is that probe fails:
>>>>>>
>>>>>> [    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER:
>>>>>> Adapter already
>>>>>> initialized
>>>>>> [    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
>>>>>> [    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
>>>>>>
>>>>>> The solution is to treat the adapter as not initialized in case the
>>>>>> parameters are invalid.
>>>>> The patch doesn't look right to me.  Besides reinitializing
>>>>> the device when it finds
>>>>> disagreeable settings it disregards that this PF may not be in
>>>>> charge of the device.
>>>>> If the controlling PF (what the code calls master) selects
>>>>> values this PF doesn't like
>>>>> with the patch it will elevate itself to master and install
>>>>> its own preferences.
>>>>>
>>>>> Also not right of course is that FW is claiming the device is
>>>>> initialized when clearly it isn't.
>>>>> Can you tell me which FW version is involved here and what
>>>>> steps got the device in this state?
>>>>>
>>>> We are trying to netboot, so it's possibly a problem on the Open
>>>> Firmware driver that makes it not send FW BYE before handling the CPU to
>>>> the bootloader. I could easily reproduce a similar situation by removing
>>>> the call to t4_fw_bye during the driver remove path, and reloading the
>>>> driver without commit 940d9d34a5467c2e2574866eb009d4cb61e27299 ("cxgb4:
>>>> allow large buffer size to have page size").
>>>>
>>>> The firmware I am using is:
>>>> firmware-version: 1.9.23.0, TP 0.1.9.1
>>>>
>>>> How about the change below?
>>>>
>>>> If that's OK, I'll send a v2.
>>>>
>>>> Cascardo.
>>>>
>>>>
>>>>>> Signed-off-by: Thadeu Lima de Souza Cascardo
>>>>>> <cascardo@...ux.vnet.ibm.com>
>>>>>> ---
>>>>>>   drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   34
>>>>>> +++++++++++++---------
>>>>>>   1 files changed, 20 insertions(+), 14 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>>>>> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>>>>> index 34e2488..d0638f9 100644
>>>>>> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>>>>> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>>>>>> @@ -5237,13 +5237,27 @@ static int adap_init0(struct adapter *adap)
>>>>>>        * master initialization), note that we're living with existing
>>>>>>        * adapter parameters.  Otherwise, it's time to try
>>>>>> initializing the
>>>>>>        * adapter ...
>>>>>> +     *
>>>>>> +     * If we're living with non-hard-coded parameters (either from a
>>>>>> +     * Firmware Configuration File or values programmed by
>>>>>> a different PF
>>>>>> +     * Driver), give the SGE code a chance to pull in
>>>>>> anything that it
>>>>>> +     * needs ...  Note that this must be called after we
>>>>>> retrieve our VPD
>>>>>> +     * parameters in order to know how to convert core
>>>>>> ticks to seconds.
>>>>>>        */
>>>>>>       if (state == DEV_STATE_INIT) {
>>>>>>           dev_info(adap->pdev_dev, "Coming up as %s: "\
>>>>>>                "Adapter already initialized\n",
>>>>>>                adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
>>>>>>           adap->flags |= USING_SOFT_PARAMS;
>>>>>> -    } else {
>>>>>> +        ret = t4_sge_init(adap);
>>>>>> +        if (ret == -EINVAL) {
>>>> -        if (ret == -EINVAL) {
>>>> +        if (ret == -EINVAL && adap->flags & MASTER_PF) {
>>>>
>>>>>> +            adap->flags &= ~USING_SOFT_PARAMS;
>>>>>> +            state = DEV_STATE_UNINIT;
>>>>>> +        } else if (ret < 0) {
>>>>>> +            goto bye;
>>>>>> +        }
>>>>>> +    }
>>>>>> +    if (state != DEV_STATE_INIT) {
>>>>>>           dev_info(adap->pdev_dev, "Coming up as MASTER: "\
>>>>>>                "Initializing adapter\n");
>>>>>>
>>>>>> @@ -5300,19 +5314,11 @@ static int adap_init0(struct adapter *adap)
>>>>>>                   -ret);
>>>>>>               goto bye;
>>>>>>           }
>>>>>> -    }
>>>>>> -
>>>>>> -    /*
>>>>>> -     * If we're living with non-hard-coded parameters (either from a
>>>>>> -     * Firmware Configuration File or values programmed by
>>>>>> a different PF
>>>>>> -     * Driver), give the SGE code a chance to pull in
>>>>>> anything that it
>>>>>> -     * needs ...  Note that this must be called after we
>>>>>> retrieve our VPD
>>>>>> -     * parameters in order to know how to convert core
>>>>>> ticks to seconds.
>>>>>> -     */
>>>>>> -    if (adap->flags & USING_SOFT_PARAMS) {
>>>>>> -        ret = t4_sge_init(adap);
>>>>>> -        if (ret < 0)
>>>>>> -            goto bye;
>>>>>> +        if (adap->flags & USING_SOFT_PARAMS) {
>>>>>> +            ret = t4_sge_init(adap);
>>>>>> +            if (ret < 0)
>>>>>> +                goto bye;
>>>>>> +        }
>>>>>>       }
>>>>>>
>>>>>>       if (is_bypass_device(adap->pdev->device))
>>>>>> -- 
>>>>>> 1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ