Message-ID: <1FC56210173BB445BD77F608D6FB8D03036AA92EEA@HQMAIL03.nvidia.com>
Date: Thu, 2 Jul 2009 18:11:01 -0700
From: Ayaz Abdulla <AAbdulla@...dia.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [GIT]: Networking
Our hardware only runs on architectures that provide coherency, i.e., x86.
I'm fine with keeping both napi_schedule_prep() and __napi_schedule() if you want it to be similar to other drivers.
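For concreteness, the shape we would end up with in nv_nic_irq() is essentially your hunk quoted further down (just a sketch, not necessarily the exact final patch):

        #ifdef CONFIG_FORCEDETH_NAPI
                if (napi_schedule_prep(&np->napi)) {
                        /* we now own NAPI_STATE_SCHED, so it is safe to
                         * disable further irq's (msix not enabled with napi) */
                        writel(0, base + NvRegIrqMask);
                        /* the poll will re-enable them once it is done */
                        __napi_schedule(&np->napi);
                }
        #endif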
Thanks!
________________________________________
From: Eric Dumazet [eric.dumazet@...il.com]
Sent: Thursday, July 02, 2009 5:07 PM
To: Ayaz Abdulla
Cc: David Miller; netdev@...r.kernel.org
Subject: Re: [GIT]: Networking
Ayaz Abdulla wrote:
>
>
> Eric Dumazet wrote:
>> Ayaz Abdulla wrote:
>>
>>>
>>> Eric Dumazet wrote:
>>>
>>>> Eric Dumazet wrote:
>>>>
>>>>
>>>>> Ingo Molnar wrote:
>>>>>
>>>>>
>>>>>>> The following changes since commit
>>>>>>> 52989765629e7d182b4f146050ebba0abf2cb0b7:
>>>>>>> Linus Torvalds (1):
>>>>>>> Merge git://git.kernel.org/.../davem/net-2.6
>>>>>>>
>>>>>>> are available in the git repository at:
>>>>>>>
>>>>>>> master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
>>>>>>
>>>>>> Hm, something in this lot quickly wrecked networking here - see the
>>>>>> tx timeout dump below. It starts with:
>>>>>>
>>>>>> [ 351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
>>>>>> [ 351.011815] Hardware name: System Product Name
>>>>>> [ 351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
>>>>>>
>>>>>> Config attached. Unfortunately I've got no time to do bisection
>>>>>> today.
>>>>>
>>>>>
>>>>>
>>>>> forcedeth might have a problem in its netif_wake_queue() logic, but
>>>>> I could not see why a recent patch could make this problem visible
>>>>> now.
>>>>>
>>>>> CPU0/1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
>>>>> is not a new cpu either :)
>>>>>
>>>>> forcedeth uses an internal tx_stop without an appropriate barrier.
>>>>>
>>>>> Could you try following patch ?
>>>>>
>>>>> (random guess, as I don't have much time right now)
>>>>
>>>>
>>>> Oh well this patch was soooo stupid, sorry Ingo.
>>>>
>>>>
>>>> We might have a race in napi_schedule(), leaving interrupts disabled
>>>> forever.
>>>> I cannot test this patch; I don't have the hardware...
>>>>
>>>> Thanks
>>>>
>>>> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
>>>> index 1094d29..3b4e076 100644
>>>> --- a/drivers/net/forcedeth.c
>>>> +++ b/drivers/net/forcedeth.c
>>>> @@ -3514,11 +3514,13 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
>>>> nv_msi_workaround(np);
>>>>
>>>> #ifdef CONFIG_FORCEDETH_NAPI
>>>> - napi_schedule(&np->napi);
>>>> -
>>>> - /* Disable furthur irq's
>>>> - (msix not enabled with napi) */
>>>> - writel(0, base + NvRegIrqMask);
>>>> + if (napi_schedule_prep(&np->napi)) {
>>>> + /*
>>>> + * Disable further irq's (msix not enabled with napi)
>>>> + */
>>>> + writel(0, base + NvRegIrqMask);
>>>> + __napi_schedule(&np->napi);
>>>> + }
>>>
>>> Yes, good catch. There is a race condition here with napi poll.
>>>
>>> I would prefer to do the following to keep the code simple and clean.
>>>
>>> writel(0, base + NvRegIrqMask);
>>> napi_schedule(&np->napi);
>>
>>
>>
>> CC trimmed down to network devs only :)
>>
>> It would be racy too ...
>>
>> check drivers/net/amd8111e.c, drivers/net/natsemi.c ...
>>
>> If this cpu unconditionally calls writel(0, base + NvRegIrqMask) while
>> another cpu has just called writel(np->irqmask, base + NvRegIrqMask),
>> don't we end up with interrupts disabled?
> <BCC linux-kernel>
>
> Yes, but the next instruction is the call to napi_schedule(), which will
> re-enable interrupts in the napi poll function.
Sorry, I don't get it. You are referring to softirqs, while I am talking
about hardware (NIC) interrupts, which can be masked forever.
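To make sure we are talking about the same thing: the only place the NIC irq mask gets written back is the napi poll handler. A rough sketch only; the function name and the my_hwbase()/my_rx_process() helpers are placeholders of mine, not the actual forcedeth code, and fe_priv is (if I recall the name correctly) the driver private struct holding ->napi and ->irqmask:

        static int nv_napi_poll_sketch(struct napi_struct *napi, int budget)
        {
                struct fe_priv *np = container_of(napi, struct fe_priv, napi);
                u8 __iomem *base = my_hwbase(np);       /* placeholder helper */
                int work = my_rx_process(np, budget);   /* placeholder helper */

                if (work < budget) {
                        /* rx work done: leave polling mode ... */
                        napi_complete(napi);
                        /* ... and unmask NIC interrupts; this is the
                         * writel(np->irqmask, ...) on CPU1 in the diagrams
                         * below */
                        writel(np->irqmask, base + NvRegIrqMask);
                }
                return work;
        }

If that poll is never scheduled, nothing ever writes np->irqmask back, which is exactly the failure mode in the first diagram.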
With your patch :

CPU0 (nv_nic_irq)                       CPU1 (napi poll)

                                        napi_complete();
                                        /* please note there is no smp_wmb()
                                           in __napi_complete() after
                                           clear_bit(NAPI_STATE_SCHED, &n->state);
                                           (It's implied on x86 because of the
                                           lock prefix) */
                                        writel(np->irqmask, base + NvRegIrqMask);
writel(0, base + NvRegIrqMask);
napi_schedule(&np->napi);   // might do nothing because this cpu sees
                            // NAPI_STATE_SCHED set to one

NIC cannot deliver further interrupts.
With my patch :

CPU0 (nv_nic_irq)                       CPU1 (napi poll)

                                        napi_complete();
                                        /* please note there is no smp_wmb()
                                           in __napi_complete() after
                                           clear_bit(NAPI_STATE_SCHED, &n->state); */
                                        writel(np->irqmask, base + NvRegIrqMask);
if (napi_schedule_prep(&np->napi))
        // if false, the NIC can deliver further interrupts.
        // if true, we mask interrupts but reschedule napi: packets will be
        // processed, and the interrupt will be re-enabled later.
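The distinction matters because napi_schedule() hides the "already scheduled" case, while napi_schedule_prep() reports it. From memory, the helpers in include/linux/netdevice.h are roughly (a paraphrase, not a verbatim copy):

        static inline int napi_schedule_prep(struct napi_struct *n)
        {
                /* true only if this cpu atomically set NAPI_STATE_SCHED,
                 * i.e. only if the poll really will be (re)scheduled */
                return !napi_disable_pending(n) &&
                        !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
        }

        static inline void napi_schedule(struct napi_struct *n)
        {
                /* silently does nothing when the bit is already seen set */
                if (napi_schedule_prep(n))
                        __napi_schedule(n);
        }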
My patch guarantees that we mask NIC interrupts *only* if napi is actually re-scheduled.
It probably makes no difference on x86, but it might be better to have the same logic in all drivers...
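(For completeness, __napi_complete() itself is roughly the following; paraphrased from memory, so please double-check net/core/dev.c:)

        void __napi_complete(struct napi_struct *n)
        {
                list_del(&n->poll_list);
                smp_mb__before_clear_bit();
                clear_bit(NAPI_STATE_SCHED, &n->state);
                /* no smp_wmb()/smp_mb() *after* the clear_bit(); on x86 the
                 * SMP clear_bit() is a lock-prefixed instruction, which is
                 * why this probably makes no difference there */
        }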