netdev - Re: [PATCH] netdev: enc28j60 kernel panic fix.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEGHc+ymc3uY0C0WhjfEo5RoS-bQ7h7f6tbBQpAGEJ=D9rr6FA@mail.gmail.com>
Date:	Fri, 6 May 2016 13:26:10 -0500
From:	David Russell <david@...sworld.com>
To:	Francois Romieu <romieu@...zoreil.com>
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH] netdev: enc28j60 kernel panic fix.

I kind of thought my patch was at best incomplete.  When you state
this change silences the bug but does not fix it, what are the
implications of systems running this patch?  We have some production
systems using this patch.  They reboot daily, but have been solid.

In addition, if we sent you a pi and the ethernet controller and a
small but reasonable sum of money for your labor, would you be able to
properly fix it?  Short of that, do you have any recommendations on
quick overviews of the networking stack in the kernel and then
documentation on the various flags and such?

Thanks.

-David Russell
APRS World, LLC
http://www.aprsworld.com/


On Thu, May 5, 2016 at 3:51 AM, Francois Romieu <romieu@...zoreil.com> wrote:
> David Russell <david@...sworld.com> :
>> When connected directly to another system (not via a switch)
>> eventually a condition where a NULL pointer dereference occurs in
>> enc28j60_hw_tx() and this patch simply checks for that condition and
>> returns gracefully without causing a kernel panic.  I believe, but
>> have not investigated this is caused by a packet collision and am not
>> sure if the kernel tracks collisions or counts them as errors, so that
>> should probably be added if this is what's happening.  I'm also not
>> familiar with the linux kernel, so may have fixed this in a less than
>> ideal way.
>
> Is it possible for EIR.EIR_TXERIF and EIR.EIR_TXIF to be set for the
> same packet ?
>
> If so the driver is intrinsically racy:
> - EIR.EIR_TXIF completes transmission, clears tx_skb and enables queueing
>   again (see netif_wake_queue in enc28j60_tx_clear)
>
> - insert start_xmit here: tx_skb is set and enc28j60_hw_tx is scheduled
>   for late execution (user context work)
>
> - EIR.EIR_EIR.EIR_TXERIF issues same enc28j60_tx_clear and clears tx_skb
>
> - enc28j60_hw_tx is run but tx_skb is NULL
>
>> diff --git a/drivers/net/ethernet/microchip/enc28j60.c
>> b/drivers/net/ethernet/microchip/enc28j60.c
>> index 86ea17e..36ac65f 100644
>> --- a/drivers/net/ethernet/microchip/enc28j60.c
>> +++ b/drivers/net/ethernet/microchip/enc28j60.c
>> @@ -1233,6 +1233,9 @@ static void enc28j60_irq_work_handler(struct
>> work_struct *work)
>>   */
>>  static void enc28j60_hw_tx(struct enc28j60_net *priv)
>>  {
>> +       if (!priv->tx_skb)
>> +               return;
>> +
>>         if (netif_msg_tx_queued(priv))
>>                 printk(KERN_DEBUG DRV_NAME
>>                         ": Tx Packet Len:%d\n", priv->tx_skb->len);
>
> enc28j60_hw_tx isn't the culprit. It's the victim.
>
> This change silences the bug but it does not fix it at all.
>
> --
> Ueimor