Message-id: <4B33EE40.1020402@majjas.com>
Date: Thu, 24 Dec 2009 17:42:08 -0500
From: Michael Breuer <mbreuer@...jas.com>
To: Stephen Hemminger <shemminger@...tta.com>
Cc: Daniel Hazelton <dhazelton@...er.net>,
Berck Nash <flyboy@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org
Subject: Re: sky2 panic in 2.6.32.1 under load
On 12/24/2009 5:21 PM, Stephen Hemminger wrote:
> On Thu, 24 Dec 2009 11:28:57 -0500
> Daniel Hazelton <dhazelton@...er.net> wrote:
>
>
>> On Thursday 24 December 2009 11:03:56 am Berck Nash wrote:
>>
>>> Andrew Morton wrote:
>>>
>>>> On Mon, 21 Dec 2009 16:52:10 -0700 "Berck E. Nash" <flyboy@...il.com> wrote:
>>>>
>>>>> Since 2.6.32, I've been getting kernel panics under heavy network load
>>>>> (bittorrent usage).
>>>>>
>>>> Let's cc the right list and developer.
>>>>
>>>> This is a 2.6.31->2.6.32 regression?
>>>>
>>> I believe so. Since it's intermittent and difficult to reproduce, it's
>>> possible (but unlikely) that I simply never triggered it under 2.6.31.
>>>
>> This is far from new. I have seen this under 2.6.27 when at least one botnet
>> has been pointed at a server of mine and told to gain access. It has happened
>> four times in the last six to eight months - and I have no easy way to capture
>> the logs. But the oops that was posted looks very, very similar to what I've
>> seen.
>>
>> It's always an allocation error in the transmit path that leads to the panic.
>> Because this is a production machine that I cannot take down for testing,
>> I have not reported the problem before.
>>
>>
> Even though I wrote and maintain the sky2 driver, I don't work for SysKonnect,
> and I only have access to a limited set of information: the technical manuals
> (under NDA) and the vendor sk98lin driver. The sky2 driver imitates the
> receiver timeout of the sk98lin driver; other people have told me that the
> FIFO hardware implementation is buggy and that when it gets full, it gets
> stuck. It is probably the equivalent of a software FIFO where the developer
> forgot to reserve a slot, so that head == tail can mean both empty and full!
>
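To illustrate the head == tail remark above, here is a minimal sketch of the
usual software fix: keep one slot permanently unused, so that "empty" and
"full" are distinguishable. This is only an analogy in plain C. It is not
taken from the sky2 or sk98lin code and says nothing about how the hardware
FIFO is actually built. The names (struct ring, RING_SIZE, ring_put, ring_get)
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical size; at most RING_SIZE - 1 items fit */

struct ring {
    int    slot[RING_SIZE];
    size_t head;      /* index of the next slot to write */
    size_t tail;      /* index of the next slot to read  */
};

/* Empty: nothing between tail and head. */
static bool ring_empty(const struct ring *r)
{
    return r->head == r->tail;
}

/*
 * Full: advancing head would land on tail. One slot is deliberately
 * left unused, so head == tail can only ever mean "empty".
 */
static bool ring_full(const struct ring *r)
{
    return (r->head + 1) % RING_SIZE == r->tail;
}

/* Returns false instead of overwriting when the ring is full. */
static bool ring_put(struct ring *r, int v)
{
    if (ring_full(r))
        return false;
    r->slot[r->head] = v;
    r->head = (r->head + 1) % RING_SIZE;
    return true;
}

/* Returns false when there is nothing to read. */
static bool ring_get(struct ring *r, int *out)
{
    if (ring_empty(r))
        return false;
    *out = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return true;
}
```

Without the reserved slot, a ring filled to capacity wraps head back onto
tail and becomes indistinguishable from an empty one, which is the kind of
wedge described above.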
> The workaround with a timer is prone to errors when traffic keeps going, and
> the vendor doesn't really provide clear instructions on how to unlock it. I do
> not have access to the hardware errata describing the problem. If I did, a more
> minimal solution would be possible.
>
> The easiest advice is to avoid sky2 chips with a FIFO for any heavy traffic; the
> next advice is to make sure receive flow control is enabled so that the receiver
> doesn't get overrun. If tx timeouts are an issue, use a rate limiter like TBF.
> Do not use the chip at 10 or 100 Mbit, since the transmitter is then more prone
> to getting overrun.
>
For this particular issue, I'm only seeing problems when running at 1000
Mbit. 100 Mbit appears stable.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>