Message-ID: <7842CAE4393A4005827A88307EB4E83B@909927SOSLA>
Date: Tue, 23 Sep 2008 14:59:47 -0600
From: "Brian Rademacher" <rad@...files.net>
To: "Gwendal Grignou" <gwendal@...gle.com>,
"Justin Piszcz" <jpiszcz@...idpixels.com>
Cc: <linux-ide@...r.kernel.org>, <linux-raid@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen
I disabled NCQ and got the same thing. It just says DMA freeze instead of
NCQ freeze...
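
For reference, the queue_depth route suggested in the quoted message below
could be sketched roughly like this. It is only a sketch: the function name
set_queue_depth and its sysfs_root parameter are made up for illustration,
the disk names are whatever your Seagate drives show up as, and writing to
/sys/block/<disk>/device/queue_depth needs root.

```python
from pathlib import Path


def set_queue_depth(disk: str, depth: int = 1, sysfs_root: str = "/sys/block") -> int:
    """Write `depth` to <sysfs_root>/<disk>/device/queue_depth.

    Returns the value read back afterwards, so the caller can check
    that the kernel accepted it. A depth of 1 effectively turns NCQ off
    for that disk.
    """
    path = Path(sysfs_root) / disk / "device" / "queue_depth"
    path.write_text(f"{depth}\n")
    return int(path.read_text().strip())


if __name__ == "__main__":
    import sys

    # Usage (as root): python set_qd.py sda sdb sdc ...
    for disk in sys.argv[1:]:
        print(disk, set_queue_depth(disk))
```

The setting does not survive a reboot, so it would have to be reapplied at
boot (or replaced by the libata.force=noncq boot parameter).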
----- Original Message -----
From: "Gwendal Grignou" <gwendal@...gle.com>
To: "Justin Piszcz" <jpiszcz@...idpixels.com>
Cc: "Brian Rademacher" <rad@...files.net>; <linux-ide@...r.kernel.org>;
<linux-raid@...r.kernel.org>; <linux-kernel@...r.kernel.org>
Sent: Tuesday, September 23, 2008 12:14 PM
Subject: Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen
> About the ata1:0 problem, as reported in the bugzilla bug: I would try to
> disable NCQ to see if it helps. Your disks' firmware might not fully
> support it.
>
> You can either add the parameter "libata.force=noncq" when loading
> your kernel, or set queue_depth to 1 for all the Seagate drives behind
> the Marvell MV88SX6081 controller.
>
> About ata5:0, someone (in user space, probably) is trying to do a
> SMART ENABLE operation, but the device ignores it. I don't know which
> device you are using, but I assume it does not support the ATA SMART
> feature set. A timeout is an acceptable but not a nice way to answer; a
> cancel would have been better. Check if there is a firmware upgrade
> for your device.
>
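
The SMART ENABLE reading can be checked against the "cmd b0/d8:..." line in
the ata5.00 log further down: in that libata line the first byte is the ATA
command and the second is the feature register, and 0xB0 with feature 0xD8 is
SMART ENABLE OPERATIONS. The SAct value is a bitmask of outstanding NCQ tags.
A rough decoder, assuming the wrapped log line has been joined back onto one
line first (decode_cmd and active_tags are illustrative names, and the
feature table is only a small subset):

```python
import re

# Subset of SMART sub-commands, selected by the feature register
# when the ATA command byte is 0xB0 (SMART).
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}

# Matches the start of a libata taskfile dump: "cmd <command>/<feature>:..."
CMD_RE = re.compile(r"cmd\s+([0-9a-f]{2})/([0-9a-f]{2}):", re.IGNORECASE)


def decode_cmd(line):
    """Return a short description of the command in a libata 'cmd' line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command = int(m.group(1), 16)
    feature = int(m.group(2), 16)
    if command == 0xB0:
        return SMART_FEATURES.get(feature, f"SMART (feature 0x{feature:02x})")
    return f"command 0x{command:02x}"


def active_tags(sact):
    """List the NCQ tag numbers whose bits are set in an SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


if __name__ == "__main__":
    print(decode_cmd("ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0"))
    print(active_tags(0x1))
```

With SAct 0x1 only tag 0 is outstanding, and with SAct 0x0 (the ata5.00 case)
no NCQ command is in flight at all.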
> Gwendal.
>
> On Mon, Sep 22, 2008 at 6:26 AM, Justin Piszcz <jpiszcz@...idpixels.com>
> wrote:
>> From Brian's earlier e-mail:
>>
>>> > I filed this kernel bug:
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=462425
>>
>>
>> On Mon, 22 Sep 2008, Justin Piszcz wrote:
>>
>>> I could not agree more.
>>>
>>> CC'ing the relevant mailing lists to see if someone out there has any
>>> idea what more we could do as this has been affecting you (more so
>>> than myself, but I would still like to get some sort of resolution as
>>> well, as it still happens to me too):
>>>
>>> Similar, but not the same issue:
>>>
>>> Sep 17 20:20:05 p34 kernel: [1422169.440538] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>> Sep 17 20:20:05 p34 kernel: [1422169.440549] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
>>> Sep 17 20:20:05 p34 kernel: [1422169.440551] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>> Sep 17 20:20:05 p34 kernel: [1422169.440556] ata5.00: status: { DRDY }
>>> Sep 17 20:20:05 p34 kernel: [1422169.440561] ata5: hard resetting link
>>> Sep 17 20:20:06 p34 kernel: [1422169.744980] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> Sep 17 20:20:06 p34 kernel: [1422169.770448] ata5.00: configured for UDMA/133
>>> Sep 17 20:20:06 p34 kernel: [1422169.770461] ata5: EH complete
>>>
>>> (2.6.23.3) above
>>>
>>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>>
>>>> Works fine...Also works under heavy load with only 4 drives. I could
>>>> only get it to fail by doing a raid resync with 4 drives, except for
>>>> the newer kernel, which dies pretty easily..
>>>>
>>>> What is really frustrating about it is that short of the bugzilla bug I
>>>> submitted, I don't know who would be willing to listen...A lot of the
>>>> google hits when searching "action 0x2 frozen" are related to a
>>>> particular CDROM drive, or general hardware failure. I really don't
>>>> think that is the case here, but I bet most of the kernel people think
>>>> the same thing, so they have no reason to care...
>>>>
>>>>
>>>> Sent: Monday, September 22, 2008 7:04 AM
>>>> Subject: Re: Hardware RAID
>>>>
>>>>
>>>>> What about if you just 'stress' one drive?
>>>>>
>>>>> 1. dd if=/dev/sda of=/dev/null bs=1M &
>>>>> Does it do it?
>>>>> 2. Same thing for sdb?
>>>>>
>>>>> Justin.
>>>>>
>>>>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>>>>
>>>>>> I killed smartd for testing. Other than that, it seems entirely load
>>>>>> based. Anything disk intensive (backups, raid resync, a bunch of spam
>>>>>> comes in at once, etc.) makes it fail...
>>>>>>
>>>>>> Sent: Monday, September 22, 2008 6:29 AM
>>>>>> Subject: Re: Hardware RAID
>>>>>>
>>>>>>
>>>>>>> While the error happens for me as well, it does NOT happen with that
>>>>>>> much consistency. If I were you, I would start testing different
>>>>>>> kernels and run it in single user mode (or as close to it as you can)
>>>>>>> to see if you can narrow down what is causing it. Also boot knoppix
>>>>>>> and see if it occurs there?
>>>>>>>
>>>>>>> Justin.
>>>>>>>
>>>>>>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>>>>>>
>>>>>>>> Doesn't look like a very powerful RAID card, so I may pass on it. I
>>>>>>>> don't think it will have the bandwidth to run as fast as the software
>>>>>>>> RAID currently does since it's only a 64-bit/66 MHz PCI slot...
>>>>>>>>
>>>>>>>> I hate to do the hardware RAID thing, but this error is killing me:
>>>>>>>> Sep 21 12:05:19 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 12:32:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 12:41:34 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 12:58:22 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 13:11:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 13:23:55 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 13:54:23 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 15:15:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 15:44:06 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>> Sep 21 21:15:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>>
>>>>>>>> And at this point, I can either regress to a 4 drive RAID and not
>>>>>>>> update the kernel, or move forward with hardware...
>>>>>>>>
>>>>>>>> I don't see a fix coming any time soon, but maybe I'll try one of the
>>>>>>>> latest F10 kernels just to see if anything has changed...
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Justin Piszcz"
>>>>>>>> Sent: Monday, September 22, 2008 2:05 AM
>>>>>>>> Subject: Re: Hardware RAID
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, 21 Sep 2008, Brian Rademacher wrote:
>>>>>>>>>
>>>>>>>>>> The RAID gods must have been thinking about me. My MB has one of
>>>>>>>>>> these funny slots and supports ZCR, so for the price I'm going to
>>>>>>>>>> jump ship. I would guess (and hope) this solves the problem,
>>>>>>>>>> especially since I'll have to reconstruct the entire array...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://cgi.ebay.com/2113600-R-Adaptec-Serial-ATA-RAID-2025SA-Storage_W0QQitemZ250295938636QQihZ015QQcategoryZ167QQssPageNameZWDVWQQrdZ1QQcmdZViewItem
>>>>>>>>>
>>>>>>>>> Hm cool-- let me know how it goes.
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>