Date:	Thu, 26 May 2011 14:13:40 +0200
From:	Tomas Henzl <thenzl@...hat.com>
To:	"Miller, Mike (OS Dev)" <Mike.Miller@...com>
CC:	"Valdis.Kletnieks@...edu" <Valdis.Kletnieks@...edu>,
	"scameron@...rdog.cce.hp.com" <scameron@...rdog.cce.hp.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	LKML-scsi <linux-scsi@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to
 ensure commands don't get lost.

On 05/25/2011 05:20 PM, Miller, Mike (OS Dev) wrote:
> Tomas wrote:
>
>   
>> -----Original Message-----
>> From: Tomas Henzl [mailto:thenzl@...hat.com]
>> Sent: Monday, May 23, 2011 6:38 AM
>> To: Miller, Mike (OS Dev)
>> Cc: Valdis.Kletnieks@...edu; scameron@...rdog.cce.hp.com; Andrew Morton;
>> LKML; LKML-scsi; Jens Axboe
>> Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path
>> to ensure commands don't get lost.
>>
>> On 05/05/2011 08:35 PM, Mike Miller wrote:
>>> On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@...edu wrote:
>>>> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
>>>>
>>>>>> This probably needs a comment like
>>>>>> 	/* don't care - dummy read just to force write posting to chipset */
>>>>>> or similar.  I'm assuming it's just functioning as a barrier-type
>>>>>> flush of some sort?
>>>>> It's a PCI write flush.  It's not clear to me why it's needed here,
>>>>> though.  The write will eventually get to the device; why we need to
>>>>> make the CPU wait around for it to actually get there doesn't make
>>>>> sense.
>>>> Exactly why I think it needs a one-liner comment. :)
>>>>
>>> So we're not exactly sure why it's needed either. We've had reports of
>>> commands getting "lost" or "stuck" under some workloads. The extra readl
>>> works around the issue but certainly may have negative side effects.
>>>
>>> I'm not sure I understand how writel works.
>>>
>>> From linux-2.6/arch/x86/include/asm/io.h:
>>>
>>> #define build_mmio_write(name, size, type, reg, barrier) \
>>> static inline void name(type val, volatile void __iomem *addr) \
>>> { asm volatile("mov" size " %0,%1": :reg (val), \
>>> "m" (*(volatile type __force *)addr) barrier); }
>>>
>>> This implies (at least to me) that a barrier is part of writel. I don't know
>>> why a write operation needs a barrier but thats essentially what we've done
>>> by adding the extra readl. Can someone confirm or deny that a barrier is
>>> actually built into writel? Or used by writel? If so, does this indicate
>>> that barrier is broken?
>>>
>>> At this point we (the software guys) are pretty much at a loss as to how to
>>> continue debugging. We don't know what to trigger on for the PCIe analyzer.
>>> If we track outstanding commands then trigger on one that doesn't complete in
>>> some amount of time the problem could conceivably be far in the past and
>>> difficult to correlate to the data in the trace.
>>>
>> I'd look at the firmware part, you could check what happens for example when
>> the firmware gets send a command it doesn't understand.
>> You could also change the communication with the fw by adding a count field,
>> which can be then checked for the !(next_value == previous_value + 1) and
>> raise an event.
>> tomas
>>
> Tomas,
> We've tried something very similar to the counter idea in fw. It doesn't help because the controller thinks he's done with the request. We have a (pretty crude) counter in the driver but no timing mechanism. We could add a timer. But what's a suitable timeout value? Is 2 seconds too short, too long? Suggestions, please.
>   
I know that a counter isn't a ground-breaking idea, I just wanted to show some interest :)
The command can be eaten either by the firmware or during the communication to or from the device.
I would start with the communication, by adding some fields to the command to detect whether a command in the
sequence is missing - I know even that isn't easy. The same could be done independently for the other direction.
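
To make the idea a bit more concrete, here is a rough userspace sketch of such a gap check. It assumes commands reach the receiver in the order they were submitted, so a forward jump in the sequence number means something was lost on the way. The names seq_tracker, seq_next and seq_check are made up for illustration; nothing like this exists in hpsa or in the firmware interface as far as I know.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-direction tracker; not part of hpsa or the firmware. */
struct seq_tracker {
	uint32_t expected;	/* sequence number the next command should carry */
	uint32_t gaps;		/* commands inferred to be missing so far */
};

/* Sender side: hand out the sequence number to stamp into the next command. */
static uint32_t seq_next(uint32_t *counter)
{
	return (*counter)++;
}

/*
 * Receiver side: compare the number found in a command with the expected
 * one.  Returns true if the command is in sequence.  On a mismatch the
 * distance is added to 'gaps' and the tracker resynchronizes, so a single
 * loss is reported once.  Assumes the sequence only moves forward.
 */
static bool seq_check(struct seq_tracker *t, uint32_t seen)
{
	bool in_order = (seen == t->expected);

	if (!in_order)
		t->gaps += seen - t->expected;	/* unsigned, so wrap-around is fine */
	t->expected = seen + 1;
	return in_order;
}
```

One tracker per direction (driver to controller, controller to driver) would keep the two checks independent, and the mismatch branch is where an event could be raised or the analyzer triggered.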

tomash

> -- mikem
>
>
>   
>>> If anyone has any thoughts, suggestions, or flames they would be greatly
>>> appreciated.
>>>
>>> -- mikem
