Message-ID: <4DDA46FD.904@redhat.com>
Date:	Mon, 23 May 2011 13:37:33 +0200
From:	Tomas Henzl <thenzl@...hat.com>
To:	Mike Miller <mike.miller@...com>
CC:	Valdis.Kletnieks@...edu, scameron@...rdog.cce.hp.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	LKML-scsi <linux-scsi@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 01/16] hpsa: do readl after writel in main i/o path to
 ensure commands don't get lost.

On 05/05/2011 08:35 PM, Mike Miller wrote:
> On Wed, May 04, 2011 at 01:54:22PM -0400, Valdis.Kletnieks@...edu wrote:
>   
>> On Wed, 04 May 2011 11:37:35 MDT, Matthew Wilcox said:
>>     
>>>> This probably needs a comment like
>>>> 	/* don't care - dummy read just to force write posting to chipset */
>>>> or similar.  I'm assuming it's just functioning as a barrier-type flush of some sort?
>>>>         
>>> It's a PCI write flush.  It's not clear to me why it's needed here,
>>> though.  The write will eventually get to the device; why we need to
>>> make the CPU wait around for it to actually get there doesn't make sense.
>>>       
>> Exactly why I think it needs a one-liner comment. :)
>>
>>     
> So we're not exactly sure why it's needed either. We've had reports of
> commands getting "lost" or "stuck" under some workloads. The extra readl
> works around the issue but certainly may have negative side effects.
>
> I'm not sure I understand how writel works.
>
> From linux-2.6/arch/x86/include/asm/io.h:
>
> #define build_mmio_write(name, size, type, reg, barrier) \
> static inline void name(type val, volatile void __iomem *addr) \
> { asm volatile("mov" size " %0,%1": :reg (val), \
> "m" (*(volatile type __force *)addr) barrier); }
>
> This implies (at least to me) that a barrier is part of writel. I don't know
> why a write operation needs a barrier but that's essentially what we've done
> by adding the extra readl. Can someone confirm or deny that a barrier is
> actually built into writel? Or used by writel? If so, does this indicate
> that barrier is broken?
>
> At this point we (the software guys) are pretty much at a loss as to how to
> continue debugging. We don't know what to trigger on for the PCIe analyzer.
> If we track outstanding commands then trigger on one that doesn't complete in
> some amount of time the problem could conceivably be far in the past and
> difficult to correlate to the data in the trace.
>   
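For readers following the writel/readl question above, here is a toy user-space model of the "PCI write flush" idea, not kernel code. It assumes a bridge that posts (buffers) writes and only pushes them to the device when a read to the same device has to complete. The names toy_dev, toy_writel, toy_readl and POST_DEPTH are invented for this sketch. As far as I can tell, the "barrier" argument that the quoted macro receives for writel is a compiler-level "memory" clobber, which would not by itself flush a posted write; treat that as an assumption to verify against the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POST_DEPTH 8   /* hypothetical depth of the bridge's write buffer */

/* Toy model of a device sitting behind a bridge that posts writes. */
struct toy_dev {
    uint32_t reg;                 /* value the "device" has actually seen */
    uint32_t posted[POST_DEPTH];  /* writes still sitting in the bridge   */
    size_t   n_posted;            /* number of writes not yet delivered   */
};

/* Deliver every buffered write to the device, in order. */
static void toy_drain(struct toy_dev *d)
{
    for (size_t i = 0; i < d->n_posted; i++)
        d->reg = d->posted[i];
    d->n_posted = 0;
}

/* Posted write: returns at once; the device may not have seen it yet. */
static void toy_writel(struct toy_dev *d, uint32_t val)
{
    assert(d != NULL);
    if (d->n_posted == POST_DEPTH)
        toy_drain(d);             /* buffer full, so the bridge must drain */
    d->posted[d->n_posted++] = val;
}

/* Read: cannot complete until earlier posted writes have reached the device. */
static uint32_t toy_readl(struct toy_dev *d)
{
    assert(d != NULL);
    toy_drain(d);
    return d->reg;
}
```

In this model, toy_writel() alone leaves the value in the bridge, and a following toy_readl() is what guarantees the device has it. That is the effect the extra readl in the patch has, at the cost of making the CPU wait for the round trip.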
I'd look at the firmware side. You could check, for example, what happens when
the firmware is sent a command it doesn't understand.
You could also change the communication with the firmware by adding a count
field, which can then be checked for !(next_value == previous_value + 1),
raising an event whenever that check fires.
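
A minimal sketch of that count-field check, written as plain C so it can be tried outside the driver. Everything here is hypothetical: seq_tracker, seq_tracker_init and seq_tracker_check are illustration names, not part of hpsa or the controller firmware, and a 32-bit counter that wraps around is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker; none of these names exist in hpsa or its firmware. */
struct seq_tracker {
    uint32_t last;    /* last count value seen                  */
    bool     primed;  /* false until the first command arrives  */
    uint32_t gaps;    /* number of discontinuities seen so far  */
};

static void seq_tracker_init(struct seq_tracker *t)
{
    assert(t != NULL);
    t->last = 0;
    t->primed = false;
    t->gaps = 0;
}

/*
 * Returns true when seq directly follows the previous value (with 32-bit
 * wraparound), or when it is the first value seen. Otherwise counts a gap
 * and returns false; that is the point where an event would be raised.
 */
static bool seq_tracker_check(struct seq_tracker *t, uint32_t seq)
{
    assert(t != NULL);
    bool ok = !t->primed || seq == (uint32_t)(t->last + 1u);

    if (!ok)
        t->gaps++;
    t->last = seq;
    t->primed = true;
    return ok;
}
```

Feeding it 7, 8, 10 would accept the first two values and flag the third. A false return is also something a PCIe analyzer could trigger on, since it marks the first command after the one that went missing rather than a timeout long after the fact.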
tomas


> If anyone has any thoughts, suggestions, or flames they would be greatly
> appreciated.
>
> -- mikem
