Date: Wed, 1 Nov 2017 20:03:20 +0100
From: Marc Gonzalez <marc_gonzalez@...madesigns.com>
To: Alan Cox <gnomes@...rguk.ukuu.org.uk>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
    LKML <linux-kernel@...r.kernel.org>,
    Linux ARM <linux-arm-kernel@...ts.infradead.org>,
    Steven Rostedt <rostedt@...dmis.org>,
    Ingo Molnar <mingo@...nel.org>,
    Thomas Gleixner <tglx@...utronix.de>,
    Peter Zijlstra <peterz@...radead.org>,
    John Stultz <john.stultz@...aro.org>,
    Douglas Anderson <dianders@...omium.org>,
    Nicolas Pitre <nico@...aro.org>,
    Mark Rutland <mark.rutland@....com>,
    Will Deacon <will.deacon@....com>,
    Jonathan Austin <jonathan.austin@....com>,
    Arnd Bergmann <arnd@...db.de>,
    Kevin Hilman <khilman@...nel.org>,
    Russell King <linux@....linux.org.uk>,
    Michael Turquette <mturquette@...libre.com>,
    Stephen Boyd <sboyd@...eaurora.org>,
    Mason <slash.tmp@...e.fr>
Subject: Re: [RFC] Improving udelay/ndelay on platforms where that is possible

On 01/11/2017 18:53, Alan Cox wrote:

> On Tue, 31 Oct 2017 17:15:34 +0100
>
>> Therefore, users are accustomed to having delays be longer (within a reasonable margin).
>> However, very few users would expect delays to be *shorter* than requested.
>
> If your udelay can be under by 10% then just bump the number by 10%.

Except it's not *quite* that simple. Error has both an absolute
and a relative component. So the actual value matters, and it's
not always a constant.

For example:
http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/nand_base.c#L814

> However at that level most hardware isn't that predictable anyway because
> the fabric between the CPU core and the device isn't some clunky
> serialized link. Writes get delayed, they can bunch together, busses do
> posting and queueing.

Are you talking about the actual delay operation, or the pokes around it?

> Then there is virtualisation 8)
>
>> A typical driver writer has some HW spec in front of them, which e.g. states:
>>
>> * poke register A
>> * wait 1 microsecond for the dust to settle
>> * poke register B
>
> Rarely because of posting. It's usually
>
>	write
>	while(read() != READY);
>	write
>
> and even when you've got a legacy device with timeouts it's
>
>	write
>	read
>	delay
>	write
>
> and for sub 1ms delays I suspect the read and bus latency actually add a
> randomization sufficient that it's not much of an optimization to worry
> about an accurate ndelay().

I don't think "accurate" is the proper term.
Over-delays are fine, under-delays are problematic.

>> This "off-by-one" error is systematic over the entire range of allowed
>> delay_us input (1 to 2000), so it is easy to fix, by adding 1 to the result.
>
> And that + 1 might be worth adding but really there isn't a lot of
> modern hardware that has a bus that behaves like software folks imagine
> and everything has percentage errors factored into published numbers.

I guess I'm a software folk, but the designer of the system bus sits
across my desk, and we do talk often.

>> 3) Why does all this even matter?
>>
>> At boot, the NAND framework scans the NAND chips for bad blocks;
>> this operation generates approximately 10^5 calls to ndelay(100);
>> which cause a 100 ms delay, because ndelay is implemented as a
>> call to the nearest udelay (rounded up).
>
> So why aren't you doing that on both NANDs in parallel and asynchronous
> to other parts of boot ? If you start scanning at early boot time do you
> need the bad block list before mounting / - or are you stuck with a
> single threaded CPU and PIO ?

There might be some low(ish) hanging fruit to improve the performance
of the NAND framework, such as multi-page reads/writes. But the NAND
controller on my SoC muxes access to the two NAND chips, so no parallel
access, and this requires PIO.

> For that matter given the bad blocks don't randomly change why not cache
> them ?

That's a good question, I'll ask the NAND framework maintainer.
Store them where, by the way? On the NAND chip itself?

>> My current NAND chips are tiny (2 x 512 MB) but with larger chips,
>> the number of calls to ndelay would climb to 10^6 and the delay
>> increase to 1 second, which is starting to be a problem.
>>
>> One solution is to implement ndelay, but ndelay is more prone to
>> under-delays, and thus a prerequisite is fixing under-delays.
>
> For ndelay you probably have to make it platform specific or just use
> udelay if not. We do have a few cases we wanted 400ns delays in the PC
> world (ATA) but not many.

By default, ndelay is implemented in terms of udelay.

Regards.
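
For readers who want to check the boot-time arithmetic quoted above, here is a minimal user-space sketch. It assumes the fallback ndelay() rounds the request up to whole microseconds before calling udelay(), as described in the thread. The helper names (ns_to_us_round_up, total_fallback_us, total_exact_ns) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a nanosecond request up to whole
 * microseconds, the way a udelay()-based ndelay() fallback would. */
static inline uint64_t ns_to_us_round_up(uint64_t ns)
{
    return (ns + 999) / 1000;
}

/* Total busy-wait, in microseconds, for `calls` invocations of the
 * fallback ndelay(ns). Assumes each call costs exactly the rounded
 * delay and ignores call overhead. */
static inline uint64_t total_fallback_us(uint64_t calls, uint64_t ns)
{
    uint64_t per_call = ns_to_us_round_up(ns);

    /* Guard against overflow of the 64-bit product. */
    assert(calls == 0 || per_call <= UINT64_MAX / calls);
    return calls * per_call;
}

/* Total busy-wait, in nanoseconds, if ndelay(ns) waited exactly ns. */
static inline uint64_t total_exact_ns(uint64_t calls, uint64_t ns)
{
    assert(calls == 0 || ns <= UINT64_MAX / calls);
    return calls * ns;
}
```

With 10^5 calls to ndelay(100), total_fallback_us() gives 100000 us (100 ms), while total_exact_ns() gives 10^7 ns (10 ms). That factor of ten is the motivation for a real ndelay(), provided it never under-delays.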