Message-ID: <4FC4D6E2.9060501@redhat.com>
Date:	Tue, 29 May 2012 11:02:10 -0300
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Aristeu Rozanski <arozansk@...hat.com>,
	Doug Thompson <norsk5@...oo.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH] RAS: Add a tracepoint for reporting memory controller
 events

On 29-05-2012 08:58, Borislav Petkov wrote:
> On Thu, May 24, 2012 at 03:00:53PM -0300, Mauro Carvalho Chehab wrote:

<ironic comments skipped>
 
>> The address and the address mask are needed because most memory controllers can't point
>> to a single address, either because the register that stores the address doesn't have
>> enough bits to store the full content of the instruction pointer register, or because
>> of other internal device issues.
>>
>> So, two different "addresses" could actually point to the same group of transistors
>> inside a DIMM.
>>
>> Also, larger grain values may affect the error statistics. For example, the i3200_edac
>> driver has a grain that can be 64 MB, while other devices have a grain of 1.
> 
> I think you mean
> 
> #define I3200_TOM_SHIFT         26      /* 64MiB grain */

> 
> which is the Top-Of-Memory shift value. How that is a grain in the sense of error
> granularity, I can't fathom.
> 
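
A minimal sketch of the address/mask relationship described in the quoted text, assuming the mask is derived from a power-of-two grain expressed in bytes (the i3200 grain computed below is not guaranteed to be a power of two). The helper names region_start() and same_region() are hypothetical and are not part of the EDAC core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: given an error address and the grain (in bytes)
 * reported by a memory controller, return the start of the region the
 * error could actually have come from. Assumes grain is a non-zero
 * power of two, so every region is grain-aligned.
 */
static uint64_t region_start(uint64_t addr, uint64_t grain)
{
	assert(grain != 0 && (grain & (grain - 1)) == 0);
	/* Clear the low-order bits the controller cannot resolve. */
	return addr & ~(grain - 1);
}

/* Two reported addresses hit the same region when their masked values match. */
static bool same_region(uint64_t a, uint64_t b, uint64_t grain)
{
	return region_start(a, grain) == region_start(b, grain);
}
```

With a 64 MB grain, 0x12345678 is reported as the region starting at 0x10000000, so any other address in that 64 MB window is indistinguishable from it.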

It seems you were unable to read the comment in the function that fills dimm->grain:

	/*
	 * The dram rank boundary (DRB) reg values are boundary addresses
	 * for each DRAM rank with a granularity of 64MB.  DRB regs are
	 * cumulative; the last one will contain the total memory
	 * contained in all ranks.
	 */
	for (i = 0; i < mci->nr_csrows; i++) {
		unsigned long nr_pages;
		struct csrow_info *csrow = &mci->csrows[i];

		nr_pages = drb_to_nr_pages(drbs, stacked,
			i / I3200_RANKS_PER_CHANNEL,
			i % I3200_RANKS_PER_CHANNEL);

		if (nr_pages == 0)
			continue;

		for (j = 0; j < nr_channels; j++) {
			struct dimm_info *dimm = csrow->channels[j].dimm;

			dimm->nr_pages = nr_pages / nr_channels;
			dimm->grain = nr_pages << PAGE_SHIFT;
	...
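
Read literally, the excerpt stores the size of the whole rank, in bytes, as the grain (presumably a multiple of 64 MB, given the DRB granularity), and splits the rank's pages evenly across channels. The sketch below reproduces only that arithmetic; struct rank_sizes, rank_sizes() and SKETCH_PAGE_SHIFT are hypothetical, and 4 KiB pages (PAGE_SHIFT == 12) are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86; the kernel's PAGE_SHIFT is arch-specific. */
#define SKETCH_PAGE_SHIFT 12

/* Mirrors the two per-DIMM assignments in the excerpt for one rank (csrow). */
struct rank_sizes {
	unsigned long pages_per_dimm;	/* dimm->nr_pages */
	uint64_t grain_bytes;		/* dimm->grain */
};

static struct rank_sizes rank_sizes(unsigned long nr_pages, int nr_channels)
{
	struct rank_sizes r;

	assert(nr_channels > 0);
	/* The rank's pages are divided evenly among the channels. */
	r.pages_per_dimm = nr_pages / nr_channels;
	/* The grain covers the whole rank: its page count converted to bytes. */
	r.grain_bytes = (uint64_t)nr_pages << SKETCH_PAGE_SHIFT;
	return r;
}
```

For a 64 MB rank (16384 pages) on two channels, this gives 8192 pages per DIMM and a 64 MB grain.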


Assuming that errors follow a Gaussian distribution, the parameters of the probability density
function (PDF) (mean, standard deviation) when the grain is 1 are completely different from those
when the grain is 64 MB.
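
The effect of the grain on the statistics can be illustrated with a toy example, assuming errors are counted per grain-sized region. This does not fit a Gaussian; it only shows that the same error log yields a different mean and variance of errors per region depending on the grain. bucket_stats() and MAX_BUCKETS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKETS 64

/*
 * Toy illustration: count errors per grain-sized region over a memory of
 * mem_size bytes, then return the mean and (population) variance of those
 * per-region counts. Assumes grain divides mem_size and that
 * mem_size / grain <= MAX_BUCKETS.
 */
static void bucket_stats(const uint64_t *addrs, size_t n, uint64_t mem_size,
			 uint64_t grain, double *mean, double *var)
{
	unsigned counts[MAX_BUCKETS] = { 0 };
	size_t nbuckets, i;
	double sum = 0.0, sq = 0.0;

	assert(grain != 0 && mem_size % grain == 0);
	nbuckets = (size_t)(mem_size / grain);
	assert(nbuckets <= MAX_BUCKETS);

	/* Every address inside the same grain lands in the same bucket. */
	for (i = 0; i < n; i++) {
		assert(addrs[i] < mem_size);
		counts[addrs[i] / grain]++;
	}
	for (i = 0; i < nbuckets; i++) {
		sum += counts[i];
		sq += (double)counts[i] * counts[i];
	}
	*mean = sum / nbuckets;
	*var = sq / nbuckets - *mean * *mean;
}
```

With errors at addresses 0, 1, 2 and 3 in a 16-byte toy memory, a grain of 1 gives a mean of 0.25 errors per region and a variance of 0.1875, while a grain of 8 gives a mean of 2 and a variance of 4.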

That means that any correlation function used in a stochastic process analysis
needs to take the grain into account in order to detect whether a series of errors
is due to random noise or to a physical problem in the device.
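
One simple way a grain-aware analysis could separate noise from a failing part, assuming random errors are uniformly distributed over memory: compare the errors seen in one grain-sized region with the count expected by chance, which scales with the grain. region_looks_faulty() and its factor threshold are hypothetical, not an EDAC interface, and a real analysis would use a proper statistical test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical heuristic: under uniform random noise, each of total_errors
 * errors lands in a given grain-sized region with probability
 * grain / mem_size, so the expected count for that region is
 * total_errors * grain / mem_size. Flag the region when the observed
 * count exceeds that expectation by more than 'factor'.
 */
static bool region_looks_faulty(size_t region_errors, size_t total_errors,
				uint64_t grain, uint64_t mem_size, double factor)
{
	double expected;

	assert(grain != 0 && grain <= mem_size);
	expected = (double)total_errors * (double)grain / (double)mem_size;
	return (double)region_errors > factor * expected;
}
```

With factor 4.0, ten errors out of 100 in one 64 MB region of a 4 GB system are flagged (1.5625 expected), while the same ten errors in a 2 GB region are not (50 expected).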

> Oh, and by the way, this define is unused and can be removed.

Feel free to submit a patch for it.

Regards,
Mauro
