Message-ID: <4F96E1EB.1030407@redhat.com>
Date:	Tue, 24 Apr 2012 14:24:59 -0300
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Tony Luck <tony.luck@...el.com>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Doug Thompson <norsk5@...oo.com>
Subject: Re: [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers

On 24-04-2012 13:27, Borislav Petkov wrote:
> On Tue, Apr 24, 2012 at 11:24:03AM -0300, Mauro Carvalho Chehab wrote:
>> Yes (well, except that Nehalem has also a concept of "virtual channel", so
>> calling it "virtual" can mislead into a different view).
> 
> No, it cannot. It is a very simple question: Am I looking at virtual
> slots/channels or not, when I'm looking at edac-ctl output?

The edac-ctl output is showing physical slots/channels.

> [..]
> 
>>> I hope you can understand my confusion now:
>>>
>>> On the one hand, there are the physical slots where the DIMMs are
>>> sticked into.
>>>
>>> OTOH, there are the slots==ranks which the memory controllers use to
>>> talk to the DIMMs.
>>
>> This only applies to amd64 and other csrows-based memory controllers.
>>
>> A memory controller like the one at Nehalem abstracts csrows (I suspect
>> that they internally have something functionally similar to a FB-DIMM
>> AMB). They do memory interleaving between the memory channels
>> in order to produce a cachesize bigger than 64 bits, but they don't
> 
> You mean cacheline here.

Yes. Sorry for the typo.

>> actually care about how many ranks are there on each DIMM.
> 
> This cannot be right - you need the chip select to talk to a rank.
> This is basic DDR functionality.

Yes, but this seems to be hidden in some lower-level layer of their hardware.
The rank information only exists inside their per-DIMM registers.

> I can imagine that they're doing some tricks like channel/chip
> select/memory controller interleaving.

They can do several different types of interleaving, using from 1 channel
(i.e. no interleaving) up to 4 channels. The interleaving is done by address
range, not by csrow.

This is a dump of what sb_edac reads from Sandy Bridge EP registers:

[52803.640136] EDAC DEBUG: get_dimm_config: mc#1: Node ID: 1, source ID: 1
[52803.640141] EDAC DEBUG: get_dimm_config: Memory mirror is disabled
[52803.640154] EDAC DEBUG: get_dimm_config: Lockstep is disabled
[52803.640156] EDAC DEBUG: get_dimm_config: address map is on open page mode
[52803.640157] EDAC DEBUG: get_dimm_config: Memory is unregistered
[52803.640159] EDAC DEBUG: get_dimm_config: Channel #0  MTR0 = 500c
[52803.640162] EDAC DEBUG: get_dimm_config: mc#1: channel 0, dimm 0, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640165] EDAC DEBUG: get_dimm_config: Channel #0  MTR1 = 500c
[52803.640168] EDAC DEBUG: get_dimm_config: mc#1: channel 0, dimm 1, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640171] EDAC DEBUG: get_dimm_config: Channel #0  MTR2 = 0
[52803.640174] EDAC DEBUG: get_dimm_config: Channel #1  MTR0 = 500c
[52803.640176] EDAC DEBUG: get_dimm_config: mc#1: channel 1, dimm 0, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640180] EDAC DEBUG: get_dimm_config: Channel #1  MTR1 = 500c
[52803.640182] EDAC DEBUG: get_dimm_config: mc#1: channel 1, dimm 1, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640185] EDAC DEBUG: get_dimm_config: Channel #1  MTR2 = 0
[52803.640188] EDAC DEBUG: get_dimm_config: Channel #2  MTR0 = 500c
[52803.640190] EDAC DEBUG: get_dimm_config: mc#1: channel 2, dimm 0, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640193] EDAC DEBUG: get_dimm_config: Channel #2  MTR1 = 500c
[52803.640195] EDAC DEBUG: get_dimm_config: mc#1: channel 2, dimm 1, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640199] EDAC DEBUG: get_dimm_config: Channel #2  MTR2 = 0
[52803.640201] EDAC DEBUG: get_dimm_config: Channel #3  MTR0 = 500c
[52803.640203] EDAC DEBUG: get_dimm_config: mc#1: channel 3, dimm 0, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640218] EDAC DEBUG: get_dimm_config: Channel #3  MTR1 = 500c
[52803.640220] EDAC DEBUG: get_dimm_config: mc#1: channel 3, dimm 1, 4096 Mb (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[52803.640223] EDAC DEBUG: get_dimm_config: Channel #3  MTR2 = 0
[52803.640226] EDAC DEBUG: get_memory_layout: TOLM: 3.136 GB (0x00000000c3ffffff)
[52803.640228] EDAC DEBUG: get_memory_layout: TOHM: 66.624 GB (0x0000001043ffffff)
[52803.640231] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3
[52803.640234] EDAC DEBUG: get_memory_layout: SAD#0, interleave #0: 0
[52803.640237] EDAC DEBUG: get_memory_layout: SAD#1 DRAM up to 66.560 GB (0x0000001040000000) Interleave: 8:6 reg=0x000103c3
[52803.640239] EDAC DEBUG: get_memory_layout: SAD#1, interleave #0: 1
[52803.640245] EDAC DEBUG: get_memory_layout: TAD#0: up to 66.560 GB (0x0000001040000000), socket interleave 0, memory interleave 3, TGT: 0, 1, 2, 3, reg=0x0040f3e4
[52803.640249] EDAC DEBUG: get_memory_layout: TAD CH#0, offset #0: 33.792 GB (0x0000000840000000), reg=0x00008400
[52803.640252] EDAC DEBUG: get_memory_layout: TAD CH#1, offset #0: 33.792 GB (0x0000000840000000), reg=0x00008400
[52803.640255] EDAC DEBUG: get_memory_layout: TAD CH#2, offset #0: 33.792 GB (0x0000000840000000), reg=0x00008400
[52803.640258] EDAC DEBUG: get_memory_layout: TAD CH#3, offset #0: 33.792 GB (0x0000000840000000), reg=0x00008400
[52803.640261] EDAC DEBUG: get_memory_layout: CH#0 RIR#0, limit: 8.191 GB (0x00000001fff00000), way: 4, reg=0xa000001e
[52803.640264] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[52803.640278] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 4, reg=0x00040000
[52803.640281] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#2, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[52803.640283] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#3, offset 0.000 GB (0x0000000000000000), tgt: 5, reg=0x00050000
[52803.640287] EDAC DEBUG: get_memory_layout: CH#1 RIR#0, limit: 8.191 GB (0x00000001fff00000), way: 4, reg=0xa000001e
[52803.640290] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[52803.640293] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 4, reg=0x00040000
[52803.640296] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#2, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[52803.640299] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#3, offset 0.000 GB (0x0000000000000000), tgt: 5, reg=0x00050000
[52803.640303] EDAC DEBUG: get_memory_layout: CH#2 RIR#0, limit: 8.191 GB (0x00000001fff00000), way: 4, reg=0xa000001e
[52803.640306] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[52803.640309] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 4, reg=0x00040000
[52803.640312] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#2, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[52803.640315] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#3, offset 0.000 GB (0x0000000000000000), tgt: 5, reg=0x00050000
[52803.640319] EDAC DEBUG: get_memory_layout: CH#3 RIR#0, limit: 8.191 GB (0x00000001fff00000), way: 4, reg=0xa000001e
[52803.640322] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[52803.640324] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 4, reg=0x00040000
[52803.640327] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#2, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[52803.640330] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#3, offset 0.000 GB (0x0000000000000000), tgt: 5, reg=0x00050000
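The "bank: 8, rank: 2, row: 0x8000, col: 0x400" lines above are decoded from the MTR value 0x500c. The sketch below shows how such a decode works; the bit positions are my reading of the sb_edac driver's get_dimm_config(), so treat them as an assumption rather than a datasheet reference:

```python
def get_bitfield(v, lo, hi):
    """Extract bits lo..hi (inclusive) of v, like the kernel's GET_BITFIELD()."""
    return (v >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_mtr(mtr):
    """Decode rank count, row width and column width from an MTR value,
    following (my reading of) sb_edac's get_dimm_config()."""
    ranks = 1 << get_bitfield(mtr, 12, 13)       # rank count: 2^field
    rows  = 1 << (get_bitfield(mtr, 2, 4) + 12)  # number of row addresses
    cols  = 1 << (get_bitfield(mtr, 0, 1) + 10)  # number of column addresses
    return ranks, rows, cols

ranks, rows, cols = decode_mtr(0x500c)
print("rank: %d, row: %#x, col: %#x" % (ranks, rows, cols))
# -> rank: 2, row: 0x8000, col: 0x400  (matches the dump above)
```

Note that even though the rank count is visible here, it is a per-DIMM attribute read back from the register, not something the EDAC core addresses by.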

In this case, all 4 channels are used for interleave:

[52803.640245] EDAC DEBUG: get_memory_layout: TAD#0: up to 66.560 GB (0x0000001040000000), socket interleave 0, memory interleave 3, TGT: 0, 1, 2, 3, reg=0x0040f3e4

It doesn't do DIMM socket interleave (socket interleave 0). It does channel interleave
among channels 0 to 3 (TGT: 0, 1, 2, 3). 

It also interleaves on bits 6 to 8 of the physical memory address:

[52803.640231] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3

This memory controller has (literally) thousands of different BIOS setups
that change how the interleaves can happen. The above is the default
setup.

The interleaves are based on DIMM socket, MCU channel and physical address ranges.
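As a toy illustration of this kind of address-range interleave (not the actual Sandy Bridge decode, which goes through the SAD/TAD/RIR tables dumped above), a 4-way channel interleave on physical address bits 8:6 simply rotates consecutive cachelines across the channels:

```python
def channel_of(addr, ways=4, low_bit=6):
    """Toy 4-way channel interleave: consecutive 64-byte cachelines
    rotate across channels, selected by address bits starting at bit 6."""
    return (addr >> low_bit) % ways

# Four consecutive cachelines land on four different channels:
print([channel_of(a) for a in (0x000, 0x040, 0x080, 0x0c0)])
# -> [0, 1, 2, 3]
```

The point of contrast with csrow-based controllers like amd64 is that the channel here is a pure function of the address, with no chip-select geometry visible at this level.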

> In the end of the day, it is smallest row that gives you 64 bits of
> data.

Yes, but the memory controller views memory per DIMM socket.

> @Tony: hey Tony, can you point us to an Intel document explaining how
> Sandy Bridge or NH or one of the new ones does the memory addressing wrt
> ranks, channels etc? Thanks.

For Nehalem, see i7core_edac comments that I added at the beginning of the
driver:

 * Based on the following public Intel datasheets:
 * Intel Core i7 Processor Extreme Edition and Intel Core i7 Processor
 * Datasheet, Volume 2:
 *	http://download.intel.com/design/processor/datashts/320835.pdf
 * Intel Xeon Processor 5500 Series Datasheet Volume 2
 *	http://www.intel.com/Assets/PDF/datasheet/321322.pdf
 * also available at:
 * 	http://www.arrownac.com/manufacturers/intel/s/nehalem/5500-datasheet-v2.pdf

> 
> [..]
> 
>> No. As far as I can tell, they can have 9 quad-ranked DIMMs (the machines
>> I've looked at so far are all equipped with single-rank memories, so I don't 
>> have a real scenario with 2R or 4R for Nehalem yet).
>>
>> At Sandy Bridge-EP (e.g. Intel E5 CPUs), we have one machine fully equipped
>> with dual-rank memories. The number of ranks there is just a DIMM property.
>>
>> # ./edac-ctl --layout
>>        +-----------------------------------------------------------------------------------------------+
>>        |                      mc0                      |                      mc1                      |
>>        | channel0  | channel1  | channel2  | channel3  | channel0  | channel1  | channel2  | channel3  |
>> -------+-----------------------------------------------------------------------------------------------+
>> slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
>> slot1: |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |
>> slot0: |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |  4096 MB  |
>> -------+-----------------------------------------------------------------------------------------------+
>>
>> (this machine doesn't have physical DIMM sockets for slot#2)
> 
> Ok, I can count 8 2R DIMMs here and each rank or slot in your
> nomenclature is 4G. slot#2 has to be something virtual since each rank
> occupies one slot, i.e. slot0 and slot1 on a channel.

No. This machine has 64 GB of RAM, physically filled with 16 DIMMs of 4 GB
each. Each entry above represents one DIMM (and not a rank).

Btw, the above logs are for this machine.

# free
             total       used       free     shared    buffers     cached
Mem:      65933268    1166384   64766884          0      60572     363712
-/+ buffers/cache:     742100   65191168
Swap:     68157436      18680   68138756

The DMI decode info also clearly states that:

# dmidecode|grep -e "Memory Device$" -e Size -e "Bank Locat" -e "Serial Number" |grep -v Range
...
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 0 DIMM 0
	Serial Number: 82766209  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 0 DIMM 1
	Serial Number: 827661D3  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 1 DIMM 0
	Serial Number: 82766197  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 1 DIMM 1
	Serial Number: 82766204  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 2 DIMM 0
	Serial Number: 827661D7  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 2 DIMM 1
	Serial Number: 82766200  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 3 DIMM 0
	Serial Number: 827661F9  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 0 CHANNEL 3 DIMM 1
	Serial Number: 827661B3  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 0 DIMM 0
	Serial Number: 47473B79  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 0 DIMM 1
	Serial Number: 440FF77F  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 1 DIMM 0
	Serial Number: 47473B5A  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 1 DIMM 1
	Serial Number: 47473B71  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 2 DIMM 0
	Serial Number: 47473B62  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 2 DIMM 1
	Serial Number: 440FF7FC  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 3 DIMM 0
	Serial Number: 440FF7C1  
Memory Device
	Size: 4096 MB
	Bank Locator: NODE 1 CHANNEL 3 DIMM 1
	Serial Number: 440FF7F4  
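Cross-checking the listing above: 16 populated DIMM sockets at 4096 MB each add up to the machine's 64 GB. A quick sketch of that sanity check, parsing the "Size: <n> MB" lines that dmidecode prints (unpopulated sockets print "No Module Installed" instead and are skipped by the regex):

```python
import re

def total_dimm_mb(dmidecode_output):
    """Sum the 'Size: <n> MB' lines from dmidecode memory-device output."""
    return sum(int(m) for m in
               re.findall(r"Size:\s+(\d+)\s+MB", dmidecode_output))

# With the 16 x 4096 MB devices listed above:
sample = "Memory Device\n\tSize: 4096 MB\n" * 16
print(total_dimm_mb(sample), "MB")   # -> 65536 MB == 64 GB
```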

As I said, for this memory controller, and for Nehalem, memory is
mapped per DIMM socket (and not per rank).

Mauro.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
