Message-ID: <8922EB9A36049040806AD0C06C26CAAF451100CC@fmsmsx117.amr.corp.intel.com>
Date: Wed, 25 Mar 2015 18:09:11 +0000
From: "Brooks, Adam J" <adam.j.brooks@...el.com>
To: Christoph Hellwig <hch@....de>,
"linux-nvdimm@...1.01.org" <linux-nvdimm@...1.01.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
CC: "axboe@...nel.dk" <axboe@...nel.dk>,
Boaz Harrosh <boaz@...xistor.com>
Subject: RE: [Linux-nvdimm] another pmem variant
>The other two patches are a heavily rewritten version of the code that
>Intel gave to various storage vendors to discover the type 12 (and earlier
>type 6) nvdimms, which I massaged into a form that is hopefully suitable
>for mainline.
The problem is that the e820 table and the UEFI Memory Map Table are, on their own, really bad ways to represent NVDIMMs. The memory table approach was developed six years ago, before NVDIMMs existed, to describe traditional battery-backed memory. With battery-backed memory, either the whole region was valid or the whole region was gone. There was also no concept of arming: you simply had x hours of data retention with your battery y% charged. A couple of years later, we kept using the memory table method for something called Copy To Flash, where the CPU copied memory from the DIMMs to an SSD of some sort. Again this was an all-or-nothing solution for the region, and because we were typically using SATA SSDs there was no need to "arm" anything. Additionally, the restore operation (and even the save operation, if you were brave enough) could be done from the OS, so there was no need for the BIOS to pass up any status on whether the recovery succeeded.
Fast forward again to the present day and NVDIMMs. We initially used the memory table model for NVDIMMs because 1) the BIOS code was already in place and 2) we had a non-upstreamed driver (something that predated pmem by several years, called ADRBD). In a perfect world with no hardware failures, e820+ADRBD works great for NVDIMMs. In the real world, where failures do happen, it has a number of shortcomings. The main issues are:
1) The region may now be composed of two or more NVDIMMs with different statuses. A subset of the NVDIMMs may have failed the restore, or an NVDIMM may have been added since the last save/restore of the existing NVDIMMs.
2) Based on the e820 table alone, the OS has no way of knowing where the boundaries of the NVDIMMs are. It has no way of knowing whether they are all interleaved together, so that the failure of a single NVDIMM means the loss of the whole region, or whether they are non-interleaved and can be treated as separate memory regions, so that the failure of one NVDIMM does not cause data to be lost from all of them.
3) Because the MRS/RC registers must be restored, the NVDIMM restore must be done from the BIOS. Depending on the platform's security settings, the OS may not be able to directly interrogate the individual NVDIMMs to find their status. Even if the OS can reach the NVDIMMs over SMBus, all information about the last restore attempt may have been wiped if the BIOS was also configured to do the erase/arm operation.
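To make point 2 concrete, here is a minimal Python sketch that decodes raw e820 entries and extracts the type 12 ranges. It assumes the usual x86 layout of a packed, little-endian 20-byte record (64-bit base address, 64-bit length, 32-bit type). The constant names and the example table are illustrative assumptions, not kernel identifiers. Whatever DIMMs sit behind a range, the OS ends up with only a start and an end.

```python
import struct
from typing import List, NamedTuple, Tuple

# One raw e820 entry: u64 base address, u64 length, u32 type,
# packed with no padding and little-endian, as on x86 (20 bytes).
E820_ENTRY = struct.Struct("<QQI")

# Illustrative names for the type values (assumed, not kernel identifiers).
E820_TYPE_RAM = 1       # ordinary usable memory
E820_TYPE_TYPE12 = 12   # the non-standard "type 12" persistent memory range


class E820Entry(NamedTuple):
    addr: int
    size: int
    type: int


def parse_e820(raw: bytes) -> List[E820Entry]:
    """Split a raw e820 table into entries."""
    if len(raw) % E820_ENTRY.size:
        raise ValueError("e820 table length is not a multiple of 20 bytes")
    return [E820Entry(*fields) for fields in E820_ENTRY.iter_unpack(raw)]


def type12_ranges(entries: List[E820Entry]) -> List[Tuple[int, int]]:
    """Return (start, end) for every type 12 entry.

    This is all the OS learns: no DIMM boundaries, no interleave layout,
    and no status from the last save/restore.
    """
    return [(e.addr, e.addr + e.size) for e in entries if e.type == E820_TYPE_TYPE12]


# Hypothetical platform: 2 GiB of RAM plus one 16 GiB type 12 region that
# might be backed by one, two, or more NVDIMMs; the table cannot say which.
table = (E820_ENTRY.pack(0x0, 0x8000_0000, E820_TYPE_RAM)
         + E820_ENTRY.pack(0x1_0000_0000, 0x4_0000_0000, E820_TYPE_TYPE12))
print(type12_ranges(parse_e820(table)))
```

The same 16 GiB range would be reported identically whether it is one DIMM, two interleaved DIMMs, or two independent DIMMs laid out back to back.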
For those reasons (and more), simply using the current memory tables is not a good solution. A more detailed, NVDIMM-specific table is required to surface the status and configuration of the NVDIMMs. Unfortunately that table has been perpetually delayed, and as a result people are trying to move forward with Type 12. I understand why this has been done, and for highly embedded storage appliances it is fine, because those users probably already know the configuration of their NVDIMMs. However, for general-purpose systems where the user has no way of knowing the exact configuration of the DIMMs, the e820 or UEFI Memory Map table alone is not sufficient.
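The reasoning behind points 1 and 2 can be sketched as the check an OS could run if per-NVDIMM status and interleave information were available. The `NvdimmInfo` record and its fields are hypothetical, chosen only to illustrate what the e820 table lacks; they do not describe any real firmware table. An interleave set is treated as trustworthy only when every member DIMM restored successfully, while non-interleaved DIMMs are judged individually.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NvdimmInfo:
    # Hypothetical per-DIMM record; these fields are illustrative only.
    dimm_id: int
    interleave_set: int  # DIMMs sharing a value are interleaved together
    restore_ok: bool     # whether the last restore reportedly succeeded


def usable_sets(dimms: List[NvdimmInfo]) -> Dict[int, bool]:
    """Map each interleave set to whether its contents can be trusted.

    Interleaving spreads every region across all member DIMMs, so one
    failed member invalidates the whole set.
    """
    status: Dict[int, bool] = defaultdict(lambda: True)
    for d in dimms:
        status[d.interleave_set] = status[d.interleave_set] and d.restore_ok
    return dict(status)


# Two DIMMs, one of which failed its restore.
interleaved = [NvdimmInfo(0, 0, True), NvdimmInfo(1, 0, False)]
separate = [NvdimmInfo(0, 0, True), NvdimmInfo(1, 1, False)]
print(usable_sets(interleaved))
print(usable_sets(separate))
```

With the two DIMMs interleaved, one failed restore loses the whole set; with each DIMM in its own set, only the failed DIMM's region is lost. A single e820 range cannot distinguish these two cases.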