Date:	Thu, 5 Mar 2009 18:01:36 +0100
From:	Andreas Herrmann <andreas.herrmann3@....com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Jaswinder Singh Rajput <jaswinder@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>, x86 maintainers <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [git-pull -tip] x86: msr architecture debug code

On Thu, Mar 05, 2009 at 03:08:09PM +0100, Ingo Molnar wrote:
> * Andreas Herrmann <andreas.herrmann3@....com> wrote:
> > Having this stuff in the kernel unnecessarily bloats up kernel code.
> 
> it should be a default-off Kconfig option and it is in debugfs 
> so there's no real bloat issue here.

I have attached parts of an autogenerated file that contains MSR
definitions for AMD family 10h in a condensed format. I stripped
some lines -- the full file has 487 lines and is about 30k.
Would you really like to have similar stuff for all x86 CPUs in-kernel?

> > What the kernel needs to provide is a reliable interface to 
> > access MSRs -- to pass the data to userspace. This interface 
> > is already there.
> > 
> > IMHO all kind of parsing and grouping of that data belongs in 
> > user space.
> > 
> > One exception are MSRs that need to be checked early during 
> > boot (e.g. MTRRs). For debugging purposes you might want to 
> > dump certain MSRs early. But then you will use printk and not 
> > debugfs.
> 
> Well it's really nice to know the _kernel's_ enumeration of MSRs 
> and its knowledge about the structure of those MSRs.
> 
> Sure, we can and do export the flat MSR space to user-space, but 
> the kernel also enumerates them internally, in various places. 
> The debugfs interface shows them in one way - and as such also 
> acts as a central force to keep these things tidy.
> 
> a VFS namespace is also pretty educative. You can see which MSRs 
> matter to the lapic for example, you can see their symbolic 
> names, their current state, etc. etc.

> > > Maybe a symlink pointing it back to the topic directory 
> > > would be useful as well. For example:
> > > 
> > >  /debug/x86/cpu/msr/raw/0x372/topic_dir -> /debug/x86/cpu/msr/pmu/pmc_0/
> > > 
> > > Other "topic directories" are possible too: a 
> > > /debug/x86/cpu/msr/apic/ layout would be very useful and 
> > > informative as well, and so are some of the other MSRs we 
> > > tweak during bootup.
> > 
> > All nice suggestions but why in-kernel?
> > 
> > Just hack some script to do this. This is much more 
> > maintainable. You don't need a kernel update to add support 
> > for new CPUs or to fix bugs in this code itself -- you just 
> > have to tweak your script.
> 
> the kernel tends to know a lot about these MSRs already so we 
> just provide that information in a more structured form as well.
>
> Such more structured form, beyond the debugging and 
> education/development advantages, also acts as a counter-force 
> back to the MSR enumeration code of the kernel and makes them 
> more structured. It will no doubt also extend the kernel's 
> knowledge of MSRs - read-only MSRs we dont normally read.

If we don't read them, we don't need them -- in kernel code.

Knowledge of MSRs is usually required by specific code, drivers or
subsystems. I think we should only add MSR information if it is
needed for real kernel functionality. Some examples are:

- MCA MSRs for mce
- Pstate and FIDVID MSRs for powernow-k8
- MTRRs for cpu/mtrr code

We don't have interfaces for PCI devices to show all their config
space register values in decoded form. The kernel provides the
interface to retrieve that information from userspace and usually you
call lspci to decode some standard information and to dump all the
rest.
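
To illustrate the PCI analogy: the kernel hands out raw config space
bytes (on Linux typically through sysfs, e.g. a per-device "config"
file) and the decoding happens in userspace. The following is only a
minimal sketch of that decoding step, not how lspci is implemented.
The names pci_ids and pci_decode_ids are made up for this example; the
only fact relied upon is that the vendor ID and device ID occupy the
first four config space bytes in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two IDs decoded below. */
struct pci_ids {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Decode vendor and device ID from a raw config space buffer.
 * Bytes 0-1 hold the vendor ID, bytes 2-3 the device ID, both
 * little-endian. Returns 0 on success, -1 if the buffer is too short.
 */
static int pci_decode_ids(const uint8_t *cfg, size_t len, struct pci_ids *out)
{
	assert(cfg && out);
	if (len < 4)
		return -1;
	out->vendor = (uint16_t)(cfg[0] | (cfg[1] << 8));
	out->device = (uint16_t)(cfg[2] | (cfg[3] << 8));
	return 0;
}
```

The buffer would be filled by reading the device's config file; nothing
here needs any decoding knowledge inside the kernel.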

For MSRs we have an interface, too. What is lacking is a standard
tool to do the decoding. (As a start you can use lsmsr.)
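
As a rough sketch of what "using the interface" looks like from
userspace, assuming the interface meant here is the msr driver's
/dev/cpu/N/msr character device: the file offset selects the MSR
number and each access transfers 8 bytes. The helper names read_msr_at
and read_cpu_msr are invented for this example. Reading the real device
normally requires root and a loaded msr driver.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one 64-bit register from an msr-style device node.
 * The file offset selects the register number.
 * Returns 0 on success, -1 on any error or short read.
 */
static int read_msr_at(const char *path, uint32_t reg, uint64_t *val)
{
	int fd;
	ssize_t n;

	assert(path && val);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), (off_t)reg);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Convenience wrapper: build the per-CPU path, e.g. /dev/cpu/0/msr. */
static int read_cpu_msr(unsigned cpu, uint32_t reg, uint64_t *val)
{
	char path[64];

	snprintf(path, sizeof(path), "/dev/cpu/%u/msr", cpu);
	return read_msr_at(path, reg, val);
}
```

A call such as read_cpu_msr(0, 0xc0000080, &val) would fetch the raw
value of the register at that address on CPU 0; everything beyond
fetching the raw value is decoding that a script or tool can do.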

> There's also a few other things like the IRR readout in the APIC 
> code or the perfcounters status dump can also be done cleanly 
> via /debug/x86/cpu/msr/.
> 
> Eventually i'd like /debug/x86/ to become a full CPU state dump: 
> the kernel pagetable dumping code could go there, we could show 
> control registers, we could show the GDT and IDT settings and 
> contents, etc. etc.

Yes, we could do a lot in the kernel. But should we?

I agree that dumping and decoding MSRs (and also CPU config space
registers for AMD CPUs) is sometimes needed for debugging. But doing
all of this in-kernel -- I think that's not cool.


Regards,
Andreas

-- 
/*
 * Licensed under the terms of the GNU GENERAL PUBLIC LICENSE version 2.
 * See file COPYING for details.
 */

#ifndef fam10h_h
#define fam10h_h

#include "../msr.h"

_RANGE(fam10h_LSMCAaddr,48,16,0);
_NAMES(fam10h_LSMCAaddr,"ADDR",0);
_RANGE(fam10h_LSMCAstatus,16,4,25,1,1,8,2,1,1,1,1,1,1,1,0);
_NAMES(fam10h_LSMCAstatus,"ErrorCode","ErrorCodeExt",0,"UECC","CECC","SYND",0,"PCC","ADDRV","MISCV","EN","UC","OVER","VAL");
_RANGE(fam10h_TSC,64,0);
_NAMES(fam10h_TSC,"TSC");
_RANGE(fam10h_APIC_BASE,8,1,2,1,36,16,0);
_NAMES(fam10h_APIC_BASE,0,"BSC",0,"ApicEn","ApicBar",0);
_RANGE(fam10h_EBL_CR_POWERON,16,2,46,0);
_NAMES(fam10h_EBL_CR_POWERON,0,"ClusterID",0);
_RANGE(fam10h_PATCH_LEVEL,32,32,0);
_NAMES(fam10h_PATCH_LEVEL,"PATCH_LEVEL",0);
_RANGE(fam10h_MTRRcap,8,1,1,1,53,0);
_NAMES(fam10h_MTRRcap,"MtrrCapVCnt","MtrrCapFix",0,"MtrrCapWc",0);
_RANGE(fam10h_SYSENTER_CS,16,48,0);
_NAMES(fam10h_SYSENTER_CS,"SYSENTER_CS",0);
_RANGE(fam10h_SYSENTER_ESP,32,32,0);
_NAMES(fam10h_SYSENTER_ESP,"SYSENTER_ESP",0);
_RANGE(fam10h_SYSENTER_EIP,32,32,0);
_NAMES(fam10h_SYSENTER_EIP,"SYSENTER_EIP",0);
_RANGE(fam10h_MCG_CAP,8,1,55,0);
_NAMES(fam10h_MCG_CAP,"Count","MCG_CTL_P",0);
_RANGE(fam10h_MCG_STAT,1,1,1,61,0);
_NAMES(fam10h_MCG_STAT,"RIPV","EIPV","MCIP",0);
_RANGE(fam10h_MCG_CTL,1,1,1,1,1,1,58,0);
_NAMES(fam10h_MCG_CTL,"DCE","ICE","BUE","LSE","NBE","FRE",0);
_RANGE(fam10h_DBG_CTL_MSR,1,1,1,1,1,1,58,0);
_NAMES(fam10h_DBG_CTL_MSR,"LBR","BTF","PB0","PB1","PB2","PB3",0);
_RANGE(fam10h_BR_FROM,64,0);
_NAMES(fam10h_BR_FROM,"LastBranchFromIP");

    ...

_RANGE(fam10h_MC5_CTL,1,63,0);
_NAMES(fam10h_MC5_CTL,"CPUWDT",0);
_RANGE(fam10h_MC5_STATUS,16,4,4,8,8,1,4,1,1,8,2,1,1,1,1,1,1,1,0);
_NAMES(fam10h_MC5_STATUS,"ErrorCode","ErrorCodeExt",0,"Syndrome",0,"Scrub",0,"UECC","CECC","Syndrome",0,"PCC","AddrV","MiscV","En","UC","OVER","VAL");
_RANGE(fam10h_MC5_ADDR,48,16,0);
_NAMES(fam10h_MC5_ADDR,"ADDR",0);
_RANGE(fam10h_MC5_MISC,12,52,0);
_NAMES(fam10h_MC5_MISC,"State",0);
_RANGE(fam10h_EFER,1,7,1,1,1,1,1,1,1,49,0);
_NAMES(fam10h_EFER,"SYSCALL",0,"LME",0,"LMA","NXE","SVME","LMSLE","FFXSE",0);
_RANGE(fam10h_STAR,32,16,16,0);
_NAMES(fam10h_STAR,"Target","SysCallSel","SysRetSel");
_RANGE(fam10h_STAR64,64,0);
_NAMES(fam10h_STAR64,"LSTAR");
_RANGE(fam10h_STARCOMPAT,64,0);
_NAMES(fam10h_STARCOMPAT,"CSTAR");
_RANGE(fam10h_SYSCALL_FLAG_MASK,32,32,0);
_NAMES(fam10h_SYSCALL_FLAG_MASK,"MASK",0);
_RANGE(fam10h_FS_BASE,64,0);
_NAMES(fam10h_FS_BASE,"FS_BASE");
_RANGE(fam10h_GS_BASE,64,0);
_NAMES(fam10h_GS_BASE,"GS_BASE");
_RANGE(fam10h_KernelGSbase,64,0);
_NAMES(fam10h_KernelGSbase,"KernelGSBase");
_RANGE(fam10h_TSC_AUX,32,32,0);
_NAMES(fam10h_TSC_AUX,"TscAux",0);
_RANGE(fam10h_MC4_MISC1,24,8,12,4,1,2,1,4,5,1,1,1,0);
_NAMES(fam10h_MC4_MISC1,0,"BlkPtr","ErrCnt",0,"Ovrflw","IntType","CntEn","LvtOffset",0,"Locked","CntP","Valid");
_RANGE(fam10h_MC4_MISC2,24,8,12,4,1,2,1,4,5,1,1,1,0);
_NAMES(fam10h_MC4_MISC2,0,"BlkPtr","ErrCnt",0,"Ovrflw","IntType","CntEn","LvtOffset",0,"Locked","CntP","Valid");
_RANGE(fam10h_MC4_MISC3,24,8,32,0);
_NAMES(fam10h_MC4_MISC3,0,"BlkPtr",0);
_RANGE(fam10h_PERF_CTL0,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL0,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL1,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL1,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL2,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL2,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTL3,8,8,1,1,1,1,1,1,1,1,8,4,4,1,1,22,0);
_NAMES(fam10h_PERF_CTL3,"EventSelect","UnitMask","User","OS","Edge",0,"Int",0,"En","Inv","CntMask","EventSelect",0,"GuestOnly","HostOnly",0);
_RANGE(fam10h_PERF_CTR0,48,16,0);
_NAMES(fam10h_PERF_CTR0,"CTR",0);

    ...

_RANGE(fam10h_IbsFetchCtl,16,16,16,1,1,1,1,1,2,1,1,1,6,0);
_NAMES(fam10h_IbsFetchCtl,"IbsFetchMaxCnt","IbsFetchCnt","IbsFetchLat","IbsFetchEn","IbsFetchVal","IbsFetchComp","IbsIcMiss","IbsPhyAddrValid","IbsL1TlbPgSz","IbsL1TlbMiss","IbsL2TlbMiss","IbsRandEn",0);
_RANGE(fam10h_IbsFetchLinAd,64,0);
_NAMES(fam10h_IbsFetchLinAd,"IbsFetchLinAd");
_RANGE(fam10h_IbsFetchPhysAd,64,0);
_NAMES(fam10h_IbsFetchPhysAd,"IbsFetchPhysAd");
_RANGE(fam10h_IbsOpCtl,16,1,1,1,45,0);
_NAMES(fam10h_IbsOpCtl,"IbsOpMaxCnt",0,"IbsOpEn","IbsOpVal",0);
_RANGE(fam10h_IbsOpRip,64,0);
_NAMES(fam10h_IbsOpRip,"IbsOpRip");
_RANGE(fam10h_IbsOpData,16,16,1,1,1,1,1,1,26,0);
_NAMES(fam10h_IbsOpData,"IbsCompToRetCtr","IbsTagToRetCtr","IbsOpBrnResync","IbsOpMispReturn","IbsOpReturn","IbsOpBrnTaken","IbsOpBrnMisp","IbsOpBrnRet",0);
_RANGE(fam10h_IbsOpData2,3,1,1,1,58,0);
_NAMES(fam10h_IbsOpData2,"NbIbsReqSrc",0,"NbIbsReqDstProc","NbIbsReqCacheHitSt",0);
_RANGE(fam10h_IbsOpData3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,13,16,16,0);
_NAMES(fam10h_IbsOpData3,"IbsLdOp","IbsStOp","IbsDcL1tlbMiss","IbsDcL2tlbMiss","IbsDcL1tlbHit2M","IbsDcL1tlbHit1G","IbsDcL2tlbHit2M","IbsDcMiss","IbsDcMisAcc","IbsDcLdBnkCon","IbsDcStBnkCon","IbsDcStToLdFwd","IbsDcStToLdCan","IbsDcUcMemAcc","IbsDcWcMemAcc","IbsDcLockedOp","IbsDcMabHit","IbsDcLinAddrValid","IbsDcPhyAddrValid",0,"IbsDcMissLat",0);
_RANGE(fam10h_IbsDcLinAd,64,0);
_NAMES(fam10h_IbsDcLinAd,"IbsDcLinAd");
_RANGE(fam10h_IbsDcPhysAd,64,0);
_NAMES(fam10h_IbsDcPhysAd,"IbsDcPhysAd");
_RANGE(fam10h_IbsControl,4,4,1,55,0);
_NAMES(fam10h_IbsControl,"LvtOffset",0,"LvtOffsetVal",0);

struct reg_spec fam10h_spec [] = {
	_SPEC(0x0000, LSMCAaddr, "load-store MCA address", fam10h_),
	_SPEC(0x0001, LSMCAstatus, "load-store MCE status", fam10h_),
	_SPEC(0x0010, TSC, "time-stamp counter", fam10h_),
	_SPEC(0x001b, APIC_BASE, "APIC base address", fam10h_),
	_SPEC(0x002a, EBL_CR_POWERON, "cluster ID", fam10h_),
	_SPEC(0x008b, PATCH_LEVEL, "microcode patch level", fam10h_),
	_SPEC(0x00fe, MTRRcap, "MTRR capabilities", fam10h_),
	_SPEC(0x0174, SYSENTER_CS, "SYSENTER/SYSEXIT code segment selector", fam10h_),
	_SPEC(0x0175, SYSENTER_ESP, "SYSENTER/SYSEXIT stack pointer", fam10h_),
	_SPEC(0x0176, SYSENTER_EIP, "SYSENTER/SYSEXIT instruction pointer", fam10h_),
	_SPEC(0x0179, MCG_CAP, "global MC capabilities", fam10h_),
	_SPEC(0x017a, MCG_STAT, "global MC status", fam10h_),
	_SPEC(0x017b, MCG_CTL, "global MC control", fam10h_),
	_SPEC(0x01d9, DBG_CTL_MSR, "debug control", fam10h_),
	_SPEC(0x01db, BR_FROM, "last branch from IP", fam10h_),
	_SPEC(0x01dc, BR_TO, "last branch to IP", fam10h_),
	_SPEC(0x01dd, LastExceptionFromIP, "last exception from IP", fam10h_),
	_SPEC(0x01de, LastExceptionToIP, "last exception to IP", fam10h_),
	_SPEC(0x0200, MTRRphysBase0, "base of variable-size MTRR (0)", fam10h_),
	_SPEC(0x0201, MTRRphysMask0, "mask of variable-size MTRR (0)", fam10h_),

    ...

	_SPEC(0xc0011023, BU_CFG, "bus unit configuration", fam10h_),
	_SPEC(0xc001102A, BU_CFG2, "bus unit configuration 2", fam10h_),
	_SPEC(0xc0011030, IbsFetchCtl, "IBS fetch control", fam10h_),
	_SPEC(0xc0011031, IbsFetchLinAd, "IBS fetch linear address", fam10h_),
	_SPEC(0xc0011032, IbsFetchPhysAd, "IBS fetch physical address", fam10h_),
	_SPEC(0xc0011033, IbsOpCtl, "IBS execution control", fam10h_),
	_SPEC(0xc0011034, IbsOpRip, "IBS Op logical address", fam10h_),
	_SPEC(0xc0011035, IbsOpData, "IBS Op data", fam10h_),
	_SPEC(0xc0011036, IbsOpData2, "IBS Op data 2", fam10h_),
	_SPEC(0xc0011037, IbsOpData3, "IBS Op data 3", fam10h_),
	_SPEC(0xc0011038, IbsDcLinAd, "IBS DC linear address", fam10h_),
	_SPEC(0xc0011039, IbsDcPhysAd, "IBS DC physical address", fam10h_),
	_SPEC(0xc001103a, IbsControl, "IBS control", fam10h_),
	{0, NULL, NULL, NULL, NULL},
};

#endif /* fam10h_h */
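
For readers unfamiliar with the condensed format, here is a small
sketch of how such tables could drive a userspace decoder. It rests on
an assumption that the excerpt does not spell out: each _RANGE list
gives field widths in bits starting at bit 0 and ends with a 0
sentinel, and each _NAMES entry names the corresponding field, with 0
marking a reserved one. (The EFER and APIC_BASE entries are consistent
with that reading.) The _RANGE/_NAMES macros themselves live in msr.h,
which is not shown, so struct field_desc, extract_bits and decode_msr
below are stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed layout: widths run from bit 0 upward and end with a 0
 * sentinel; a NULL name marks a reserved field.
 */
struct field_desc {
	const int *widths;         /* 0-terminated field widths in bits */
	const char *const *names;  /* one entry per width, NULL = reserved */
};

/* Extract `width` bits of `val`, starting at bit `shift`. */
static uint64_t extract_bits(uint64_t val, unsigned shift, unsigned width)
{
	uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);

	return (val >> shift) & mask;
}

/* Print every named field of `val`; return how many were printed. */
static int decode_msr(const struct field_desc *d, uint64_t val, FILE *out)
{
	unsigned shift = 0;
	int printed = 0;

	assert(d && d->widths && d->names);
	for (size_t i = 0; d->widths[i] != 0 && shift < 64; i++) {
		unsigned width = (unsigned)d->widths[i];

		if (d->names[i]) {
			fprintf(out, "%s=%#llx\n", d->names[i],
				(unsigned long long)extract_bits(val, shift, width));
			printed++;
		}
		shift += width;
	}
	return printed;
}

/* Widths and names copied from the fam10h_EFER lines above. */
static const int efer_widths[] = { 1, 7, 1, 1, 1, 1, 1, 1, 1, 49, 0 };
static const char *const efer_names[] = {
	"SYSCALL", NULL, "LME", NULL, "LMA", "NXE", "SVME", "LMSLE", "FFXSE", NULL
};
static const struct field_desc efer_desc = { efer_widths, efer_names };
```

Calling decode_msr(&efer_desc, value, stdout) prints one line per named
EFER field. Supporting a new CPU family would then mean adding tables,
not touching kernel code.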

