linux-kernel - Re: [PATCH] x86: add phys addr validity check for /dev/mem mmap

Message-ID: <91ecff35-3b4d-4782-ab8e-b56488aac5b7@email.android.com>
Date:	Sat, 27 Apr 2013 21:00:48 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Will Huck <will.huckk@...il.com>,
	Frantisek Hrbata <fhrbata@...hat.com>
CC:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	tglx@...utronix.de, mingo@...hat.com, x86@...nel.org,
	oleg@...hat.com, kamaleshb@...ibm.com, hechjie@...ibm.com
Subject: Re: [PATCH] x86: add phys addr validity check for /dev/mem mmap

Not a reserved page: reserved bits in the page tables (which include all bits beyond the maximum physical address).

Will Huck <will.huckk@...il.com> wrote:

>On 04/28/2013 03:13 AM, Frantisek Hrbata wrote:
>> On Sat, Apr 27, 2013 at 03:00:11PM +0800, Will Huck wrote:
>>> On 04/26/2013 11:35 PM, Frantisek Hrbata wrote:
>>>> On Fri, Apr 26, 2013 at 01:21:28PM +0800, Will Huck wrote:
>>>>> Hi Peter,
>>>>> On 04/02/2013 08:28 PM, Frantisek Hrbata wrote:
>>>>>> When CR4.PAE is set, 64-bit PTEs are used (ARCH_PHYS_ADDR_T_64BIT is set for
>>>>>> X86_64 || X86_PAE). According to [1] Chapter 4 Paging, some higher bits in a
>>>>>> 64-bit PTE are reserved and must be zero. For example, for IA-32e paging with
>>>>>> 4KB pages ([1] 4.5 IA-32e Paging, Table 4-19), bits 51:M (M = MAXPHYADDR) are
>>>>>> reserved. So for a CPU with e.g. a 48-bit physical address width, bits 51-48
>>>>>> must be zero. If one of the reserved bits is set, a #PF is generated with the
>>>>>> RSVD error-code flag set ([1] 4.7 Page-Fault Exceptions).
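The following small C sketch (not code from the thread) makes the rule concrete. It builds the mask of reserved bits 51:M for a 4KB IA-32e PTE and tests an entry against it. The helper names are hypothetical. Only the 51:M range is modeled, not every reserved bit the SDM lists.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask of PTE bits 51:M, reserved on a CPU whose MAXPHYADDR is M.
 * Hypothetical helper; assumes M <= 52 (M == 52 leaves no reserved bits).
 */
static uint64_t pte_reserved_mask(unsigned int maxphyaddr)
{
	uint64_t below_52 = (1ULL << 52) - 1;          /* bits 51:0 */
	uint64_t below_m = (1ULL << maxphyaddr) - 1;   /* bits M-1:0 */

	return below_52 & ~below_m;
}

/* True if the entry sets any of the reserved bits 51:M. */
static bool pte_has_reserved_bits(uint64_t pte, unsigned int maxphyaddr)
{
	return (pte & pte_reserved_mask(maxphyaddr)) != 0;
}
```

For the 48-bit example above, `pte_reserved_mask(48)` evaluates to `0x000F000000000000`, which is bits 51-48.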
>>>>>>
>>>>>> <quote>
>>>>>> RSVD flag (bit 3).
>>>>>> This flag is 1 if there is no valid translation for the linear address because a
>>>>>> reserved bit was set in one of the paging-structure entries used to translate
>>>>>> that address. (Because reserved bits are not checked in a paging-structure entry
>>>>>> whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
>>>>>> set.)
>>>>>> </quote>
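Below is a hedged sketch that decodes the low #PF error-code bits described in [1] 4.7. It also checks the invariant stated in the quote: RSVD implies P. The macro and function names are illustrative, not the kernel's. The oops later in this thread reports `Bad pagetable: 000d`. Under this decoding, 0xd is P | U/S | RSVD: a user-mode read of a present page whose translation hit a reserved bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 #PF error code ([1] 4.7); names are hypothetical. */
#define PF_ERR_P     (1u << 0)  /* 0: non-present page; 1: page was present */
#define PF_ERR_WR    (1u << 1)  /* 1: the access was a write */
#define PF_ERR_US    (1u << 2)  /* 1: the access came from user mode */
#define PF_ERR_RSVD  (1u << 3)  /* 1: reserved bit set in a paging entry */
#define PF_ERR_ID    (1u << 4)  /* 1: the access was an instruction fetch */

struct pf_error {
	bool present, write, user, rsvd, fetch;
};

static struct pf_error decode_pf_error(uint32_t code)
{
	struct pf_error e = {
		.present = code & PF_ERR_P,
		.write   = code & PF_ERR_WR,
		.user    = code & PF_ERR_US,
		.rsvd    = code & PF_ERR_RSVD,
		.fetch   = code & PF_ERR_ID,
	};
	return e;
}

/* Per the quoted text, RSVD (bit 3) can be set only if P (bit 0) is set. */
static bool pf_error_consistent(uint32_t code)
{
	return !(code & PF_ERR_RSVD) || (code & PF_ERR_P);
}
```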
>>>>>>
>>>>>> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
>>>>>> returns 1 on x86. So it's possible to use any pgoff we want and to set the PTEs'
>>>>>> reserved bits in remap_pfn_range(). This means it is possible to use mmap
>>>>> In this case, remap_pfn_range() sets up the mapping (reserved bits included)
>>>>> for the MMIO memory, so the MMIO memory is already populated. Why would it
>>>>> trigger a #PF?
>>>> Hi,
>>>>
>>>> I think this is described in the quote above for the RSVD flag.
>>>>
>>>> remap_pfn_range() => page present => touch page => TLB miss =>
>>>> walk through paging structures => reserved bit set => #PF with RSVD flag
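The chain above can be modeled with a toy page walk in C. This is a simplification under stated assumptions:

- only bits 51:M are treated as reserved;
- the four entries are walked in PML4E/PDPTE/PDE/PTE order;
- the function returns the error code the model would raise, or -1 if the translation succeeds.

The oops later in this thread lists PGD/PUD/PMD/PTE values. Feeding those in and assuming a 36-bit MAXPHYADDR (the laptop's real width is not stated), the model yields 0xd, which matches the reported error code.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_P   (1ULL << 0)  /* present bit of a paging-structure entry */
#define ERR_P     (1 << 0)     /* #PF error code: page was present */
#define ERR_US    (1 << 2)     /* #PF error code: user-mode access */
#define ERR_RSVD  (1 << 3)     /* #PF error code: reserved bit set */

/*
 * Toy model, not kernel code: walk the four entries in order.
 * A non-present entry faults with P=0; a present entry with any of
 * bits 51:M set faults with P=1 and RSVD=1. Only bits 51:M are checked.
 * Returns -1 if the walk succeeds, otherwise the #PF error code.
 */
static int model_walk(const uint64_t entry[4], unsigned int maxphyaddr,
		      int user)
{
	uint64_t rsvd = ((1ULL << 52) - 1) & ~((1ULL << maxphyaddr) - 1);
	int base = user ? ERR_US : 0;

	for (size_t level = 0; level < 4; level++) {
		if (!(entry[level] & ENTRY_P))
			return base;                    /* not-present fault */
		if (entry[level] & rsvd)
			return base | ERR_P | ERR_RSVD; /* reserved-bit fault */
	}
	return -1;                                      /* translation found */
}
```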
>>> Can a present page also trigger a #PF? Why?
>> Yes, please see
>> Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A
>>
>> 4.7 PAGE-FAULT EXCEPTIONS
>> <quote>
>> · RSVD flag (bit 3).
>> This flag is 1 if there is no valid translation for the linear address because
>> a reserved bit was set in one of the paging-structure entries used to
>> translate that address. (Because reserved bits are not checked in a
>> paging-structure entry whose P flag is 0, bit 3 of the error code can be set
>> only if bit 0 is also set.) Bits reserved in the paging-structure entries are
>> reserved for future functionality. Software developers should be aware that
>> such bits may be used in the future and that a paging-structure entry that
>> causes a page-fault exception on one processor might not do so in the future.
>> </quote>
>>
>> I cannot tell you why; I guess that is more a question for the Intel folks.
>>
>> Anyway, this patch tries to fix the following problem and the resulting
>> "Bad pagetable" oops.
>>
>> ---------------------------------8<--------------------------------------
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <err.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>>
>> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)
>>
>> /*
>>     1) Find some non-System RAM range, in case CONFIG_STRICT_DEVMEM is defined
>>     $ cat /proc/iomem | grep -v "\(System RAM\|reserved\)"
>>
>>     2) Find the physical address width
>>     $ cat /proc/cpuinfo | grep "address sizes"
>>
>>     PTE bits 51 - M are reserved, where M is the physical address width found in 2)
>>     Note: step 2) is actually not needed; we can always set just bit 51
>>     (0x8000000000000)
>
>What does this mean? Do you trigger the oops because the address is beyond the
>maximum address the CPU supports, or because you access a reserved page? If it
>is the latter, I don't think that's right. For example, the kernel code/data
>section is reserved in memory; does kernel access to it trigger an oops? I
>don't think so.
>
>>
>>     Set OFFSET macro to
>>
>>     (start of iomem range found in 1)) | (1 << 51)
>>
>>     for example
>>     0x000a0000 | 0x8000000000000 = 0x80000000a0000
>>
>>     where 0x000a0000 is the start of the PCI bus range on my laptop
>>
>>   */
>>
>> #define OFFSET 0x80000000a0000LL
>>
>> int main(int argc, char *argv[])
>> {
>> 	int fd;
>> 	long ps;
>> 	long pgoff;
>> 	char *map;
>> 	volatile char c;	/* volatile: keep the read of map[0] below */
>>
>> 	ps = sysconf(_SC_PAGE_SIZE);
>> 	if (ps == -1)
>> 		die("cannot get page size");
>>
>> 	fd = open("/dev/mem", O_RDONLY);
>> 	if (fd == -1)
>> 		die("cannot open /dev/mem");
>>
>> 	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);	/* round up to a page boundary */
>> 	printf("%lx\n", pgoff);
>>
>> 	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
>> 	if (map == MAP_FAILED)
>> 		die("cannot mmap");
>>
>> 	c = map[0];
>>
>> 	if (munmap(map, ps) == -1)
>> 		die("cannot munmap");
>>
>> 	if (close(fd) == -1)
>> 		die("cannot close");
>>
>> 	return 0;
>> }
>>
>> ---------------------------------8<--------------------------------------
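Step 2) in the comment reads the physical address width by hand. A minimal sketch of parsing it in C follows. It assumes the usual `address sizes\t: N bits physical, N bits virtual` line format of /proc/cpuinfo on x86, and `parse_phys_bits` is a hypothetical helper. As the comment notes, the value is optional for the reproducer, because bit 51 is reserved whenever M < 52.

```c
#include <stdio.h>

/*
 * Parse the physical address width from a /proc/cpuinfo line such as
 * "address sizes\t: 36 bits physical, 48 bits virtual".
 * Whitespace in the format matches any run of spaces or tabs.
 * Returns the width, or -1 if the line does not match.
 */
static int parse_phys_bits(const char *line)
{
	unsigned int bits;

	if (sscanf(line, "address sizes : %u bits physical", &bits) != 1)
		return -1;
	return (int)bits;
}
```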
>>
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.814860] pfrsvd: Corrupted page table at address 7f34087c8000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.817356] PGD 12d0b3067 PUD 12d544067 PMD 12e29d067 PTE 80080000000a0225
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.820216] Bad pagetable: 000d [#1] SMP
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.822821] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfcomm bnep arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec uvcvideo snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt videobuf2_vmalloc videobuf2_memops videobuf2_core videodev btusb snd_page_alloc bluetooth snd_timer thinkpad_acpi iwlwifi media snd i2c_i801 cfg80211 iTCO_vendor_support intel_ips e1000e coretemp lpc_ich mfd_core soundcore rfkill mei microcode nfsd auth_rpcgss nfs_acl lockd sunrpc vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput dm_crypt crc32c_intel i915 ghash_clmulni_intel firewire_ohci i2c_algo_bit drm_kms_helper firewire_core sdhci_pci crc_itu_t drm sdhci mmc_core i2c_core mxm_wmi video wmi
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845686] CPU 3
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845709] Pid: 8751, comm: pfrsvd Not tainted 3.8.1-201.fc18.x86_64 #1 LENOVO 4384AV1/4384AV1
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.852876] RIP: 0033:[<00000000004007db>]  [<00000000004007db>] 0x4007da
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.856587] RSP: 002b:00007ffff5c12620  EFLAGS: 00010213
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.860296] RAX: 00007f34087c8000 RBX: 0000000000000000 RCX: 00000030fd4eed6a
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.864061] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.867878] RBP: 00007ffff5c12660 R08: 0000000000000003 R09: 00080000000a0000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.871706] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005f0
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.875566] R13: 00007ffff5c12740 R14: 0000000000000000 R15: 0000000000000000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.879490] FS:  00007f34087a0740(0000) GS:ffff880137d80000(0000) knlGS:0000000000000000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.883447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.887436] CR2: 00007f34087c8000 CR3: 0000000107509000 CR4: 00000000000007e0
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.891495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.895603] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.899739] Process pfrsvd (pid: 8751, threadinfo ffff880104ea8000, task ffff88012d9e1760)
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.903944]
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.908169] RIP  [<00000000004007db>] 0x4007da
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.912447]  RSP <00007ffff5c12620>
>> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.943802] ---[ end trace 1113d12a53145197 ]---
>>
>> Please note the PTE value 80080000000a0225
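A short decoding sketch shows what is wrong with that entry. The address field (bits 51:12) is 0x80000000a0000, which is the reproducer's 0xa0000 offset with bit 51 set. The same value appears in R09. On any CPU with MAXPHYADDR below 52, the entry therefore has a reserved bit set. The remaining flags are P, U/S and A, with R/W clear (a read-only mapping) and NX set in bit 63. The helper and field names are hypothetical. Bit 9 (0x200) is also set in this PTE; it is a software-available bit that the sketch does not interpret.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL  /* address field, bits 51:12 */

struct pte_info {
	uint64_t phys;  /* physical address taken from bits 51:12 */
	bool present, writable, user, accessed, nx;
};

/* Hypothetical decoder for a 64-bit 4KB PTE (only a few flags shown). */
static struct pte_info decode_pte(uint64_t pte)
{
	struct pte_info i = {
		.phys     = pte & PTE_ADDR_MASK,
		.present  = pte & (1ULL << 0),
		.writable = pte & (1ULL << 1),
		.user     = pte & (1ULL << 2),
		.accessed = pte & (1ULL << 5),
		.nx       = pte & (1ULL << 63),
	};
	return i;
}
```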
>>
>> HTH
>>
>> Thank you
>>>> I hope I didn't misunderstand your question.
>>>>
>>>> Thanks
>>>>
>>>>>> on /dev/mem and cause a system panic. It's probably not that serious, because
>>>>>> access to /dev/mem is limited and the system has to have panic_on_oops set, but
>>>>>> I still think we should check this and return an error.
>>>>>>
>>>>>> This patch adds a check for x86 when ARCH_PHYS_ADDR_T_64BIT is set, the same way
>>>>>> as is already done in e.g. ioremap. With this fix, mmap returns -EINVAL if the
>>>>>> requested physical address exceeds the supported physical address width.
>>>>>>
>>>>>> [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A
>>>>>>
>>>>>> Signed-off-by: Frantisek Hrbata <fhrbata@...hat.com>
>>>>>> ---
>>>>>>   arch/x86/include/asm/io.h |  4 ++++
>>>>>>   arch/x86/mm/mmap.c        | 13 +++++++++++++
>>>>>>   2 files changed, 17 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>>>>> index d8e8eef..39607c6 100644
>>>>>> --- a/arch/x86/include/asm/io.h
>>>>>> +++ b/arch/x86/include/asm/io.h
>>>>>> @@ -242,6 +242,10 @@ static inline void flush_write_buffers(void)
>>>>>>   #endif
>>>>>>   }
>>>>>> +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
>>>>>> +extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
>>>>>> +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
>>>>>> +
>>>>>>   #endif /* __KERNEL__ */
>>>>>>   extern void native_io_delay(void);
>>>>>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>>>>>> index 845df68..92ec31c 100644
>>>>>> --- a/arch/x86/mm/mmap.c
>>>>>> +++ b/arch/x86/mm/mmap.c
>>>>>> @@ -31,6 +31,8 @@
>>>>>>   #include <linux/sched.h>
>>>>>>   #include <asm/elf.h>
>>>>>> +#include "physaddr.h"
>>>>>> +
>>>>>>   struct __read_mostly va_alignment va_align = {
>>>>>>   	.flags = -1,
>>>>>>   };
>>>>>> @@ -122,3 +124,14 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
>>>>>>   		mm->unmap_area = arch_unmap_area_topdown;
>>>>>>   	}
>>>>>>   }
>>>>>> +
>>>>>> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
>>>>>> +{
>>>>>> +	return addr + count <= __pa(high_memory);
>>>>>> +}
>>>>>> +
>>>>>> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
>>>>>> +{
>>>>>> +	resource_size_t addr = (pfn << PAGE_SHIFT) + count;
>>>>>> +	return phys_addr_valid(addr);
>>>>>> +}
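For experimentation outside the kernel, the proposed check can be modeled in userspace C. This sketch rests on one assumption: that phys_addr_valid() from physaddr.h rejects any address with a bit set at or above the CPU's physical address width. The model does exactly that, and its names are hypothetical. It differs from the patch in one deliberate way. The model widens pfn to 64 bits before shifting, because with a 32-bit unsigned long (X86_PAE) the patch's `pfn << PAGE_SHIFT` is evaluated in 32 bits and could wrap. Like the patch, the model checks the end address `addr + count`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* 4KB pages, as on x86 */

/* Assumed behaviour: valid iff no bit at or above phys_bits is set. */
static bool model_phys_addr_valid(uint64_t addr, unsigned int phys_bits)
{
	return (addr >> phys_bits) == 0;
}

/* Userspace model of the proposed valid_mmap_phys_addr_range(). */
static bool model_valid_mmap_range(unsigned long pfn, size_t count,
				   unsigned int phys_bits)
{
	/* Widen before shifting so a 32-bit unsigned long cannot wrap. */
	uint64_t end = ((uint64_t)pfn << MODEL_PAGE_SHIFT) + count;

	return model_phys_addr_valid(end, phys_bits);
}
```

With a 36-bit width, a one-page mapping at pfn 0xa0 passes. A mapping at pfn 0x80000000 (physical address 2^43) is rejected.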
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>>> the body of a message to majordomo@...r.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.