Date:	Mon, 18 May 2015 16:29:57 -0700
From:	Dave Hansen <dave@...1.net>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	linux-kernel@...r.kernel.org, x86@...nel.org,
	dave.hansen@...ux.intel.com
Subject: Re: [PATCH 16/19] x86, mpx: support 32-bit binaries on 64-bit kernel

On 05/18/2015 02:53 PM, Thomas Gleixner wrote:
> On Fri, 8 May 2015, Dave Hansen wrote:
> 
>>
>> From: Dave Hansen <dave.hansen@...ux.intel.com>
>>
>> Changes from v20:
>>
>>  * Fix macro confusion between BD and BT
>>  * Add accessor for bt_entry_size_bytes()
> 
> Forgot to say this earlier. Please put the changes after the changelog
> itself, i.e. after the '---'
> 
>> -#ifdef CONFIG_X86_64
>> -
>> -/* upper 28 bits [47:20] of the virtual address in 64-bit used to
>> - * index into bounds directory (BD).
>> +/*
>> + * The upper 28 bits [47:20] of the virtual address in 64-bit
>> + * are used to index into bounds directory (BD).
>> + *
>> + * The directory is 2G (2^31) in size, and with 8-byte entries
>> + * it has 2^28 entries.
>>   */
>> -#define MPX_BD_ENTRY_OFFSET	28
>> -#define MPX_BD_ENTRY_SHIFT	3
>> -/* bits [19:3] of the virtual address in 64-bit used to index into
>> - * bounds table (BT).
>> +#define MPX_BD_SIZE_BYTES_64	(1UL<<31)
>> +/* An entry is a long, so 8 bytes and a shift of 3 */
> 
> I can see the 8 bytes, but where is the shift constant?

The comment is old.  I'll zap it.

>> +static inline int bt_entry_size_bytes(struct mm_struct *mm)
>> +{
>> +	if (is_64bit_mm(mm))
>> +		return MPX_BT_ENTRY_BYTES_64;
>> +	else
>> +		return MPX_BT_ENTRY_BYTES_32;
>> +}
>> +
>> +/*
>> + * Takes a virtual address and turns it into the offset in bytes
>> + * inside of the bounds table where the bounds table entry
>> + * controlling 'addr' can be found.
>> + */
>> +static unsigned long mpx_get_bt_entry_offset_bytes(struct mm_struct *mm,
>> +		unsigned long addr)
>> +{
>> +	unsigned long bt_table_nr_entries;
>> +	unsigned long offset = addr;
>> +
>> +	if (is_64bit_mm(mm)) {
>> +		/* Bottom 3 bits are ignored on 64-bit */
>> +		offset >>= 3;
>> +		bt_table_nr_entries = MPX_BT_NR_ENTRIES_64;
>> +	} else {
>> +		/* Bottom 2 bits are ignored on 32-bit */
>> +		offset >>= 2;
>> +		bt_table_nr_entries = MPX_BT_NR_ENTRIES_32;
>> +	}
>> +	/*
>> +	 * We know the size of the table in to which we are
>> +	 * indexing, and we have eliminated all the low bits
>> +	 * which are ignored for indexing.
>> +	 *
>> +	 * Mask out all the high bits which we do not need
>> +	 * to index in to the table.
>> +	 */
>> +	offset &= (bt_table_nr_entries-1);
> 
>   	       ....  entries - 1); 
> 
> And you might explain why nr_entries - 1 is a proper mask,
> i.e. nr_entries is always a power of 2.
> 
>> +	/*
>> +	 * We now have an entry offset in terms of *entries* in
>> +	 * the table.  We need to scale it back up to bytes.
>> +	 */
>> +	offset *= bt_entry_size_bytes(mm);
> 
> You could store the scale value out in the if () construct above, but I
> guess the compiler can figure that out as well :)
> 
>> +	return offset;
>> +}
>> +
>> +/*
>> + * Total size of the process's virtual address space
>> + * Use a u64 because 4GB (for 32-bit) won't fit in a long.
>> + *
>> + * __VIRTUAL_MASK does not work here.  It only covers the
>> + * user address space and the tables cover the *entire*
>> + * virtual address space supported on the CPU.
>> + */
>> +static inline unsigned long long mm_virt_space(struct mm_struct *mm)
>> +{
>> +	if (is_64bit_mm(mm))
>> +		return 1ULL << 48;
> 
> cpu_info->x86_phys_bits will tell you the proper value
> 
>> +	else
>> +		return 1ULL << 32;
> 
> And for a 32bit kernel 32 might be wrong because with PAE you have 36
> bits.

That's physical space.  I really do need virtual space here.

But your comment stands for ->x86_virt_bits.  I'll fix.

>> +static unsigned long mpx_get_bd_entry_offset(struct mm_struct *mm,
>> +		unsigned long addr)
>> +{
>> +	/*
>> +	 * There are several ways to derive the bd offsets.  We
>> +	 * use the following approach here:
>> +	 * 1. We know the size of the virtual address space
>> +	 * 2. We know the number of entries in a bounds table
>> +	 * 3. We know that each entry covers a fixed amount of
>> +	 *    virtual address space.
>> +	 * So, we can just divide the virtual address by the
>> +	 * virtual space used by one entry to determine which
>> +	 * entry "controls" the given virtual address.
>> +	 */
>> +	if (is_64bit_mm(mm)) {
>> +		int bd_entry_size = 8; /* 64-bit pointer */
>> +		/*
>> +		 * Take the 64-bit addressing hole in to account.
>> +		 * This is a noop on 32-bit since it has no hole.
> 
> But a 32bit kernel will not take this code path because
> is_64bit_mm(mm) evaluates to false.

I meant that in case someone wondered why I didn't have that code in the
32-bit version.  I'll move the comment.

>> +		 */
>> +		addr &= (mm_virt_space(mm) - 1);
>> +		return (addr / bd_entry_virt_space(mm)) * bd_entry_size;
>> +	} else {
>> +		int bd_entry_size = 4; /* 32-bit pointer */
>> +		return (addr / bd_entry_virt_space(mm)) * bd_entry_size;
>> +	}
>> +	/*
>> +	 * The two return calls above are exact copies.  If we
>> +	 * pull out a single copy and put it in here, gcc won't
>> +	 * realize that we're doing a power-of-2 divide and use
>> +	 * shifts.  It uses a real divide.  If we put them up
>> +	 * there, it manages to figure it out (gcc 4.8.3).
> 
> Can't we provide the shift values from bd_entry_virt_space() so we
> don't have to worry about gcc versions being more or less clever?

Yes, I could go back and rework all the math to be done with shifts
instead of power-of-2 divides (which is what was done before).  But,
it's very clear the way that it stands, minus this wart.  The code looks
a *lot* better this way.

This isn't super performance-sensitive code.  It's basically in the
munmap() path.  I just really didn't like the idea of an actual integer
divide in there.

So, if GCC breaks this, so be it.  I don't think we'll ever notice.  The
optimization was just too obvious to completely ignore.