Message-ID: <4576b1fc-f56f-417b-b38a-c7cedc36cd15@rivosinc.com>
Date:   Mon, 25 Sep 2023 20:22:08 +0200
From:   Clément Léger <cleger@...osinc.com>
To:     Beau Belgrave <beaub@...ux.microsoft.com>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing/user_events: align uaddr on unsigned long
 alignment



On 25/09/2023 20:04, Clément Léger wrote:
> 
> 
> On 25/09/2023 18:04, Beau Belgrave wrote:
>> On Mon, Sep 25, 2023 at 09:53:16AM +0200, Clément Léger wrote:
>>>
>>>
>>> On 22/09/2023 21:22, Beau Belgrave wrote:
>>>> On Tue, Sep 19, 2023 at 02:59:12PM +0200, Clément Léger wrote:
>>>>>
>>>>>
>>>>> On 14/09/2023 19:29, Steven Rostedt wrote:
>>>>>> On Thu, 14 Sep 2023 13:17:00 -0400
>>>>>> Steven Rostedt <rostedt@...dmis.org> wrote:
>>>>>>
>>>>>>> Now lets look at big endian layout:
>>>>>>>
>>>>>>>  uaddr = 0xbeef0004
>>>>>>>  enabler = 1;
>>>>>>>
>>>>>>>  memory at 0xbeef0000:  00 00 00 00 00 00 00 02
>>>>>>>                                     ^
>>>>>>>                                     addr: 0xbeef0004
>>>>>>>
>>>>>>> 				(enabler is set)
>>>>>>>
>>>>>>> 	bit_offset = uaddr & (sizeof(unsigned long) - 1); bit_offset = 4
>>>>>>> 	bit_offset *= 8;				 bit_offset = 32
>>>>>>> 	uaddr &= ~(sizeof(unsigned long) - 1);		 uaddr = 0xbeef0000
>>>>>>>
>>>>>>> 	ptr = kaddr + (uaddr & ~PAGE_MASK);
>>>>>>>
>>>>>>> 	clear_bit(1 + 32, ptr);
>>>>>>>
>>>>>>>  memory at 0xbeef0000:  00 00 00 00 00 00 00 02
>>>>>>>                                   ^
>>>>>>> 				bit 33 of 0xbeef0000
>>>>>>>
>>>>>>> I don't think that's what you expected!
>>>>>>
>>>>>> I believe the above can be fixed with:
>>>>>>
>>>>>> 	bit_offset = uaddr & (sizeof(unsigned long) - 1);
>>>>>> 	if (bit_offset) {
>>>>>> #ifdef CONFIG_CPU_BIG_ENDIAN
>>>>>> 		bit_offset = 0;
>>>>>> #else
>>>>>> 		bit_offset *= BITS_PER_BYTE;
>>>>>> #endif
>>>>>> 		uaddr &= ~(sizeof(unsigned long) - 1);
>>>>>> 	}
>>>>>>
>>>>>> -- Steve
>>>>>
>>>>>
>>>>> Actually, after looking at this more in depth, it seems there are
>>>>> actually two problems that can happen.
>>>>>
>>>>> The first one is an atomic access misalignment due to enable_size == 4
>>>>> and uaddr not being aligned on a (long) boundary on 64-bit architectures.
>>>>> This can generate misaligned-access exceptions on various architectures.
>>>>> It can be fixed in a more general way along the lines of Masami's snippet.
>>>>>
>>>>> The second one that I can see is on 64-bit, big-endian architectures with
>>>>> enable_size == 4. In that case, the bit provided by userspace won't be
>>>>> set correctly, since this code kind of assumes the atomics are done on a
>>>>> 32-bit value. Because the kernel assumes long-sized atomic operations, on
>>>>> a big-endian 64-bit architecture the updated bit will actually land in
>>>>> the next 32-bit word.
>>>>>
>>>>> Can someone confirm my understanding?
>>>>>
>>>>
>>>> I have a ppc64 BE VM I've been validating this on. If we do the
>>>> shifting within user_events (vs a generic set_bit_aligned approach),
>>>> 64-bit BE does not need additional bit manipulation. However, if we were
>>>> to blindly pass the address and bit as-is to set_bit_aligned(), it
>>>> assumes the bit number is for a long, not a 32-bit word. So for that
>>>> approach we would need to offset the bit in the unaligned case.
>>>>
>>>> Here's a patch I have that I've validated on ppc64 BE, aarch64 LE, and
>>>> x86_64 LE. I personally feel more comfortable with this approach than
>>>> the generic set_bit_aligned() one.
>>>>
>>>> Thanks,
>>>> -Beau
>>>>
>>>> diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
>>>> index e3f2b8d72e01..ae854374d0b7 100644
>>>> --- a/kernel/trace/trace_events_user.c
>>>> +++ b/kernel/trace/trace_events_user.c
>>>> @@ -162,6 +162,23 @@ struct user_event_validator {
>>>>  	int			flags;
>>>>  };
>>>>  
>>>> +static inline void align_addr_bit(unsigned long *addr, int *bit)
>>>> +{
>>>> +	if (IS_ALIGNED(*addr, sizeof(long)))
>>>> +		return;
>>>> +
>>>> +	*addr = ALIGN_DOWN(*addr, sizeof(long));
>>>> +
>>>> +	/*
>>>> +	 * We only support 32 and 64 bit values. The only time we need
>>>> +	 * to align is a 32 bit value on a 64 bit kernel, which on LE
>>>> +	 * is always 32 bits, and on BE requires no change.
>>>> +	 */
>>>> +#ifdef __LITTLE_ENDIAN
>>>> +	*bit += 32;
>>>> +#endif
>>>
>>> Hi Beau, except for the specific alignment handling, that is basically
>>> what I ended up with for the BE 64-bit case (ie just bit += 32). Regarding
>>> the generic alignment, it depends on what the maintainers wish (generic or
>>> user_events specific). I also feel like this should be handled specifically
>>> for user_events, which uses set_bit in a somewhat non-standard way. Any
>>> suggestion?
>>>
>>
>> Looking at this deeper, user_events needs to track some of these
>> alignment requirements within each enabler. This is needed because when
>> enablers are disabled/removed they no longer have the actual size. The
>> size is needed to know whether we need to update the bits, etc. (i.e., if
>> a 32-bit value was originally used, it's aligned, and we're on BE, the
>> bits need shifting.)
>>
>> My final version that I played around with looked like this for just the
>> alignment requirements:
>> +static inline void align_addr_bit(unsigned long *addr, int *bit,
>> +                                 unsigned long *flags)
>> +{
>> +       if (IS_ALIGNED(*addr, sizeof(long))) {
>> +#ifdef __BIG_ENDIAN
>> +               if (test_bit(ENABLE_VAL_32_BIT, flags))
>> +                       *bit += 32;
>> +#endif
>> +               return;
>> +       }
>> +
>> +       *addr = ALIGN_DOWN(*addr, sizeof(long));
>> +
>> +       /*
>> +        * We only support 32 and 64 bit values. The only time we need
>> +        * to align is a 32 bit value on a 64 bit kernel, which on LE
>> +        * is always 32 bits, and on BE requires no change.
>> +        */
>> +#ifdef __LITTLE_ENDIAN
>> +       *bit += 32;
>> +#endif
>> +}
>>
>> Below is the full patch, which currently appears to pass everything
>> on BE. (Please note, the abi_test in selftests needs fixes that I plan to
>> push out. I'd like Steven to take those changes together with yours
>> through tracing, so we can properly test BE with these changes.)
>>
>> As you'll see, it requires a bit more work than a generic unaligned
>> solution, due to the bits having to move for 32-bit values on BE and the
>> need to track when to do so during delete/clear.
> 
> I actually had the same kind of handling (using a size field rather than
> a flag, though). However, the generic set_bit/clear_bit requires different
> handling of the 32-bit case on BE 64-bit, which does not sound quite
> right, ie unconditionally adding 32 to fix up the offset that stays
> implicit in your code when the address is not aligned (ie aligning the
> address down already gives the correct bit on BE 64-bit).
> 
> Since you already wrote the code, I think we can proceed with your
> version, as I actually think it is clearer to understand.

FWIW, here is my series. As I said, the generic set/clear bit makes it
less clear in the BE 64-bit case:


diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..b247c24695b9 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -290,6 +290,65 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
 		__clear_bit(nr, addr);
 }

+#define LONG_ALIGN_DIFF(p)	(((unsigned long)(p)) & (sizeof(long) - 1))
+#define LONG_ALIGNED_PTR(p) \
+		(unsigned long *)(((unsigned long)(p)) & ~(sizeof(long) - 1))
+
+static inline unsigned long *__fix_unaligned(int *bit, void *ptr)
+{
+	int offs = LONG_ALIGN_DIFF(ptr) * BITS_PER_BYTE;
+	unsigned long *uptr;
+
+	if (offs == 0)
+		return ptr;
+
+	uptr = LONG_ALIGNED_PTR(ptr);
+
+#ifdef __BIG_ENDIAN
+	if (*bit >= offs)
+		*bit -= offs;
+	else
+		*bit += BITS_PER_LONG + offs;
+#else
+	*bit += offs;
+#endif
+
+	return uptr;
+}
+
+/**
+ * set_bit_unaligned - Atomically set a bit in memory starting from an unaligned
+ *		       pointer
+ * @bit: the bit to set
+ * @ptr: the address to start counting from
+ *
+ * This is a relaxed atomic operation (no implied memory barriers).
+ *
+ * Note that @bit may be almost arbitrarily large; this function is not
+ * restricted to acting on a single-word quantity.
+ */
+static inline void set_bit_unaligned(int bit, void *ptr)
+{
+	unsigned long *uptr = __fix_unaligned(&bit, ptr);
+
+	set_bit(bit, uptr);
+}
+
+/**
+ * clear_bit_unaligned - Clears a bit in memory starting from an unaligned
+ *			 pointer
+ * @bit: Bit to clear
+ * @ptr: Address to start counting from
+ *
+ * This is a relaxed atomic operation (no implied memory barriers).
+ */
+static inline void clear_bit_unaligned(int bit, void *ptr)
+{
+	unsigned long *uptr = __fix_unaligned(&bit, ptr);
+
+	clear_bit(bit, uptr);
+}
+
 /**
  * __ptr_set_bit - Set bit in a pointer's value
  * @nr: the bit to set
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 6f046650e527..641ab6dedb30 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -113,6 +113,7 @@ struct user_event_enabler {
 	struct list_head	mm_enablers_link;
 	struct user_event	*event;
 	unsigned long		addr;
+	u8			size;

 	/* Track enable bit, flags, etc. Aligned for bitops. */
 	unsigned long		values;
@@ -481,9 +482,15 @@ static int user_event_enabler_write(struct user_event_mm *mm,
 	unsigned long uaddr = enabler->addr;
 	unsigned long *ptr;
 	struct page *page;
+	int bit = ENABLE_BIT(enabler);
 	void *kaddr;
 	int ret;

+#if defined(CONFIG_CPU_BIG_ENDIAN) && defined(CONFIG_64BIT)
+	if (enabler->size == 4)
+		bit += 32;
+#endif
+
 	lockdep_assert_held(&event_mutex);
 	mmap_assert_locked(mm->mm);

@@ -515,9 +522,9 @@ static int user_event_enabler_write(struct user_event_mm *mm,

 	/* Update bit atomically, user tracers must be atomic as well */
 	if (enabler->event && enabler->event->status)
-		set_bit(ENABLE_BIT(enabler), ptr);
+		set_bit_unaligned(bit, ptr);
 	else
-		clear_bit(ENABLE_BIT(enabler), ptr);
+		clear_bit_unaligned(bit, ptr);

 	kunmap_local(kaddr);
 	unpin_user_pages_dirty_lock(&page, 1, true);
@@ -849,6 +856,7 @@ static struct user_event_enabler
 	enabler->event = user;
 	enabler->addr = uaddr;
 	enabler->values = reg->enable_bit;
+	enabler->size = reg->enable_size;
 retry:
 	/* Prevents state changes from racing with new enablers */
 	mutex_lock(&event_mutex);
@@ -2377,7 +2385,8 @@ static long user_unreg_get(struct user_unreg __user *ureg,
 }

 static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
-				   unsigned long uaddr, unsigned char bit)
+				   unsigned long uaddr, unsigned char bit,
+				   u8 size)
 {
 	struct user_event_enabler enabler;
 	int result;
@@ -2386,6 +2395,7 @@ static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
 	memset(&enabler, 0, sizeof(enabler));
 	enabler.addr = uaddr;
 	enabler.values = bit;
+	enabler.size = size;
 retry:
 	/* Prevents state changes from racing with new enablers */
 	mutex_lock(&event_mutex);
@@ -2416,6 +2426,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
 	struct user_event_enabler *enabler, *next;
 	struct user_unreg reg;
 	long ret;
+	u8 size;

 	ret = user_unreg_get(ureg, &reg);

@@ -2441,6 +2452,8 @@ static long user_events_ioctl_unreg(unsigned long uarg)
 		    ENABLE_BIT(enabler) == reg.disable_bit) {
 			set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));

+			size = enabler->size;
+
 			if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
 				user_event_enabler_destroy(enabler, true);

@@ -2454,7 +2467,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
 	/* Ensure bit is now cleared for user, regardless of event status */
 	if (!ret)
 		ret = user_event_mm_clear_bit(mm, reg.disable_addr,
-					      reg.disable_bit);
+					      reg.disable_bit, size);

 	return ret;
 }
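
For what it's worth, here is a rough, self-contained userspace sketch of the
remapping the series is meant to perform (not part of the series; all names are
made up, it relies on the GCC/Clang __BYTE_ORDER__ macro, and it folds the
explicit "bit += 32" from user_event_enabler_write() together with
__fix_unaligned() into a single helper): given a naturally aligned 32-bit
enable word that may not be long-aligned, compute the containing aligned long
and the adjusted bit index, on both LE and BE:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BITS_PER_BYTE	8

/*
 * Hypothetical illustration only: map (ptr, bit), where ptr points to a
 * 4-byte-aligned 32-bit word that may not be long-aligned, to the containing
 * long-aligned word and the bit index a long-sized bitop should use.
 */
static unsigned long *remap_bit(void *ptr, unsigned int *bit)
{
	uintptr_t addr = (uintptr_t)ptr;
	unsigned int offs = (addr & (sizeof(long) - 1)) * BITS_PER_BYTE;

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	/* on BE 64-bit, an *aligned* 32-bit word sits in the high half of the long */
	if (sizeof(long) == 8 && offs == 0)
		*bit += 32;
#else
	/* on LE, the byte offset translates directly into a bit offset */
	*bit += offs;
#endif
	return (unsigned long *)(addr & ~((uintptr_t)sizeof(long) - 1));
}

int main(void)
{
	unsigned long mem[2] = { 0, 0 };
	/* place the 32-bit enable word at byte offset 4: not long-aligned on 64-bit */
	void *enable = (char *)mem + 4;
	unsigned int bit = 1;
	uint32_t val;

	unsigned long *word = remap_bit(enable, &bit);
	*word |= 1UL << bit;		/* stand-in for set_bit(bit, word) */

	memcpy(&val, enable, sizeof(val));
	printf("32-bit view: %#x (expect 0x2)\n", (unsigned int)val);
	return 0;
}

On both LE and BE 64-bit this prints 0x2, which is the property the kernel side
has to preserve while keeping the actual atomic access long-aligned.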

--

Clément

> 
> Thanks,
> 
> Clément
> 
>>
>> Thanks,
>> -Beau
>>
>> diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
>> index 6f046650e527..b05db15eaea6 100644
>> --- a/kernel/trace/trace_events_user.c
>> +++ b/kernel/trace/trace_events_user.c
>> @@ -127,8 +127,11 @@ struct user_event_enabler {
>>  /* Bit 7 is for freeing status of enablement */
>>  #define ENABLE_VAL_FREEING_BIT 7
>>  
>> +/* Bit 8 is for marking 32-bit on 64-bit */
>> +#define ENABLE_VAL_32_BIT 8
>> +
>>  /* Only duplicate the bit value */
>> -#define ENABLE_VAL_DUP_MASK ENABLE_VAL_BIT_MASK
>> +#define ENABLE_VAL_DUP_MASK (ENABLE_VAL_BIT_MASK | 1 << ENABLE_VAL_32_BIT)
>>  
>>  #define ENABLE_BITOPS(e) (&(e)->values)
>>  
>> @@ -174,6 +177,29 @@ struct user_event_validator {
>>  	int			flags;
>>  };
>>  
>> +static inline void align_addr_bit(unsigned long *addr, int *bit,
>> +				  unsigned long *flags)
>> +{
>> +	if (IS_ALIGNED(*addr, sizeof(long))) {
>> +#ifdef __BIG_ENDIAN
>> +		if (test_bit(ENABLE_VAL_32_BIT, flags))
>> +			*bit += 32;
>> +#endif
>> +		return;
>> +	}
>> +
>> +	*addr = ALIGN_DOWN(*addr, sizeof(long));
>> +
>> +	/*
>> +	 * We only support 32 and 64 bit values. The only time we need
>> +	 * to align is a 32 bit value on a 64 bit kernel, which on LE
>> +	 * is always 32 bits, and on BE requires no change.
>> +	 */
>> +#ifdef __LITTLE_ENDIAN
>> +	*bit += 32;
>> +#endif
>> +}
>> +
>>  typedef void (*user_event_func_t) (struct user_event *user, struct iov_iter *i,
>>  				   void *tpdata, bool *faulted);
>>  
>> @@ -482,6 +508,7 @@ static int user_event_enabler_write(struct user_event_mm *mm,
>>  	unsigned long *ptr;
>>  	struct page *page;
>>  	void *kaddr;
>> +	int bit = ENABLE_BIT(enabler);
>>  	int ret;
>>  
>>  	lockdep_assert_held(&event_mutex);
>> @@ -497,6 +524,8 @@ static int user_event_enabler_write(struct user_event_mm *mm,
>>  		     test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))))
>>  		return -EBUSY;
>>  
>> +	align_addr_bit(&uaddr, &bit, ENABLE_BITOPS(enabler));
>> +
>>  	ret = pin_user_pages_remote(mm->mm, uaddr, 1, FOLL_WRITE | FOLL_NOFAULT,
>>  				    &page, NULL);
>>  
>> @@ -515,9 +544,9 @@ static int user_event_enabler_write(struct user_event_mm *mm,
>>  
>>  	/* Update bit atomically, user tracers must be atomic as well */
>>  	if (enabler->event && enabler->event->status)
>> -		set_bit(ENABLE_BIT(enabler), ptr);
>> +		set_bit(bit, ptr);
>>  	else
>> -		clear_bit(ENABLE_BIT(enabler), ptr);
>> +		clear_bit(bit, ptr);
>>  
>>  	kunmap_local(kaddr);
>>  	unpin_user_pages_dirty_lock(&page, 1, true);
>> @@ -849,6 +878,12 @@ static struct user_event_enabler
>>  	enabler->event = user;
>>  	enabler->addr = uaddr;
>>  	enabler->values = reg->enable_bit;
>> +
>> +#if BITS_PER_LONG >= 64
>> +	if (reg->enable_size == 4)
>> +		set_bit(ENABLE_VAL_32_BIT, ENABLE_BITOPS(enabler));
>> +#endif
>> +
>>  retry:
>>  	/* Prevents state changes from racing with new enablers */
>>  	mutex_lock(&event_mutex);
>> @@ -2377,7 +2412,8 @@ static long user_unreg_get(struct user_unreg __user *ureg,
>>  }
>>  
>>  static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
>> -				   unsigned long uaddr, unsigned char bit)
>> +				   unsigned long uaddr, unsigned char bit,
>> +				   unsigned long flags)
>>  {
>>  	struct user_event_enabler enabler;
>>  	int result;
>> @@ -2385,7 +2421,7 @@ static int user_event_mm_clear_bit(struct user_event_mm *user_mm,
>>  
>>  	memset(&enabler, 0, sizeof(enabler));
>>  	enabler.addr = uaddr;
>> -	enabler.values = bit;
>> +	enabler.values = bit | flags;
>>  retry:
>>  	/* Prevents state changes from racing with new enablers */
>>  	mutex_lock(&event_mutex);
>> @@ -2415,6 +2451,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>  	struct user_event_mm *mm = current->user_event_mm;
>>  	struct user_event_enabler *enabler, *next;
>>  	struct user_unreg reg;
>> +	unsigned long flags;
>>  	long ret;
>>  
>>  	ret = user_unreg_get(ureg, &reg);
>> @@ -2425,6 +2462,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>  	if (!mm)
>>  		return -ENOENT;
>>  
>> +	flags = 0;
>>  	ret = -ENOENT;
>>  
>>  	/*
>> @@ -2441,6 +2479,10 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>  		    ENABLE_BIT(enabler) == reg.disable_bit) {
>>  			set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));
>>  
>> +			/* We must keep compat flags for the clear */
>> +			if (test_bit(ENABLE_VAL_32_BIT, ENABLE_BITOPS(enabler)))
>> +				flags |= 1 << ENABLE_VAL_32_BIT;
>> +
>>  			if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
>>  				user_event_enabler_destroy(enabler, true);
>>  
>> @@ -2454,7 +2496,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
>>  	/* Ensure bit is now cleared for user, regardless of event status */
>>  	if (!ret)
>>  		ret = user_event_mm_clear_bit(mm, reg.disable_addr,
>> -					      reg.disable_bit);
>> +					      reg.disable_bit, flags);
>>  
>>  	return ret;
>>  }
