lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0f66e57a-2fee-c9da-6412-c50bc90eb066@redhat.com>
Date:   Wed, 18 Oct 2017 11:42:13 +0200
From:   Peter Rajnoha <prajnoha@...hat.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents

On 10/18/2017 09:21 AM, Peter Rajnoha wrote:
> On 10/17/2017 03:26 PM, Greg KH wrote:
>> On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote:
>>> On 10/16/2017 04:49 PM, Greg KH wrote:
>>>> On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
>>>>> On 10/16/2017 03:55 PM, Greg KH wrote:
>>>>>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
>>>>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>>>>
>>>>>>> This enhances the way we are able to track and handle uevents in userspace
>>>>>>> if we are trying to catch up with all the uevents that were generated
>>>>>>> while we were not listening or the uevents which were not delivered.
>>>>>>> This may happen if the userspace listener is not running yet when the
>>>>>>> uevent is triggered or there's a period of time when it's not listening
>>>>>>> (e.g. because the userspace listener is being restarted while a new
>>>>>>> uevent fires).
>>>>>>>
>>>>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>>>>> userspace may execute appropriate actions based on that. We don't need
>>>>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>>>>> on items for which we really and actually need to reflect changed state
>>>>>>> in kernel while the userspace listener was not listening for uevents.
>>>>>>>
>>>>>>> Signed-off-by: Peter Rajnoha <prajnoha@...hat.com>
>>>>>>> ---
>>>>>>>  include/linux/kobject.h |  1 +
>>>>>>>  lib/kobject.c           |  1 +
>>>>>>>  lib/kobject_uevent.c    | 15 +++++++++++++--
>>>>>>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>>>>>>> index e0a6205caa71..ee6db28b6186 100644
>>>>>>> --- a/include/linux/kobject.h
>>>>>>> +++ b/include/linux/kobject.h
>>>>>>> @@ -73,6 +73,7 @@ struct kobject {
>>>>>>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>>>>>>>  	struct delayed_work	release;
>>>>>>>  #endif
>>>>>>> +	u64			last_uevent_seqnum;
>>>>>>>  	unsigned int state_initialized:1;
>>>>>>>  	unsigned int state_in_sysfs:1;
>>>>>>>  	unsigned int state_add_uevent_sent:1;
>>>>>>> diff --git a/lib/kobject.c b/lib/kobject.c
>>>>>>> index 763d70a18941..2cc809f10ae7 100644
>>>>>>> --- a/lib/kobject.c
>>>>>>> +++ b/lib/kobject.c
>>>>>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>>>>>>>  		return;
>>>>>>>  	kref_init(&kobj->kref);
>>>>>>>  	INIT_LIST_HEAD(&kobj->entry);
>>>>>>> +	kobj->last_uevent_seqnum = 0;
>>>>>>>  	kobj->state_in_sysfs = 0;
>>>>>>>  	kobj->state_add_uevent_sent = 0;
>>>>>>>  	kobj->state_remove_uevent_sent = 0;
>>>>>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>>>>>> index f237a09a5862..f1cbed06a395 100644
>>>>>>> --- a/lib/kobject_uevent.c
>>>>>>> +++ b/lib/kobject_uevent.c
>>>>>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	mutex_lock(&uevent_sock_mutex);
>>>>>>> -	/* we will send an event, so request a new sequence number */
>>>>>>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
>>>>>>> +	/*
>>>>>>> +	 * We will send an event, so request a new sequence number.
>>>>>>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
>>>>>>> +	 */
>>>>>>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
>>>>>>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
>>>>>>> +				(unsigned long long)uevent_seqnum);
>>>>>>> +	if (retval) {
>>>>>>> +		mutext_unlock(&uevent_sock_mutex);
>>>>>>> +		goto exit;
>>>>>>> +	}
>>>>>>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
>>>>>>> +				(unsigned long long)kobj->last_uevent_seqnum);
>>>>>>
>>>>>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
>>>>>> is only just the current one-1?  What else do you do with that value?
>>>>> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
>>>>
>>>> Ok, this is a bit better (but it's not in a format I could apply it in,
>>>> if I wanted to...)
>>>>
>>>>> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
>>>>> From: Peter Rajnoha <prajnoha@...hat.com>
>>>>> Date: Thu, 12 Oct 2017 11:33:48 +0200
>>>>> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
>>>>>
>>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>>
>>>>> This enhances the way we are able to track and handle uevents in userspace
>>>>> if we are trying to catch up with all the uevents that were generated
>>>>> while we were not listening or the uevents which were not delivered.
>>>>> This may happen if the userspace listener is not running yet when the
>>>>> uevent is triggered or there's a period of time when it's not listening
>>>>> (e.g. because the userspace listener is being restarted while a new
>>>>> uevent fires).
>>>>>
>>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>>> userspace may execute appropriate actions based on that. We don't need
>>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>>> on items for which we really and actually need to reflect changed state
>>>>> in kernel while the userspace listener was not listening for uevents.
>>>>>
>>>>> Signed-off-by: Peter Rajnoha <prajnoha@...hat.com>
>>>>
>>>> I don't really understand what userspace can do with this.  Why does it
>>>> matter what the sequence number was for a specific kobject?  What can
>>>> happen because of this?
>>>>
>>>> Do you have any example libudev code changes to show this in action so I
>>>> can maybe understand the need?
>>>>
>>>
>>> Not a libudev code change for this, at least I don't have this in plan
>>> at this moment.
>>>
>>> It's more related to the way the udev rules are processed and, in
>>> general, the way any other uevent listener can react to uevents. So the
>>> change is in how "end consumer" of the uevent does its processing.
>>>
>>> At the moment, the sequence number is used globally, we don't have
>>> per-kobject sequence numbers. So we can't use current SEQNUM that is
>>> attached to each uevent to tell whether we have missed previous
>>> uevent(s) for concrete device or not, we just know that some uevents for
>>> some devices were not processed. Therefore, when we're triggering
>>> uevents from userspace (udevadm trigger), we need to reevaluate all the
>>> rules for all devices, no matter if it's really needed or not.
>>>
>>> For example, when we're switching from initramfs to root fs, we stop the
>>> first udevd instance which is in initramfs, we keep the database (at the
>>> moment, only for selected devices where we know that it's safe to do so
>>> - same rules used in initramfs as in root fs). Then we switch over to
>>> root fs and start the udevd again and we trigger uevents to fully
>>> refresh the state, taking also the existing database into account.
>>>
>>> Now, it would help if we knew whether there were any uevents issued
>>> after we stopped udevd in initramfs and before we started a new one from
>>> root fs. If there are no uevents in between, we don't need to reevaluate
>>> all the rules, we can just keep the existing database content. This can
>>> become prominent if we have more devices in the system. It helps to
>>> minimize the number of execution during boot, the overall impact of
>>> triggering uevents and related processing, hence also making it more
>>> straightforward to watch the state of the device and possibly debug
>>> related problems.
>>>
>>> The same principle applies to any uevent listener in userspace, not just
>>> udevd:
>>>
>>>   - recording last seen SEQNUM
>>>
>>>   - when new instance of the listener is running
>>>
>>>     - trigger synthetic uevents to refresh state
>>>
>>>     - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:
>>
>> Why can't you just compare SEQNUM with the "last seen" SEQNUM, they
>> should just be 1 apart, right?  Isn't that what is done today?  How will
>> this differ from looking at PREV_SEQNUM?
> 
> The SEQNUM is global, not per device (per kobject). It's 1 apart if you
> have only 1 device, but that doesn't apply if you have more devices with
> possible uevents interleaved.
> 
> For example (just provoking uevents here for clear example):
> 
>   echo add > /sys/block/sda/uevent
>   echo add > /sys/block/sda/uevent
>   echo add > /sys/block/sdb/uevent
>   echo add > /sys/block/sda/uevent
> 
> Generates these uevents (I've removed uninteresting parts):
> 
>   KERNEL[45.739557] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2096
>   SEQNUM=2112
> 
>   KERNEL[55.131767] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2112
>   SEQNUM=2113
>   SUBSYSTEM=block
> 
>   KERNEL[59.236131] add
>   ACTION=add
>   DEVNAME=/dev/sdb
>   PREV_SEQNUM=2097
>   SEQNUM=2114
> 
>   KERNEL[61.708551] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2113
>   SEQNUM=2115
> 
> Here, the PREV_SEQNUM is 1 apart with SEQNUM for the 2nd uevent (sda),
> because there was no other uevent (for a different device). But if you
> check the 4th uevent (sda) where previous one was generated for another
> device (sdb), the PREV_SEQNUM is not 1 apart. So in general, it may be
> "n" apart from previous SEQNUM for a *particular device*. Globally, for
> all devices, it is 1 apart, yes, but we don't care about that as that is
> not quite useful.
> 
> So without PREV_SEQNUM, we can detect we have missed uevents, but we
> don't know for which devices exactly, so we need to refresh everything,
> not just the items we actually need. With PREV_SEQNUM we can identify
> exact devices.
Also, without PREV_SEQNUM, to tell at least whether I have missed some
uevents, I need to check /sys/kernel/uevent_seqnum and then compare with
what I have in userspace as "last seen seqnum". But that is a race -
after reading the uevent_seqnum file, more uevents may trigger...

I wouldn't need to read the uevent_seqnum file though if I just read the
SEQNUM directly from the uevent. But then, if I detected the SEQNUM is
not good (it's not the "last know seqnum" + 1), I'd need to trigger the
full refresh for all devices while processing that one particular
uevent, and that's becoming messy.

While with the PREV_SEQNUM, we don't need to care, we just trigger the
synthetic uevents whenever we need and we let the check to be done *per
each device* while processing those triggered uevents:

  - if PREV_SEQNUM matches with what we have in records as "last seen"
for that particular device, we skip the uevent (we keep the old records
as they don't need any refresh)

 - if PREV_SEQNUM doesn't match, we refresh ONLY that one device

So it's also cleaner way to deal with refreshes, besides the fact it's
done per-device, not the "all devices or nothing" like it is with having
only the global seqnum view.

-- 
Peter

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ