Message-id: <554B5329.8040907@samsung.com>
Date: Thu, 07 May 2015 13:57:29 +0200
From: Beata Michalska <b.michalska@...sung.com>
To: Greg KH <greg@...ah.com>
Cc: Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
tytso@....edu, adilger.kernel@...ger.ca, hughd@...gle.com,
lczerner@...hat.com, hch@...radead.org, linux-ext4@...r.kernel.org,
linux-mm@...ck.org, kyungmin.park@...sung.com, kmpark@...radead.org
Subject: Re: [RFC v2 1/4] fs: Add generic file system event notifications
Hi,
On 05/05/2015 02:16 PM, Beata Michalska wrote:
> Hi again,
>
> On 04/29/2015 11:13 AM, Greg KH wrote:
>> On Wed, Apr 29, 2015 at 09:42:59AM +0200, Jan Kara wrote:
>>> On Wed 29-04-15 09:03:08, Beata Michalska wrote:
>>>> On 04/28/2015 07:39 PM, Greg KH wrote:
>>>>> On Tue, Apr 28, 2015 at 04:46:46PM +0200, Beata Michalska wrote:
>>>>>> On 04/28/2015 04:09 PM, Greg KH wrote:
>>>>>>> On Tue, Apr 28, 2015 at 03:56:53PM +0200, Jan Kara wrote:
>>>>>>>> On Mon 27-04-15 17:37:11, Greg KH wrote:
>>>>>>>>> On Mon, Apr 27, 2015 at 05:08:27PM +0200, Beata Michalska wrote:
>>>>>>>>>> On 04/27/2015 04:24 PM, Greg KH wrote:
>>>>>>>>>>> On Mon, Apr 27, 2015 at 01:51:41PM +0200, Beata Michalska wrote:
>>>>>>>>>>>> Introduce configurable generic interface for file
>>>>>>>>>>>> system-wide event notifications, to provide file
>>>>>>>>>>>> systems with a common way of reporting any potential
>>>>>>>>>>>> issues as they emerge.
>>>>>>>>>>>>
>>>>>>>>>>>> The notifications are to be issued through generic
>>>>>>>>>>>> netlink interface by newly introduced multicast group.
>>>>>>>>>>>>
>>>>>>>>>>>> Threshold notifications have been included, allowing
>>>>>>>>>>>> triggering an event whenever the amount of free space drops
>>>>>>>>>>>> below a certain level - or levels to be more precise as two
>>>>>>>>>>>> of them are being supported: the lower and the upper range.
>>>>>>>>>>>> The notifications work both ways: once the threshold level
>>>>>>>>>>>> has been reached, an event shall be generated whenever
>>>>>>>>>>>> the number of available blocks goes up again re-activating
>>>>>>>>>>>> the threshold.
>>>>>>>>>>>>
>>>>>>>>>>>> The interface has been exposed through a vfs. Once mounted,
>>>>>>>>>>>> it serves as an entry point for the set-up where one can
>>>>>>>>>>>> register for particular file system events.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Beata Michalska <b.michalska@...sung.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> Documentation/filesystems/events.txt | 231 ++++++++++
>>>>>>>>>>>> fs/Makefile | 1 +
>>>>>>>>>>>> fs/events/Makefile | 6 +
>>>>>>>>>>>> fs/events/fs_event.c | 770 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>>> fs/events/fs_event.h | 25 ++
>>>>>>>>>>>> fs/events/fs_event_netlink.c | 99 +++++
>>>>>>>>>>>> fs/namespace.c | 1 +
>>>>>>>>>>>> include/linux/fs.h | 6 +-
>>>>>>>>>>>> include/linux/fs_event.h | 58 +++
>>>>>>>>>>>> include/uapi/linux/fs_event.h | 54 +++
>>>>>>>>>>>> include/uapi/linux/genetlink.h | 1 +
>>>>>>>>>>>> net/netlink/genetlink.c | 7 +-
>>>>>>>>>>>> 12 files changed, 1257 insertions(+), 2 deletions(-)
>>>>>>>>>>>> create mode 100644 Documentation/filesystems/events.txt
>>>>>>>>>>>> create mode 100644 fs/events/Makefile
>>>>>>>>>>>> create mode 100644 fs/events/fs_event.c
>>>>>>>>>>>> create mode 100644 fs/events/fs_event.h
>>>>>>>>>>>> create mode 100644 fs/events/fs_event_netlink.c
>>>>>>>>>>>> create mode 100644 include/linux/fs_event.h
>>>>>>>>>>>> create mode 100644 include/uapi/linux/fs_event.h
>>>>>>>>>>>
>>>>>>>>>>> Any reason why you just don't do uevents for the block devices today,
>>>>>>>>>>> and not create a new type of netlink message and userspace tool required
>>>>>>>>>>> to read these?
>>>>>>>>>>
>>>>>>>>>> The idea here is to have support for filesystems with no backing device as well.
>>>>>>>>>> Parsing the message with libnl is really simple and requires only a few lines
>>>>>>>>>> of code (a sample application was presented in the initial version of this RFC).
>>>>>>>>>
>>>>>>>>> I'm not saying it's not "simple" to parse, just that now you are doing
>>>>>>>>> something that requires a different tool. If you have a block device,
>>>>>>>>> you should be able to emit uevents for it, you don't need a backing
>>>>>>>>> device, we handle virtual filesystems in /sys/block/ just fine :)
>>>>>>>>>
>>>>>>>>> People already have tools that listen to libudev for system monitoring
>>>>>>>>> and management, why require them to hook up to yet-another-library? And
>>>>>>>>> what is going to provide the ability for multiple userspace tools to
>>>>>>>>> listen to these netlink messages in case you have more than one program
>>>>>>>>> that wants to watch for these things (i.e. multiple desktop filesystem
>>>>>>>>> monitoring tools, system-health checkers, etc.)?
>>>>>>>> As much as I understand your concerns, I'm not convinced the uevent interface
>>>>>>>> is a good fit. There are filesystems that don't have an underlying block
>>>>>>>> device - think of e.g. tmpfs or filesystems working directly on top of
>>>>>>>> flash devices. These still want to send notifications to userspace (one of
>>>>>>>> the primary motivations for this interface was so that tmpfs can notify about
>>>>>>>> something). And creating some fake nodes in /sys/block for tmpfs and
>>>>>>>> similar filesystems seems like doing more harm than good to me...
>>>>>>>
>>>>>>> If these are "fake" block devices, what's going to be present in the
>>>>>>> block major/minor fields of the netlink message? For some reason I
>>>>>>> thought it was a required field, and because of that, I thought we had a
>>>>>>> "real" filesystem somewhere to refer to, otherwise how would userspace
>>>>>>> know what filesystem was creating these events?
>>>>>>>
>>>>>>> What am I missing here?
>>>>>>>
>>>>>>> confused,
>>>>>>>
>>>>>>> greg k-h
>>>>>>>
>>>>>>
>>>>>> For those 'fake' block devs, upon mount, get_anon_bdev will assign
>>>>>> the major:minor numbers. Userspace might get those through stat.
>>>>>
>>>>> How can userspace do the mapping backwards from this "anonymous"
>>>>> major:minor number for these types of filesystems in such a way that
>>>>> they can "know" how to report the block device that is causing the
>>>>> event?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>>
>>>>
>>>> It needs to be done internally by the app but is doable.
>>>> The app knows what it is watching, so it can maintain the mappings.
>>>> So prior to activating the notifications it can call 'stat' on the mount point.
>>>> Stat struct gives the 'st_dev' which is the device id. Same will be reported
>>>> within the message payload (through major:minor numbers). So having this,
>>>> the app is able to get any other information it needs.
>>>> Note that the events refer to the file system as a whole and they may not
>>>> necessarily have anything to do with the actual block device.
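The stat-based lookup described above can be sketched as follows. This is only a minimal illustration in Python and not part of the patch set: the helper names fs_device_id and build_watch_table are hypothetical, and it assumes the major:minor pair carried in the event payload matches 'st_dev' as stated above.

```python
import os

def fs_device_id(mount_point):
    """Return the (major, minor) pair of the file system mounted at mount_point."""
    st = os.stat(mount_point)
    # st_dev identifies the device (real or anonymous) holding the file system.
    return os.major(st.st_dev), os.minor(st.st_dev)

def build_watch_table(mount_points):
    """Map (major, minor) -> mount point for every path the app intends to watch."""
    return {fs_device_id(path): path for path in mount_points}
```

A listener would build this table before activating the notifications and then look up each incoming major:minor pair in it.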
>>
>> How are you going to show an event for a filesystem that is made up of
>> multiple block devices?
>>
>>> Or you can use /proc/self/mountinfo for the mapping. There you can see
>>> device numbers, real device names if applicable and mountpoints. This has
>>> the advantage that it works even if filesystem mountpoints change.
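The mountinfo-based mapping mentioned above could look roughly like this. It is a sketch only: parse_mountinfo is a hypothetical helper, and it assumes the field layout documented for /proc/[pid]/mountinfo (mount ID, parent ID, major:minor, root, mount point, mount options, optional fields, a "-" separator, then file system type and mount source). Octal escapes such as \040 in paths are left undecoded.

```python
def parse_mountinfo(text):
    """Map (major, minor) -> list of (mount point, fs type, source) tuples."""
    table = {}
    for line in text.splitlines():
        # Fields before " - " are per-mount; fields after it describe the fs.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # malformed or empty line
        fields = left.split()
        rest = right.split()
        if len(fields) < 5 or len(rest) < 2:
            continue
        major, minor = fields[2].split(":")
        entry = (fields[4], rest[0], rest[1])
        # One device number may show up at several mount points (bind mounts).
        table.setdefault((int(major), int(minor)), []).append(entry)
    return table

def load_mountinfo(path="/proc/self/mountinfo"):
    """Read and parse the calling process's view of the mount table."""
    with open(path) as f:
        return parse_mountinfo(f.read())
```

Because the file is re-read on demand, the lookup keeps working when mount points change, which is the advantage noted above.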
>>
>> Ok, then that brings up my next question, how does this handle
>> namespaces? What namespace is the event being sent in? block devices
>> aren't namespaced, but the mount points are, is that going to cause
>> problems?
>>
>> thanks,
>>
>> greg k-h
>>
>
> Getting back to the namespaces ...
> In the current state the notifications will be sent to the init network namespace,
> which means that processes belonging to a different net namespace will not
> be able to receive them. To be more precise, those processes will not be
> able to subscribe to the multicast group, though this can be easily changed.
> Furthermore, the notifications might also be sent to a specific namespace.
> In this case, that would be the one with which the trace for the mount point
> has been registered, which I believe would be the best approach.
>
> As for the mount namespaces, reading the config file needs to be slightly tweaked,
> to hide away all the registered mount points which do not belong to the current
> mount namespace.
>
> Still, there is one possible 'issue' - the private/slave mount points.
> As the notifications will be sent to all the listeners (within the same netns),
> the events might be visible to processes outside the given mount ns.
> This should be limited to only those listeners that share the mount namespace,
> to which such private/slave mount points belong. As using the generic netlink
> to filter the outgoing messages is doable (with small changes to current
> implementation), the filters themselves seem rather cumbersome, as they would require
> finding the socket’s owner mount namespace, which just doesn't seem right.
> On the other hand, identifying the file system, which generated the event, will
> not be possible for processes outside such a namespace, as device major:minor
> numbers are not bound to any namespace (afaict) so they will not provide any
> valid information. They will remain unresolved.
>
> The best way out here, though, is to leave it to userspace to properly set up new namespaces:
> the mount namespace with possible private/slave mounts should have a separate
> network namespace to isolate the potential fs events, if required.
>
>
> BR
> Beata
>
>
>
I'm not really sure where we are with this RFC now (?).
Just wanted to let you know I won't be available for the next two weeks,
in case this comes around.
Best Regards
Beata