linux-kernel - Re: [PATCH 06/21] VFS: Introduce a superblock configuration context [ver #3]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOssrKehzdSABr-K6Usb7AjxyJ-POtUbdRap0e8Jq78=o-cMLg@mail.gmail.com>
Date:   Thu, 18 May 2017 10:09:59 +0200
From:   Miklos Szeredi <mszeredi@...hat.com>
To:     David Howells <dhowells@...hat.com>
Cc:     viro <viro@...iv.linux.org.uk>, Jeff Layton <jlayton@...hat.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        linux-nfs@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 06/21] VFS: Introduce a superblock configuration context
 [ver #3]

On Wed, May 17, 2017 at 1:31 PM, David Howells <dhowells@...hat.com> wrote:
> Miklos Szeredi <mszeredi@...hat.com> wrote:
>
>> > (b) is internal-only at the moment, used by NFS submounts as triggered by
>> > automounts.  There isn't currently any way to supply mount options to this.
>>
>> And all blockdev based fs.
>
> I see what you're getting at.  In which case there are more cases:
>
>   (a) new mount, new sb struct with no source (eg. procfs, sysfs, tmpfs)
>   (b) new mount, new sb struct, params loaded from filesystem data (eg. bdev)
>   (c) new mount, new sb struct, params derived from parent (eg. NFS automount)
>   (d) new mount, shared extant sb struct
>   (e) remount
>
> In the case of (d) where we're attempting to make another mount for an extant
> super_block struct and we need to check the consistency of the parameters.

Yes.  Current behavior seems to just ignore given options (except
MS_RDONLY) in that case, so we need to keep that possibility.

Also I think it would be good to allow selecting when superblock is created:

  - non-exclusive create: if exists return it, if not create it
  - exclusive create: only create if non-existent
  - non-create: only return if exists

>
>> > Ah - but some of these options have to be set *inside* sget() or before the
>> > superblock becomes live, even the ones that can be changed in-flight.
>>
>> That would be the "???" category.  Any concrete examples?
>
> NFS is a good example.  You need parameters that indicate the server to talk
> to and specify I/O parameters before you even get the superblock as you have
> to talk to the server first.  I think this is particularly true of NFSv2/3
> where you need to talk to mountd.
>
> This would also be true of AFS.  There you have to access the network to look
> up the volume ID before you can call sget() as the volume ID is part of the
> index key to the set of super_block structs.
>
> Further, some of these values (I/O parameters in NFS's case, for example) form
> part of the super_block struct index key, so you have to set those inside
> sget()'s set callback.

So what I propose is:

 1) call ->parse_option()

      would get indication what we are trying to do (find and/or
create and/or reconfig)

      this step is optional, the the filesystem type could possibly be
enough for the following steps

 2) call ->get_tree()

      pass sc containing parsed options and flags controlling the
creation of the superblock (create/exclusive)

      this step is optional, not called if we are given an sb to work
with (i.e. only reconfig)

 3) call ->reconfig()

      pass sc containing parsed options

      this step is optional, we might be instructed just to find or
create the sb

>
>> >> Also I think silently ignoring options is not always the right answer.
>> >
>> > Example?
>>
>> mount /dev/sda -oacl /mnt
>> mount /dev/sda -onoacl /mnt2
>
> So you'd like to give an error or a warning if ACLs are not supported, either
> by the filesystem or the kernel as a whole?

What I was getting at is that the second mount will ignore the "noacl"
option.  It's not something we apparently care much about (but will
definitely want to keep as back-compat thing for the mount(2)
interface).  But for the new interface I think we need something less
crazy.  One solution would be the exclusive create, which doesn't have
this problem. Maybe that's enough; not sure if we need anything more
sophisticated.

>> You are thinking on the wrong level.  Of course mount(2) needs to
>> handle MS_NOSUID et al.  But it's doing it now, and it isn't parsing
>> "nosuid", just translating MS_NOSUID to MNT_NOSUID.
>
> Ummm...  That's done by the parser in this case, so effectively it is.

Where exactly?  You are not touching do_mount(), which is where the
MS_*** -> MNT_*** translation is done.


>> For the fsopen() case you won't need to parse "nosuid" because that's a flag
>> for fsmount().
>
> Whilst this is true, that means that the parser has to operate differently in
> the mount(2) and fsopen(2) cases - which I was trying to avoid.

I don't get it.  We never passed MNT_* options as strings to the
kernel.  That was parsed by mount(8) and translated to MS_* flags.
So how would mount(2) and fsopen(2) need to operate differently
regarding parsing MNT_* options, when we want neither to do it?

>> The only thing fsmount() should take from the sc is the root_dentry.
>> It should be equivalent to what currently is a bind mount, except it
>> should be able to fully configure the new mount.
>
> It needs to take the device name as well.  I wonder if it would be possible to
> store the device name on the superblock and then leave a path-in-mount in the

Ah, mnt_devname.  The device name as just a special type of option and
as such should be stored in the superblock.

> vfsmount struct to fabricate a <source>:/<path> later.  Though this would
> change the behaviour if someone did:
>
>         mknod /dev/foo b 8 1
>         mknod /dev/bar b 8 1
>         mount /dev/foo /mnt/foo
>         mount /dev/bar /mnt/bar
>
> as /proc/mounts would now show /dev/foo for /mnt/bar.

I'd very much hope this doesn't introduce regressions, but if it did,
then we'd have to go back to using mnt_devname...

> Also, I guess the subtype should be wangled in the superblock-getting code
> (vfs_get_tree() as of patch 21) rather than in do_new_mount_sc().  If I do
> that, then it may be that do_new_mount_sc() only needs the root dentry pointer
> and not the sb_config pointer (except for error string passing).

Would be nice.

And hopefully error string passing can be made generic and be moved
out of this set.

Thanks,
Miklos