Message-ID: <CALOAHbBUu-oa_wb-PCBdn+vs1k1ZddGhVJg2UuVx912wGWoLkQ@mail.gmail.com>
Date:   Thu, 7 Dec 2023 20:36:56 +0800
From:   Yafang Shao <laoar.shao@...il.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     jejb@...ux.ibm.com, martin.petersen@...cle.com, rafael@...nel.org,
        linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org
Subject: Re: [PATCH] drivers: base: Introduce a new kernel parameter driver_sync_probe=

On Thu, Dec 7, 2023 at 8:12 PM Greg KH <gregkh@...uxfoundation.org> wrote:
>
> On Thu, Dec 07, 2023 at 07:59:03PM +0800, Yafang Shao wrote:
> > On Thu, Dec 7, 2023 at 6:19 PM Greg KH <gregkh@...uxfoundation.org> wrote:
> > >
> > > On Wed, Dec 06, 2023 at 10:08:40PM +0800, Yafang Shao wrote:
> > > > On Wed, Dec 6, 2023 at 9:31 PM Greg KH <gregkh@...uxfoundation.org> wrote:
> > > > >
> > > > > On Wed, Dec 06, 2023 at 11:53:55AM +0000, Yafang Shao wrote:
> > > > > > After upgrading our kernel from version 4.19 to 6.1, certain regressions
> > > > > > occurred due to the driver's asynchronous probe behavior. Specifically,
> > > > > > the SCSI driver transitioned to an asynchronous probe by default, resulting
> > > > > > in a non-fixed root disk behavior. In the prior 4.19 kernel, the root disk
> > > > > > was consistently identified as /dev/sda. However, with kernel 6.1, the root
> > > > > > disk can be any of /dev/sdX, leading to issues for applications reliant on
> > > > > > > /dev/sda, notably affecting the systems that monitor the root disk.
> > > > >
> > > > > > Device names are never guaranteed to be stable; ALWAYS use persistent
> > > > > > names such as a filesystem label or similar.  Look at /dev/disk/ for the
> > > > > > proper ways to do this.
> > > >
> > > > The root disk is typically identified as /dev/sda or /dev/vda, right?
> > >
> > > Depends on your system.  It can also be identified, in the proper way,
> > > as /dev/disk/by-uuid/eef0abc1-4039-4c3f-a123-81fc99999993 if you want
> > > (note, fake uuid, use your own disk uuid please.)
> > >
> > > Why not do that?  That's the most stable and recommended way of doing
> > > things.
> >
> > Adapting to this change isn't straightforward, especially for a large
> > fleet of servers. Our monitoring system needs to accommodate and
> > adjust accordingly.
>
> Agreed, that can be rough.  But as this is an issue that was caused by a
> scsi core change, perhaps the scsi developers can describe why it's ok.
>
> But really, device naming has ALWAYS been known to not be
> deterministic, which is why Pat and I did all the driver core work 20+
> years ago so that you have the ability to properly name your devices in
> a way that is deterministic.  Using the kernel name like sda is NOT
> using that functionality, so while it has been nice to see that it has
> been stable for you for a while, you are playing with fire here and will
> get burned one day when the firmware in your devices decides to change
> its response times.

I agree that using UUIDs is a better approach. However, it's worth
noting that the widely used IO monitoring tool 'iostat' faces
challenges when working with UUIDs, so there is still a significant
amount of work ahead of us in this area.
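
As a minimal sketch of what such an adaptation could look like, assuming
a monitoring agent written in Python: the persistent link under
/dev/disk/ is resolved to whatever kernel name the disk received on this
boot, and that name is then handed to the tool that only understands
sdX. The helper name kernel_name is hypothetical, and the UUID in the
comment is the fake one from earlier in the thread.

```python
import os


def kernel_name(persistent_link):
    """Return the current kernel name (e.g. 'sdb') behind a persistent link.

    persistent_link is expected to be a symlink such as
    /dev/disk/by-uuid/eef0abc1-4039-4c3f-a123-81fc99999993 (fake UUID).
    """
    if not os.path.exists(persistent_link):
        # A dangling or missing link means the disk is not present.
        raise FileNotFoundError(persistent_link)
    # realpath() follows the symlink chain to the actual device node.
    return os.path.basename(os.path.realpath(persistent_link))
```

The returned name could then be passed as the device argument to a tool
such as iostat, so the monitoring configuration stores only the
persistent identifier and never a fixed sdX name.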


>
> > > > While reverting to synchronous probing could ensure
> > > > stability, it's worth noting that asynchronous probing can potentially
> > > > shorten the reboot duration under specific conditions. Thus, there
> > > > might be some resistance to reverting this change as it offers
> > > > performance benefits in certain scenarios. That's why I prefer to
> > > > introduce a kernel parameter for it.
> > >
> > > I don't want to add a new parameter that we need to support forever
> > > and that adds to the complexity of the system unless it is REALLY needed.
> >
> > BTW, since there's already a 'driver_async_probe=', introducing
> > another 'driver_sync_probe=' wouldn't significantly increase the
> > maintenance overhead.
>
> Any new code adds maintenance overhead and complexity, so you have to
> justify its existence, especially when you are not going to be the one
> maintaining it :)

Understood.


-- 
Regards
Yafang
