[GH-ISSUE #584] Prevent The Bastion from running if /home isn't mounted yet #153

Closed
opened 2026-05-07 00:19:26 +02:00 by BreizhHardware · 3 comments

Originally created by @jon4hz on GitHub (Sep 11, 2025).
Original GitHub issue: https://github.com/ovh/the-bastion/issues/584

Hi,

To ensure that The Bastion always has consistent data, I would like to prevent connections to it if the `/home` partition isn't mounted yet.

For example, if I reboot one of my Bastion slave nodes, I only want to allow connections to it once the `/home` partition has been (manually) decrypted and mounted.

Is there a way to achieve this?


@speed47 commented on GitHub (Sep 12, 2025):

Hello,

Interestingly, this is a discussion we've had a few times internally (including this very week!), because when a node reboots, the "Permission denied (publickey)" message is a bit confusing to users.

The thing is that, as the authentication is completely handed off to OpenSSH, we have limited options to tell users that a node is currently in maintenance, and there's no way for users to tell the difference between "you couldn't authenticate because your key is invalid" and "you couldn't authenticate because this node is currently sealed".

There are a few options, though:

A) Have two distinct OpenSSH server daemon instances, one for bastion users (port 22, using the default `sshd_config` shipped by the bastion), and one for bastion admins (port 222, with a distinct `sshd_config` only allowing `root` to connect). When a node reboots, the port 22 service would stay down (not auto-started at boot) until an admin connects to port 222, unlocks `/home`, and starts the port 22 SSH daemon. This way, port 22 would be closed until the service is actually up and running.
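As a sketch, the admin-only second instance in option A could use a minimal config along these lines (the filename, port, and option choices here are illustrative, not a file shipped by The Bastion):

```
# /etc/ssh/sshd_config.admin -- hypothetical config for the admin-only instance
Port 222
AllowUsers root
PermitRootLogin prohibit-password
AuthenticationMethods publickey
# Reuse the host keys of the main instance so clients see the same identity
HostKey /etc/ssh/ssh_host_ed25519_key
```

It would be started as a second daemon, e.g. `sshd -f /etc/ssh/sshd_config.admin`, while the port 22 unit stays disabled at boot.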

B) Modify the `/etc/ssh/banner` so that it clearly states that this node is currently sealed, and that another node of the cluster should be used. This could be done by having an `/etc/ssh/banner.sealed` file stating this, and an `/etc/ssh/banner.ok` file containing the usual default banner. Then a systemd unit file running once at boot could create a symlink `/etc/ssh/banner => banner.sealed`, so that users can tell that this node is sealed, as the banner is always shown before starting the authentication phase. Then, when an admin unseals the node, they can move the symlink to `/etc/ssh/banner => banner.ok`.
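The symlink dance of option B can be sketched as follows, using the `banner.sealed` / `banner.ok` names from the comment above; the banner texts are made up, and a temp directory is used here so the idea can be tried without root:

```shell
# Option B sketch: swap the SSH banner symlink between "sealed" and "ok".
set -eu
dir=$(mktemp -d)
printf 'This node is SEALED: /home is not mounted yet, please use another node.\n' > "$dir/banner.sealed"
printf 'Welcome, this node is operational.\n' > "$dir/banner.ok"

# At boot (e.g. from a oneshot systemd unit): point the banner at the sealed text.
ln -sfn "$dir/banner.sealed" "$dir/banner"
cat "$dir/banner"

# Later, once an admin has unlocked /home (e.g. from unlock-home.sh):
# ln -sfn "$dir/banner.ok" "$dir/banner"
```

On a real node, `$dir` would be `/etc/ssh` and sshd would be configured with `Banner /etc/ssh/banner`.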


@jon4hz commented on GitHub (Sep 13, 2025):

Thanks a lot for your answer.

I think both of those approaches have their pros and cons.

A) Would be a bit more complex, and you'd have to maintain a second `sshd_config`, but it would also bring a great benefit in an HA environment. If you have a load balancer or geo-DNS, you could set up probes to only send traffic to The Bastion if the node is actually available.

B) This wouldn't prevent The Bastion from actually receiving traffic, but it would be much simpler to implement.

What do you think about implementing approach B for now? I think the systemd unit that creates the symlink at boot could be created by the `setup-encryption.sh` script, and the `unlock-home.sh` script could then adjust the banner once the node is unsealed.

I'd be happy to submit a PR for that.


@speed47 commented on GitHub (Sep 13, 2025):

> A) Would be a bit more complex, and you'd have to maintain a second `sshd_config`, but it would also bring a great benefit in an HA environment. If you have a load balancer or geo-DNS, you could set up probes to only send traffic to The Bastion if the node is actually available.

Yes exactly. GeoDNS would work out of the box, or even a simple DNS RR where faulty nodes are automatically removed and the TTL is kept to a low value. But a load balancer would require OpenSSH to support the PROXY protocol, so that we get the real client IP, which is paramount for traceability and for ensuring the `from=""` part of the pubkeys is applied. Unfortunately OpenSSH doesn't support it, and I don't really expect it to, even in the future. There is a workaround though: a setup using something similar to https://github.com/cloudflare/mmproxy should work.

> B) This wouldn't prevent The Bastion from actually receiving traffic, but it would be much simpler to implement.
>
> What do you think about implementing approach B for now? I think the systemd unit that creates the symlink at boot could be created by the `setup-encryption.sh` script, and the `unlock-home.sh` script could then adjust the banner once the node is unsealed.

This is actually the solution we will probably use internally!
The systemd unit could preexist, distributed in `etc/systemd` alongside the other shipped unit files, but be disabled by default. Just dropping a new service file in this folder will do exactly that, as per:

https://github.com/ovh/the-bastion/blob/c8b86b718acb43c0aa22def62f43cc74ca010433/bin/admin/install#L1118-L1127

The `setup-encryption.sh` script could then enable it automatically, however!
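A minimal oneshot unit along these lines could implement the boot-time part; the unit name, paths, and the `ConditionPathExists` guard are assumptions for illustration, not something currently shipped by The Bastion:

```
# etc/systemd/bastion-sealed-banner.service (hypothetical name)
[Unit]
Description=Mark this bastion node as sealed until /home is unlocked
# Only act if the sealed banner text has been set up
ConditionPathExists=/etc/ssh/banner.sealed
Before=ssh.service

[Service]
Type=oneshot
# Point the SSH banner at the "sealed" text; unlock-home.sh would later
# repoint the symlink to /etc/ssh/banner.ok
ExecStart=/bin/ln -sfn /etc/ssh/banner.sealed /etc/ssh/banner

[Install]
WantedBy=multi-user.target
```

`setup-encryption.sh` would then only need to run `systemctl enable bastion-sealed-banner.service` (again, a hypothetical unit name).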

It would be great to also craft a good old `init.d` file, as you'll find in `etc/init.d`, for distros not using systemd... (do those still exist?).

https://github.com/ovh/the-bastion/blob/c8b86b718acb43c0aa22def62f43cc74ca010433/bin/admin/install#L1075-L1079

The install script automatically detects whether systemd is present or not, and installs the proper files.

> I'd be happy to submit a PR for that.

That would be great!
