HPC Developer Cloud
We're using the OpenStack developer clouds to test cluster deployments, but since it's a new product, it has its own teething issues. Here are a few tricks to make it work for HPC.
Creating a "hopbox" Instance
Since we only have one public IP per lab, we need a machine we can SSH into and then hop to others. These are the "Hopboxes" and they can be any flavour of Linux.
A stable Debian is recommended, as you won't have to change it or re-install many packages, and you need a stable and secure setup.
The hopbox machine should be a tiny instance (one core, 512MB RAM), have SSHD running and nothing else. Default Debian SSHD config forbids password login, so that's solved.
To SSH into other machines, you'll need two things:
- Add the machine names and their IPs to /etc/hosts on the hopbox, so that once you ssh into the hopbox, you can ssh into the machines via their names.
- Add an SSH-hop configuration to your local .ssh/config, so that you can ssh directly into the machines, via the hopbox.
The SSH config should look something like:
Host hopbox
    User <your-linaro-username>
    HostName <the-public-IP>
Host ohpc-* *.cloud
    User <your-linaro-username>
    # Disable host key checking only here, not for the hopbox
    StrictHostKeyChecking no
    ProxyCommand ssh hopbox nc -q0 %h %p 2>/dev/null
The hopbox should be pretty stable and not change its SSHD host keys, so if you get a warning that they changed, something is wrong. The safest way out is to log in to the cloud interface and restore the instance to a known snapshot.
But the internal instances can change all the time, so it's simpler not to check their host keys.
The local machines can be called whatever you want, but make sure the names match the wildcard (*) patterns in the SSH config. The two common ways are to have a prefix (ex. ohpc-*) or a suffix (ex. *.cloud), or both. These names are the ones that you have to put in the hopbox's /etc/hosts file.
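The /etc/hosts step can be sketched as a small shell helper. This is only an illustration: matches_hop_pattern and add_host are hypothetical names, the IP address in the usage note is made up, and the patterns are the two examples from the SSH config above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tooling.

# matches_hop_pattern NAME: succeed if NAME matches "ohpc-*" or "*.cloud",
# the two example Host patterns from the SSH config.
matches_hop_pattern() {
    case $1 in
        ohpc-*|*.cloud) return 0 ;;
        *) return 1 ;;
    esac
}

# add_host FILE IP NAME: append "IP<TAB>NAME" to FILE unless NAME is
# already listed as a host name on a non-comment line.
add_host() {
    file=$1 ip=$2 name=$3
    if ! matches_hop_pattern "$name"; then
        echo "warning: $name does not match ohpc-* or *.cloud" >&2
    fi
    if [ -f "$file" ] && awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

From a root shell on the hopbox, something like "add_host /etc/hosts 192.168.0.10 ohpc-master-yourname" would add the entry once and do nothing on later runs.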
Associating an external IP
You can associate the external IP to the hopbox by clicking on "Associate Floating IP". You must choose the public IP and the "port" (which is just the local IP of the hopbox).
The UI is not very stable, so the "port" may not show for a while. Wait a few minutes and reload the page. It may show up later, or it may time out.
If it does time out, there is a way to do the association with the command-line client, but that requires having your Linaro password as an environment variable and it's not recommended. Contact the admins and they can help you.
After the IP is associated, you should make sure you can SSH into the hopbox via that IP. If you have changed the config as recommended above, this should "just work (tm)":
$ ssh debian@hopbox
You'll then have to create your Linaro user using your LDAP ID:
$ sudo useradd -m -u <LDAP-ID> <Linaro Username>
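And copy the SSH key, making sure the new user owns it:
$ sudo mkdir -m 700 /home/<Linaro User>/.ssh
$ sudo cp ~/.ssh/authorized_keys /home/<Linaro User>/.ssh
$ sudo chmod 600 /home/<Linaro User>/.ssh/authorized_keys
$ sudo chown -R <Linaro User>: /home/<Linaro User>/.ssh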
You can add any number of users, for all the people that will be able to access the lab and create masters.
If the user you are creating is an administrator, make sure to allow sudo by running "vigr" and adding the user to the sudo group.
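When adding several users, the steps above can be wrapped in a small script. The sketch below is a hypothetical helper (run and setup_user are made-up names) and it makes a few assumptions: you are logged in as the default user with sudo rights, the key to copy is in that user's ~/.ssh/authorized_keys, and "usermod -aG sudo" is used instead of editing the group file with vigr. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_user LDAP_ID USERNAME [admin]: create the account, install the
# SSH key and, for admins, add the user to the sudo group.
setup_user() {
    uid=$1 user=$2 role=${3:-user}
    ssh_dir=/home/$user/.ssh
    run sudo useradd -m -u "$uid" "$user"
    run sudo mkdir -m 700 "$ssh_dir"
    run sudo cp "$HOME/.ssh/authorized_keys" "$ssh_dir/"
    run sudo chmod 600 "$ssh_dir/authorized_keys"
    # "user:" hands the files to the user and the user's login group.
    run sudo chown -R "$user:" "$ssh_dir"
    if [ "$role" = "admin" ]; then
        run sudo usermod -aG sudo "$user"
    fi
}
```

For example, "DRY_RUN=1; setup_user 12345 jdoe admin" (a made-up ID and name) shows the six commands without touching the system. On a CentOS machine the administrators' group is usually wheel rather than sudo.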
If all worked correctly, you should now log out and log in again with just:
$ ssh hopbox
That should log you in as your Linaro user.
TODO: Make that setup LDAP aware via admin/user groups.
Creating an Internal Network
For OpenHPC, the usual setup is to have the master with two NICs, one on the external interface and one on the internal, and all slaves (compute nodes) with a single NIC on the internal network.
The master is then set up with DHCP, TFTP and NAT, and the slaves PXE boot on the internal network, where the master will provision them correctly.
To have that setup, we need an internal network, without DHCP, to which all the slaves' NICs will be attached, as well as the second NIC on the master.
On the "Network Topology" tab, you can see the external network, your main gateway and your main network. Your hopbox will be connected to the main network.
Before you create masters and slaves, you need to create the internal network. You should have one internal network for every master, so be sure to name it accordingly.
Click on "+ Create Network" and fill in the fields. They should be:
- Network
  - Name: Similar to your master's name, so that it's easy to match them.
  - Create Subnet: checked
- Subnet
  - Subnet Name: can be the same name as the network, doesn't matter
  - Network address: can be anything different from your main network, ex. 172.22.16.0/24. It's good to keep it different from the other internal networks.
  - Gateway IP: is the IP of the master on the internal network, usually x.x.x.1
  - Disable Gateway: checked. This should avoid the network associating a non-1 IP to the master's internal NIC
- Subnet details
  - Enable DHCP: disabled
  - Everything else empty
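As a small illustration of the "usually x.x.x.1" convention, the sketch below derives the master's internal IP from the network address. gateway_for is a hypothetical helper, and it assumes the network address ends in .0, as in the 172.22.16.0/24 example.

```shell
#!/bin/sh
# gateway_for NETWORK/PREFIX: print the x.x.x.1 address of the network.
# Assumes the network address ends in ".0".
gateway_for() {
    net=${1%/*}          # drop the "/24" prefix length
    echo "${net%.*}.1"   # replace the last octet with 1
}

gateway_for 172.22.16.0/24
```

Keeping one such address per internal network makes it easier to tell the masters apart later.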
Creating a Master Instance
On the "Instances" tab, click on "Launch Instance" and fill in the fields. They should be:
- Details
  - Name: a personalised name, with the SSH pattern above (ex. ohpc-master-yourname)
- Source
  - Choose the CentOS 7 image (centos7-cloud) [1]
- Flavour
  - You'll need at least 10GB of disk and 2 cores
- Networks
  - Be careful, here's the trick: First move the internal network, then the main one. [2]
[1] The master instance needs to be either CentOS 7 or SLES 12. For now, we only have CentOS images, so choose that one.
[2] The order matters, because QEMU has trouble adding the interfaces. The first NIC will connect to the second network and vice-versa, and only the first network has DHCP set.
Everything else is irrelevant, just click "Launch Instance".
The instance will be created but not started. In the "Instances" list you need to click on "Start Instance".
If all went according to plan, you should be able to ssh to it via the hopbox. But since the user is still the default (centos, debian), you'll need additional setup.
Repeat the user setup steps described above for the hopbox on this machine. Once that's done, the redirect should just work:
$ ssh ohpc-master-yourname
And you should have a master with two NICs.
You need to identify your DHCP NIC (either eth0 or eth1) and set up the other one with a fixed address, using the gateway IP of the network you created.
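One possible way to do that on CentOS 7 is with a static ifcfg file, sketched below. This is an assumption about the setup rather than a tested recipe: ifcfg_static is a hypothetical helper, eth1 and 172.22.16.1/24 are example values, and the image is assumed to use the classic network-scripts. Running "ip -4 -o addr show" first shows which NIC already got an address from DHCP.

```shell
#!/bin/sh
# ifcfg_static DEVICE IPADDR PREFIX: print a static ifcfg file
# in the CentOS 7 network-scripts format.
ifcfg_static() {
    cat <<EOF
DEVICE=$1
BOOTPROTO=none
ONBOOT=yes
IPADDR=$2
PREFIX=$3
EOF
}

# Example usage on the master (needs root, so left commented out):
#   ifcfg_static eth1 172.22.16.1 24 | sudo tee /etc/sysconfig/network-scripts/ifcfg-eth1
#   sudo ifup eth1
```

No gateway is set on this interface on purpose, since the master's default route should stay on the DHCP NIC.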
Creating Slave Instances
Creating slaves is much simpler. You should still choose CentOS, 10+GB of disk and at least two cores, but you only need one NIC, directly on the internal network you created.
TODO:
- Find out the MAC address of the instances before boot.
- Configure those addresses in the OpenHPC master
- Set up slaves so they can PXE boot
- Set up an instance without a pre-defined image, but just an empty disk