New Lab Setup
This document is specific to the London RedCentric lab, but should evolve into a more generic setup once we have more labs. For now, there is some hard-coded logic in the wiki as well as in the scripts, to make sure we can reproduce at least the one lab we have. Once we have more labs, we'll work to automate that using configuration files, command line options, etc.
London RedCentric
Our HPC Lab will be using the 10.40.0.0/16 network, using a VPN just for us. We will have no contact with any other lab, in or out.
The servers will receive static IP assignments in the 10.40.16.*/20 range, while the provisioner will work with IPs in the ranges:
- 10.40.16.*/22 to dynamic
- 10.40.20.*/22 to dynamic-reserved
- 10.40.24.*/22 to static
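As a quick sanity check on the address plan above, the following POSIX shell sketch tests that each /22 provisioner range falls inside the /20 block starting at 10.40.16.0. The helper names ip_to_int and cidr_within are hypothetical and not part of labconf, and the block boundaries (10.40.16.0, 10.40.20.0, 10.40.24.0) are an assumed reading of the wildcard notation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of labconf.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if CIDR block $1 lies entirely inside CIDR block $2.
cidr_within() {
    inner_ip=${1%/*}; inner_len=${1#*/}
    outer_ip=${2%/*}; outer_len=${2#*/}
    # A shorter prefix can never fit inside a longer one.
    [ "$inner_len" -ge "$outer_len" ] || return 1
    # Build the outer netmask and compare both network addresses under it.
    mask=$(( (4294967295 << (32 - $outer_len)) & 4294967295 ))
    inner_net=$(( $(ip_to_int "$inner_ip") & $mask ))
    outer_net=$(( $(ip_to_int "$outer_ip") & $mask ))
    [ "$inner_net" -eq "$outer_net" ]
}

for range in 10.40.16.0/22 10.40.20.0/22 10.40.24.0/22; do
    if cidr_within "$range" 10.40.16.0/20; then
        echo "$range is inside 10.40.16.0/20"
    else
        echo "$range is OUTSIDE 10.40.16.0/20"
    fi
done
```

The same helper can be pointed at any other pair of blocks when the layout changes, for example to confirm that a new range does not fall outside the lab's 10.40.0.0/16 network.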
The second interface in the provisioner will be in a different sub-net (via VLAN) with fixed IPs (because MrP still can't DHCP) in the range 10.41.0.0/16. This is overly restrictive considering the ranges above, but it's enough for the London data-centre (we won't have more than 250 machines in there).
The masters and benchmark machines will be provisioned by MrProvisioner and the compute nodes will be provisioned by the master (e.g. Warewulf, xCAT, etc.).
There will be a VLAN for each cluster, to allow internal communications without flooding the rest of the lab (including other clusters). These will be 1GbE, 10GbE or InfiniBand, in the ranges 192.168.2.0/24, 192.168.3.0/24 and 192.168.4.0/24 respectively, as each cluster can have more than one interconnect technology at the same time.
Here's a diagram of the network:
Setting up the hpc-admin node
The hpc-admin node will be the physical server hosting the MrProvisioner and Jenkins services for the HPC lab.
The bare-metal installation is Debian 9 (stretch), hosting the two services using KVM/QEMU for the moment (migration to Docker/containers will be possible once MrP support for containers is production ready).
Required Packages and repos
Install Debian as you normally would for a server, taking care to install the SSH server and to plan for enough space for the Jenkins logs (a bare minimum of 500GB for the Jenkins VM is desirable).
The first step is to checkout our private configuration repository (you'll need to be in the hpc-sig-admin LDAP group):
root@hpc-admin # apt update && apt install -y git
root@hpc-admin # git clone ssh://git@dev-private-git.linaro.org/hpc/labconf.git
root@hpc-admin # cd ~/labconf && git submodule update --init --recursive
Once the repos are checked out, update the system and install the required packages:
root@hpc-admin # cd ~/labconf/packages
root@hpc-admin # ./install_packages.sh
You now have a working bare-metal server running Debian 9 with all the appropriate utilities and tools.
Network Configuration
For the VMs to work on the two network interfaces of the host, we need to create a bridge on each interface and assign the required static IPs. We also need to enable IP forwarding, create the SSH keys and set up Ansible's hosts file.
This is all done by the network_setup.sh script in our labconf repository:
root@hpc-admin # cd ~/labconf/network
root@hpc-admin # ./network_setup.sh <IF0> <IF1>
Change IF0 to your primary interface (the one connected to the firewall / VPN) and IF1 to the one that will be connected to the BMCs (via the MrP VM).
Warning: This script will restart your network. It has been tested remotely (via SSH), but you may want to have a physical terminal nearby just in case.
Warning: This script will set the /etc/ansible/hosts file to reflect the HPC Lab's IP layout. Please edit the file to reflect your own topology.
Setting up the VMs
With the network in place, you can create the VMs.
root@hpc-admin # cd ~/labconf/kvm
root@hpc-admin # ./jenkins_virt_install.sh
root@hpc-admin # ./mrp_virt_install.sh
root@hpc-admin # ./fileserver_virt_install.sh
root@hpc-admin # ./login_virt_install.sh
For all of them, the preseed will set up static IPs (10.40.0.11 and 10.40.0.12 and 10.40.0.13 respectively), and they should be visible from the wider network, including the host.
This is done to simplify VM migration and a potential new installation on a different server.
The network setup step above assumes the same IPs, so everything is fixed. In time we'll use configuration files so you don't have to change too many scripts.
The Login node still doesn't use LDAP (TODO), but accounts can be created by hand, for now.
Installing the MrP service
You need to run both KEA and MrP roles to install a fully working provisioner. This can be done via the infra-server playbook:
root@hpc-admin # cd ~/labconf/ans_setup_mrp/
root@hpc-admin # ./pre-setup.sh
root@hpc-admin # ansible-playbook mrp_setup.yml -v -u root
Ansible will start MrProvisioner automatically, so you should be able to just open the URL in your browser (assuming you have a route to the machine's IP):
The default credentials are admin:linaro; please change them as soon as possible.
ISSUES:
- Network setup for BMC network has conflicting static and dynamic ranges. This needs fixing.
- KEA is built on the machine every time. This is ridiculously slow, but we need KEA 1.2 and Debian only has 1.1. We need to find/create a package.
Installing the Jenkins service
Run Ansible and wait until it exits with no errors:
root@hpc-admin # cd ~/labconf/ans_setup_jenkins
root@hpc-admin # ansible-playbook configure-jenkins.yml -v -u root
Ansible will start Jenkins automatically, so you should be able to just open the URL in your browser (assuming you have a route to the machine's IP):
If your Linaro login belongs to the hpc-sig-admin group, you can log in directly with your email and Linaro password, as Jenkins is connected to LDAP.
BE CAREFUL: Jenkins is not yet using SSL, so your password will be passed plain text. Only use this if you are inside a VPN or on an isolated network.
You may get several warnings when you log in to Jenkins, most of which can be corrected on the Global Security screen:
- ERROR in config.xml: Jenkins may complain that "version 1.1" is not supported, only 1.0. Editing /var/lib/jenkins/config.xml and changing the version on the first line seems to work.
- Agent to master security subsystem is currently off: Go to Security Settings and check the box saying "Enable Agent → Master Access Control".
- Jenkins instance uses deprecated protocols: JNLP3-connect: Go to Security Settings > Agents and clear the box "Java Web Start Agent Protocol/3" in "Agent Protocols".
- SSH HOST KEY VERIFIERS ARE NOT CONFIGURED FOR ALL SSH SLAVES: Host key verification is in fact configured, but Jenkins wants you to confirm that manually, by opening each slave's configuration and hitting "Save".
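The config.xml edit described in the list above can be scripted. This is a minimal sketch, assuming the first line of the file is an XML declaration carrying version 1.1 in single or double quotes. The function name fix_config_version and the CONFIG variable are hypothetical; the function keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Sketch only: fix_config_version is a hypothetical helper.
# Rewrites version 1.1 to 1.0 on the first line of the given file,
# keeping the original as FILE.bak.
fix_config_version() {
    file=$1
    cp "$file" "$file.bak" || return 1
    # Only line 1 is touched; the quote style (' or ") is preserved.
    sed "1s/version=\(['\"]\)1\.1\1/version=\11.0\1/" "$file.bak" > "$file"
}

# Path taken from the text above; only touched if it exists and is writable.
CONFIG=${CONFIG:-/var/lib/jenkins/config.xml}
if [ -w "$CONFIG" ]; then
    fix_config_version "$CONFIG"
    head -n 1 "$CONFIG"
else
    echo "skipping: $CONFIG is not writable"
fi
```

Jenkins presumably needs a restart, or a configuration reload from disk, before it rereads the file.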
Themes: Install the Simple Theme Plugin and choose one from the list by updating the theme URL in the general settings.
Save the configuration and you should be all set.
Installing the Jenkins Jobs
WARNING: The following playbook does not run on Ansible versions older than 2.4.0, since it makes use of the 'include_tasks' module.
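To check the installed version before running the playbook, something like the sketch below may help. The version_ge helper is hypothetical and assumes purely numeric, dot-separated versions. The parsing of `ansible --version` assumes the first line contains the version as its first dotted number, which may not hold for every release.

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper; it assumes purely
# numeric, dot-separated version components.

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # Missing components count as 0, so "2.4" equals "2.4.0".
        if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
        if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
        # Drop the leading component, or empty the string if it was the last.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

if command -v ansible >/dev/null 2>&1; then
    # Take the first dotted number on the first line of `ansible --version`.
    found=$(ansible --version | sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p')
    if version_ge "$found" 2.4.0; then
        echo "ansible $found: OK"
    else
        echo "ansible $found: too old, need 2.4.0 or newer"
    fi
else
    echo "ansible not found in PATH"
fi
```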
Clone the repository
root@hpc-admin # git clone https://github.com/Linaro/hpc_lab_setup.git
root@hpc-admin # cd hpc_lab_setup
Create the authorisation files
You need to find your API token in Jenkins. That's done by clicking on your username (top right corner) > Configure > API Token > Show API Token.
This will show your user ID and token.
hpc_lab_setup/vars/jenkins_cred.yml.secret:
user: user@linaro.org
password: {TOKEN}
url: http://10.40.0.12:8080
NOTE: The API TOKEN is the one from hpc-sig-admin users, not regular users.
Also, you need a token for Mr-Provisioner, to upload the preseeds. If you haven't got one yet, create it on the UI by clicking on your username's link (top right) > Tokens > "+". This will create a token.
Create a new file and copy the token hash into it.
hpc_lab_setup/vars/mrp_creds.yml.secret:
mr_provisioner_auth_token: {TOKEN}
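Since both files hold credentials, it may be worth creating them with restrictive permissions. The sketch below writes the two files in the formats shown above, readable by the current user only. The environment variable names (JENKINS_USER, JENKINS_TOKEN, MRP_TOKEN) are hypothetical, and the script assumes it is run from the hpc_lab_setup checkout.

```shell
#!/bin/sh
# Sketch only: the environment variable names are hypothetical.
# Run from the hpc_lab_setup checkout, with the real values exported.
JENKINS_USER=${JENKINS_USER:-user@linaro.org}
JENKINS_TOKEN=${JENKINS_TOKEN:-CHANGE_ME}
MRP_TOKEN=${MRP_TOKEN:-CHANGE_ME}

mkdir -p vars

# New files are created readable and writable by the current user only.
old_umask=$(umask)
umask 077

cat > vars/jenkins_cred.yml.secret <<EOF
user: $JENKINS_USER
password: $JENKINS_TOKEN
url: http://10.40.0.12:8080
EOF

cat > vars/mrp_creds.yml.secret <<EOF
mr_provisioner_auth_token: $MRP_TOKEN
EOF

umask "$old_umask"

# umask does not affect files that already existed, so tighten them too.
chmod 600 vars/jenkins_cred.yml.secret vars/mrp_creds.yml.secret
```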
Run the Jenkins playbook
The first playbook you need to run is Jenkins:
root@hpc-admin # ansible-playbook -v -u root jenkins.yml
The same command also applies updates once there are changes. This playbook will create the nodes, jobs, users, SSH keys, etc.
The Jenkins playbook is a requirement for the other two: MrP and FS.
Warning: If you only want to install the services and do not have /etc/ansible/hosts configured (editing it requires sudo access), populate a local hosts file instead. Please refer to the doc, define a group named "jenkins" in it, and use this command:
root@hpc-admin # ansible-playbook -i hosts -v -u root jenkins.yml
Create the Mr-Provisioner user account
You need to create the Jenkins account by hand in Mr-Provisioner, add the SSH keys and generate the token. This requirement will be dropped when bug 102 is fixed.
The Jenkins playbook will create an SSH key in hpc-jenkins' /var/lib/jenkins/.ssh/id_rsa.pub. That's the one you should update in Mr-Provisioner's Jenkins account.
For now, the process is the following:
- Log in as "admin"
- Create a user "jenkins", set its password to some random string (use 'pwgen')
- Log out as "admin"
- Log in as "jenkins"
- Add the SSH keys of all slaves to it
- Generate an APITOKEN, then copy it and paste it somewhere safe
- Log out as "jenkins"
Add the APITOKEN generated by the step above to vars/jslave_tokens.yml.secret in the following format (same token for all users):
jslave_tokens:
  - jslave: d05ohpc
    token: APITOKEN
  - jslave: qdcohpc
    token: APITOKEN
  - jslave: d03bench
    token: APITOKEN
  - jslave: d05bench
    token: APITOKEN
  - jslave: qdcbench
    token: APITOKEN
  - jslave: tx2bench
    token: APITOKEN
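Because every slave shares the same token, the file can also be generated with a short loop instead of being typed by hand. This is a sketch only: the MRP_API_TOKEN variable name is hypothetical, the slave names are the ones listed above, and the script assumes it is run from the hpc_lab_setup checkout.

```shell
#!/bin/sh
# Sketch only: MRP_API_TOKEN is a hypothetical variable name.
# Run from the hpc_lab_setup checkout.
TOKEN=${MRP_API_TOKEN:-APITOKEN}
SLAVES="d05ohpc qdcohpc d03bench d05bench qdcbench tx2bench"

mkdir -p vars

# Keep the generated file private to the current user.
old_umask=$(umask)
umask 077
{
    echo "jslave_tokens:"
    for slave in $SLAVES; do
        # One two-line YAML list entry per slave, all with the same token.
        printf '  - jslave: %s\n    token: %s\n' "$slave" "$TOKEN"
    done
} > vars/jslave_tokens.yml.secret
umask "$old_umask"

cat vars/jslave_tokens.yml.secret
```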
Run Mr-Provisioner and File System playbooks
root@hpc-admin # ansible-playbook -v -u root mrp.yml
root@hpc-admin # ansible-playbook -v -u root fs.yml
Warning: If you do not want to depend on /etc/ansible/hosts, populate a hosts file with three groups ("jenkins", "provisioner" and "fileserver") and use these commands:
root@hpc-admin # ansible-playbook -v -u root -i hosts mrp.yml
root@hpc-admin # ansible-playbook -v -u root -i hosts fs.yml
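A local hosts file with the three groups could look like the one generated below. This is a sketch under assumptions: the variable names are hypothetical, 10.40.0.12 is the Jenkins address used in jenkins_cred.yml.secret, and the two *.example names are placeholders to be replaced with the real provisioner and file server addresses.

```shell
#!/bin/sh
# Sketch only: variable names and the *.example hosts are placeholders.
# 10.40.0.12 is the Jenkins address used in jenkins_cred.yml.secret.
JENKINS_HOST=${JENKINS_HOST:-10.40.0.12}
MRP_HOST=${MRP_HOST:-mrp.example}
FS_HOST=${FS_HOST:-fileserver.example}

# Write an INI-style Ansible inventory with one host per group.
cat > hosts <<EOF
[jenkins]
$JENKINS_HOST

[provisioner]
$MRP_HOST

[fileserver]
$FS_HOST
EOF

cat hosts
```

Running `ansible-inventory -i hosts --list` afterwards should show how Ansible parsed the groups.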
Updating Jenkins Jobs
Once the jobs are installed and working, whenever the Jenkins configuration changes you just need to update the repo and run the playbook again:
root@hpc-admin # cd hpc_lab_setup
root@hpc-admin # git fetch -a && git pull
root@hpc-admin # ansible-playbook -v -u root jobs.yml
Warning: If you do not want to depend on /etc/ansible/hosts, populate a hosts file with the "jenkins" group and use this command:
root@hpc-admin # ansible-playbook -v -u root -i hosts jobs.yml