PDK can be used to create a barebones module that sets up the correct directory structure and templates some unit tests for you. You get two commands: pdk validate, which performs basic parsing of all the relevant files, and pdk test unit, which runs your unit tests.
Puppet has some excellent documentation available here that covers how to install PDK, how to create a module using it, and how to test, so I won’t go over that.
However, documentation on using Litmus for acceptance testing, and in particular on using it on Windows where you don’t have Docker to hand, isn’t as forthcoming, so this is my attempt to fill in a few of the blanks I ran into.
njhowell/puppet-pdk-example contains an example Puppet module created using PDK. For the purposes of this post, I’ve created a default class that just creates a temp file.
Running pdk validate will parse the various files in the module, and pdk test unit will run the unit tests. The automatically generated unit test simply checks that the module compiles.
PDK includes puppet_litmus in the Rakefile that it generates, but there’s still a bit more configuration to do before you can get started.
Full details are on the Litmus wiki but if you’ve started with PDK, then you need to do the following:
Create a .fixtures.yml file:
---
fixtures:
  repositories:
    facts: 'https://github.com/puppetlabs/puppetlabs-facts.git'
    puppet_agent: 'https://github.com/puppetlabs/puppetlabs-puppet_agent.git'
    provision: 'https://github.com/puppetlabs/provision.git'
Create spec/spec_helper_acceptance.rb containing:
# frozen_string_literal: true
require 'puppet_litmus'
require 'spec_helper_acceptance_local' if File.file?(File.join(File.dirname(__FILE__), 'spec_helper_acceptance_local.rb'))
include PuppetLitmus
PuppetLitmus.configure!
Next up, I created provision.yaml in the root of my module to define a provision list for Litmus to use. This effectively lets you define lists of VMs to start and run your acceptance tests against. For this example, I create a list called ‘vagrant’ that provisions an Ubuntu 18.04, an Ubuntu 20.04 and a Debian 9 VM using the virtualbox provider.
---
vagrant:
  provisioner: vagrant
  images: ['generic/ubuntu1804', 'generic/ubuntu2004', 'generic/debian9']
  params:
    vagrant_provider: virtualbox
You can replace the image names with any image from VagrantCloud. Just make sure the image supports the provider you’re using (virtualbox in this case).
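Incidentally, if you just want one box rather than a whole list, Litmus also provides a provision task that takes the provisioner and image directly. A sketch, assuming the task signature in current puppet_litmus (the image name is just an example):

pdk bundle exec rake 'litmus:provision[vagrant, generic/debian9]'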
Next, let’s see if it works. Run pdk bundle install to install the gems listed in your Gemfile (this is auto-generated, so there’s no need to modify it). Then you can start your VMs: run pdk bundle exec rake litmus:provision_list[vagrant]. After a short while, you should see it using the vagrant provisioner to create your VMs.
We’re only part way there though. We still need to install the puppet agent, fix up the PATH environment variable (special step needed for vagrant images), install our module, run the acceptance tests, and then we can destroy the VMs. Before we get to any of that though, we should write some acceptance tests.
In the spec folder, create another subfolder called acceptance, and inside that create a file named for your class; in this case it’ll be example_spec.rb. A very simple acceptance test might look like this:
require 'spec_helper_acceptance'

describe 'example class' do
  pp_basic = <<-PUPPETCODE
    class {'example':
    }
  PUPPETCODE

  it 'applies idempotently' do
    idempotent_apply(pp_basic)
  end
end
Use the pp_basic variable to write some Puppet code that applies your module in some way. This class is very simple, but a more complex one might include parameter values, for example. It’s also a very simple test: all it does is check that the manifest applies idempotently. You’ll want to add more tests to confirm that it’s actually creating the resources you expect.
With that done, we can put it all together:
- pdk bundle exec rake litmus:install_agent installs the puppet agent on each VM you provisioned.
- pdk bundle exec bolt task run provision::fix_secure_path --modulepath spec/fixtures/modules -i inventory.yaml -t ssh_nodes calls a bolt task directly, referencing the inventory.yaml file that litmus generates in the provision stage.
- pdk bundle exec rake litmus:install_module installs the module we’re testing.
- pdk bundle exec rake litmus:acceptance:parallel runs our acceptance tests.

If all went to plan, you should see that the tests finished with no failures. At this point you can either tear down the VMs, or make changes to your module, install it again, and run your tests some more. Tear down the VMs with pdk bundle exec rake litmus:tear_down.
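For convenience, the whole cycle can be stitched into a small shell script. This is just a sketch assembled from the exact commands above; run it from the module root:

#!/bin/bash
# Provision the VMs, set them up, run the acceptance tests, then clean up
set -e
pdk bundle install
pdk bundle exec rake 'litmus:provision_list[vagrant]'
pdk bundle exec rake litmus:install_agent
pdk bundle exec bolt task run provision::fix_secure_path --modulepath spec/fixtures/modules -i inventory.yaml -t ssh_nodes
pdk bundle exec rake litmus:install_module
pdk bundle exec rake litmus:acceptance:parallel
pdk bundle exec rake litmus:tear_down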
Litmus offers a nice framework for running acceptance tests, and it seems to be where the Puppet community is moving. Most examples use Docker, which is great if you’re developing on Linux or have Docker on Windows configured. Unfortunately, I use VMware Workstation and VirtualBox, which prevents me from also having Hyper-V (and thus Docker) running on my system.
Packer is a tool from HashiCorp that automates the building of machine images. Natively it supports a huge range of virtualisation options, but for our purposes we use VirtualBox and VMware Workstation. Our VirtualBox images are used by developers running Vagrant on their local systems, and our VMware images are used both for Vagrant and for our internal OpenStack platform (which uses VMware vCenter/ESXi for its compute resources).
The Packer Getting Started guide gives a good overview of how to use it. In a nutshell, a Packer configuration consists of an array of builders and, optionally, an array of provisioners. Builders define how to launch a VM on a particular platform, while provisioners define what scripts to run on that image to prepare it the way you want. Once those scripts have run, Packer will shut down the VM and export it in some way depending on the builder; for example, that may be an AMI for EC2, or a file such as a vmdk for VMware images. You can have multiple builders in a single Packer configuration, which means you can effectively build an identical image for multiple platforms very easily.
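The standard Packer CLI makes this easy to drive. For example, assuming a template file called template.json with a builder named virtualbox-iso, you can validate the config and then build just that one image:

packer validate template.json
packer build -only=virtualbox-iso template.json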
I’m going to talk about a few of the things we do to provision our images, but it’s worth noting that there are the Chef-maintained Bento boxes you can look at for full examples.
To build our Ubuntu 20.04 image, we start from scratch using the ISO. While it is possible to start from another image, we prefer this method because it gives us total control over what goes in the image.
You can see a full example of the Packer file we use here. The most interesting part of this step is configuring the boot_command. This is the command that is typed at the install prompt when the ISO boots. Ours looks like this:
"boot_command": [
" <wait><enter><wait>",
"<f6><esc>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs><bs><bs><bs><bs><bs><bs><bs>",
"<bs><bs><bs>",
"/casper/vmlinuz ",
"initrd=/casper/initrd ",
"autoinstall ",
"ds=nocloud-net;s=http://{{.HTTPIP}}:{{.HTTPPort}}/ubuntu-20.04/ ",
"<enter>"
],
There are a few interesting points to notice here. The first two lines get us from the splash screen to the custom boot command entry on the installer. Then, the large number of <bs> entries are shortcode for backspace – we’re deleting all the prefilled boot commands so we can type our own.
Note that even though each of these entries is a new item in the array in our config file, they all get entered as a single line. It’s just broken up like this to make it easier to read.
We’re using the new AutoInstall method for Ubuntu 20.04. Previous versions use debian-installer preseeding, but that method didn’t immediately work with the new ISO.
Packer will start a small HTTP server when the build is run and substitute the {{.HTTPIP}} and {{.HTTPPort}} variables with the corresponding IP and port. You must also set the http_directory configuration option to specify which directory on your filesystem hosts the files you want the HTTP server to serve. We have a directory called ubuntu-20.04 within that directory, and that in turn contains a user-data file which holds our AutoInstall config. I also found that AutoInstall expects a file called meta-data to be present; it doesn’t require any content, so I simply have an empty meta-data file alongside user-data.
Our user-data file looks like this:
#cloud-config
autoinstall:
  version: 1
  apt:
    geoip: true
    preserve_sources_list: false
    primary:
      - arches: [amd64, i386]
        uri: http://gb.archive.ubuntu.com/ubuntu
      - arches: [default]
        uri: http://ports.ubuntu.com/ubuntu-ports
  identity:
    hostname: ubuntu2004
    username: vagrant
    password: <encrypted password>
  ssh:
    allow-pw: true
    install-server: true
  locale: en_US
  keyboard:
    layout: gb
  storage:
    layout:
      name: direct
    config:
      - type: disk
        id: disk0
        match:
          size: largest
      - type: partition
        id: boot-partition
        device: disk0
        size: 500M
      - type: partition
        id: root-partition
        device: disk0
        size: -1
  late-commands:
    - "echo 'Defaults:vagrant !requiretty' > /target/etc/sudoers.d/vagrant"
    - "echo 'vagrant ALL=(ALL) NOPASSWD: ALL' >> /target/etc/sudoers.d/vagrant"
    - "chmod 440 /target/etc/sudoers.d/vagrant"
Note that the vagrant bits are somewhat unique to us. We create a user called vagrant as part of the install so that we can use this image as a Vagrant box later on. Note also, at the end, that we add vagrant to the sudo config and ensure it doesn’t require a password to run sudo commands. This ensures that when the image is used in Vagrant, it doesn’t prompt for a password before running a command with root privileges.
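One gotcha worth flagging: the password field in the identity section expects a crypted hash rather than plain text (hence the <encrypted password> placeholder above). One way to generate a suitable hash, assuming a reasonably recent OpenSSL:

# Produces a SHA-512 crypt hash to paste into identity.password
openssl passwd -6 'your-password-here'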
Then we come to the provisioners. For this base image we run two scripts: one updates all the packages, and the other cleans up a few things:
sudo apt-get update
sudo apt upgrade -y
sudo apt install apt-transport-https -y
and
sudo apt-get clean

for FILE in \
    /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg \
    /etc/cloud/cloud.cfg.d/curtin-preserve-sources.cfg \
    /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg
do
    if test -f "$FILE"; then
        sudo rm "$FILE"
    fi
done
The interesting thing in the second script is the removal of the *.cfg files from /etc/cloud/cloud.cfg.d/. Those files get created by AutoInstall and include config that prevents cloud-init from correctly running a second time. That’s probably not a problem in most cases, but our VMDK images are destined for OpenStack, which uses cloud-init to configure the instances on boot.
Finally, we repeat this style of config many times for different versions of Ubuntu and CentOS, but also Windows desktop and Windows Server editions. In most cases, we build a base image from an ISO, as above, and then build more specialised images using those base images as starting points. Some of our more complicated configurations also use Puppet as a provisioner to install things such as SQL Server, Oracle or Visual Studio to allow our development teams to easily test against those platforms.
One small thing I saw several references to, though, was the Folding@Home project. It’s something I had contributed to before, but had largely forgotten was a thing. Its main purpose is to simulate protein folding, with the goal of using that information to help medical researchers develop vaccines and other treatments for various illnesses.
Unsurprisingly, they started providing work units for Covid-19 research and calling on people to donate computing power.
That gave me an idea. At Redgate we have a lot of spare computing capacity in our hypervisor clusters, so I figured it was worth spinning up a few VMs to run the folding client. While this isn’t going to make a huge impact right away, hopefully it’ll go someway to helping the longer term cause of finding a vaccine.
We build and configure as many of our systems as possible using Puppet, so I took the opportunity to write a Puppet module to install and configure the client.
The Puppet module is fairly simple. It only works on Debian-based systems for the moment, though.
As of today, Redgate has five 8-core VMs running, each with two 4-CPU work slots. We’re also considering installing the client on our TeamCity agents to utilise the spare compute capacity there, although we obviously need to be careful not to disrupt our production workloads ;)
If you want to see how we’re getting on, you can check out our team stats page here.
This worked pretty well, but there were still a few flaws, which I mentioned in my post at the time.
This week, I had a bit of time to tackle these problems and see if I could improve the system. It turns out I could.
I stumbled across a Python project called Celery, which promised to deal with the task-queue element of the system. One of the problems I had was that if an ffmpeg process died while encoding, the message would still get ack’d even though the process failed. Celery solves that by only ack’ing the message once the process completes successfully.
Another nice feature I noticed was that you can query the state of workers. Running celery inspect active in the working directory of the application on any of the worker nodes gives a list of the workers currently up and the messages they’re processing.
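As a sketch, with the Celery app defined in videoTasks.py as in the repo below, that looks something like this (-A being Celery’s standard flag for pointing the CLI at an app):

# Ask all running workers what they're currently working on
celery -A videoTasks inspect active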
I made a couple of other changes too. Instead of this being a Docker application, I moved it to a simple Python app that runs in a VM. Docker didn’t make sense for this, and it was far simpler to just have an Ubuntu Server VM configured by Puppet.
Puppet ensures all dependencies are installed, checks out the code, and ensures the worker process is running using Supervisord. Supervisor can then be used to stop the worker process if needed, and the nice thing about Celery is that if you do that before an encode completes, the message gets requeued and another worker will deal with it instead.
The whole project is on GitHub here. There are two main components:
- queueFiles.py deals with getting the messages onto the queue. It doesn’t really matter how this is done, only that it is.
- videoTasks.py contains the encode function that actually does the processing. Here you’ll find the ffmpeg command that gets executed. It’s pretty basic, but I find it covers most scenarios with good quality output.

PRTG works really well, but one of the things I found quite lacking was its ability to create dashboards. Sure, you have maps, but there’s only so much you can do there, and they just don’t look all that good. Then I found this blog post on the PRTG blog about someone having written a PRTG data source provider for Grafana. Grafana is excellent at creating dashboards and drawing graphs, and what’s more, the end result looks good.
I set about giving it a try, and the results are pretty good so far.
I did find the installation documentation a bit lacking to start with, so here are my findings…
The install guide is on GitHub here, along with the Grafana plugin.
The first stumbling block for me was that you have to use the passhash, not the password, for your API user. To be fair, it does say that in the configuration wizard; I just couldn’t read. You can get the passhash from the My Account section once you’re logged into PRTG.
Building a dashboard is straightforward: add a new panel, a Graph most likely. In the Metrics section, select PRTG as your datasource, and then each of the Group, Host, Sensor and Channel fields will autopopulate with what’s in your PRTG installation.
Remember to save your dashboard after each change. This doesn’t happen automatically, and if you refresh the page you’ll lose your changes.
If you add new things to PRTG, then you might need to refresh the Grafana page completely before those things show up in Grafana.
You can also list sensors using the Table panel type in Grafana. However, you should use raw query mode and reference the PRTG API directly. For example, querying table.json with the query string content=sensors&columns=device,sensor,status,message,downtimesince&filter_status=4&filter_status=5&filter_status=10 will return all messages for sensors in the Warning, Error and Unusual states. The filtering is done with the filter_status options, the documentation for which I found here.
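Incidentally, you can sanity-check that query outside Grafana by calling the PRTG API directly. A sketch with a placeholder host and credentials:

# Substitute your own PRTG host, API user and passhash
curl 'https://prtg.example.com/api/table.json?content=sensors&columns=device,sensor,status,message,downtimesince&filter_status=4&filter_status=5&filter_status=10&username=apiuser&passhash=0000000000'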
When using Column Styles to color code the rows, I found I had to delete all existing rules and create new ones before it would behave correctly.
Finally, here are a couple of screenshots of dashboards I created.
I was spurred into action after spotting a couple of posts on Instructables of people doing similar things here and here. I’d also seen countless examples of people creating smart mirrors using similar techniques.
I started investigating how I could embed a Google calendar into a web page, when I came across Dakboard, which has a bunch of integrations and gives you a nice dashboard at the end of it.
First things first, I set up Dakboard how I wanted it. I integrated it with my Google Calendar, my Google Photos library, and our shared todo list on Wunderlist, and got it to pull in weather from Yahoo.
Configuring the Raspberry Pi was relatively simple as well.
First off, we need to rotate the display. Set display_rotate = 1 in /boot/config.txt. I also had to disable overscan by setting disable_overscan = 1 as well.
Next, install Chromium by running apt-get install chromium-browser, and finally update the autostart file to launch Chromium in kiosk mode on boot. Edit /home/pi/.config/lxsession/LXDE-pi/autostart to include
@chromium-browser --noerrdialogs --incognito --kiosk <url>
and replace <url> with your Dakboard URL.
Note that the location of that file seems to change between versions of Raspbian; this is where it was on the latest version as of August 2017.
You may also want to install unclutter, which will hide the mouse pointer for you; run apt-get install unclutter. You’ll then need to add the following to the autostart file:
@unclutter -display :0 -noevents -grab
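Putting both entries together, the autostart file ends up looking something like this (with <url> replaced by your own Dakboard URL):

@unclutter -display :0 -noevents -grab
@chromium-browser --noerrdialogs --incognito --kiosk <url>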
If you haven’t already, you may also need to set Raspbian to auto login which you can do in the raspi-config program.
One last thing I added was automatically switching the display off. I didn’t want it lighting up my hall all night, so I created two cron jobs to switch the display on and off.
To turn the display off, run /usr/bin/tvservice -o, and to turn it back on, /usr/bin/tvservice -p. This puts the display into standby and should work on most displays connected via HDMI. DVI and VGA connections may not work as well (or even at all).
I put those commands into the crontab file for the root user, and had the screen turn on at 8am and off at 10pm:
0 8 * * * /usr/bin/tvservice -p
0 22 * * * /usr/bin/tvservice -o
To test that it all works, reboot your system and it should start up, auto login, and display your webpage.
There are two main parts to the hardware – the display and the frame.
For the display, I had some spare Raspberry Pis and an old monitor lying around. I put together a quick proof of concept and found that although it worked and the dashboard looked great, the old Pi 1s were not powerful enough to render the webpage in a timely fashion. So I ordered a brand new Raspberry Pi 3, which is much better.
Next I ripped apart the display to extract it from the ugly plastic casing to see what I had to work with to make the frame.
One of the problems I immediately found was that, once you take into account the power supply and control board, the display was already quite thick at around 6cm. But I also needed to get power to the Raspberry Pi and the display. My initial thought was to include a 4-way power strip behind the display; however, that quickly increased the depth to about 10cm, which I didn’t like.
I spotted in one of the posts above that they had taken 5v from the monitor’s power supply board. I figured there must be a 5v rail on mine somewhere, so some more dismantling was required. I was preparing myself to spend a good chunk of time trying to identify a 5v rail, but when I got the plastic casing off I was greeted with this:
…I was very happy.
The soldering iron came out, an old micro USB cable was sacrificed, and not long after I had a Raspberry Pi powered from the monitor’s power supply.
Next up was the frame. This was probably the most complicated part of the project. After reading the two posts, I had a pretty good idea of how I’d do it, though I had to make a few adjustments to the methods in those posts to account for my lack of tools. Specifically, no access to a router or table saw meant I wouldn’t be cutting any grooves and sliding the display in. Mine would also be thicker than in those examples, on account of the power supply and control board on the back of the monitor. I chose to keep the original metal casings for those for simplicity, but it did mean my screen was about 6cm deep.
I chose to make the frame in a similar way to the second post above – that is, a deep section that sat around the display, and then a front section that covered the edges up a bit. Those two would be glued together and the display would fit in from the back.
Originally I was going to have nice mitre cuts for the corners; however, it turns out I also lack the necessary tools and/or skill to make accurate 45 degree cuts, so I changed my mind and instead had the edge pieces butt up against each other at 90 degrees. It doesn’t look as nice, but it’s better than having gaps, I think.
On the back section, I added a few cross pieces to hold the power supply in place. The back section and the cross pieces were screwed together, and the front piece was glued on. The cross pieces would come out later in order to install the display.
Several hours later the glue had dried enough to assemble it all and see what it looked like.
On one of the cross braces I screwed a couple of stand-offs in, and then mounted the Raspberry Pi on those. It just fitted…more by luck than judgement I think.
Now, the moment of truth: would it work? I powered it up and waited patiently for it to boot.
It worked!
All that’s left now is to paint it and attach it to the wall.
I’ve painted the frame, and hung it on the wall now. Here are a few more photos of it in action. You may also spot my method of hanging it – some wire between two loops on the inside of either edge. That then hooks on to two giant picture hooks on the wall.
Fortunately, I had an old PC in the loft with a Hauppauge WinTV card that took a composite video input. Unfortunately, the WinTV application was the only way I could record the video in Windows, and it only worked on Windows 7. I dug out an old Windows 7 installer and got to work. Shortly thereafter I had a working WinTV install, which is where I discovered my next problem: the only SCART -> Composite adapters I had were all video in, rather than video out, and I couldn’t find my switchable one. Thankfully the internet has pinouts for SCART connectors, and with the appropriate application of a soldering iron I soon had a frankendapter that would do both video in and out.
After about half a day of faffing around, I had a system that could capture video. The only downside was that a 2 hour video produced a 5GB mpg file. That file needed transcoding to something more sensible, like x264 in an mp4 container. This is where I got a bit carried away.
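For reference, the sort of transcode involved looks something like the sketch below; the filenames are placeholders and this isn’t the exact command I ended up using:

# Transcode captured MPEG-2 to H.264/AAC in an mp4 container
ffmpeg -i capture.mpg -c:v libx264 -preset slow -crf 20 -c:a aac output.mp4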
Instead of doing what any sane person would and just using Handbrake, I thought it’d be much better to create myself a little encoding server. Except it was more like an encoding farm… or at least it has the capability of being such a thing.
At a high level, it consists of:
The basic process flow is this:
This is far more complicated than I required, but it was fun to do. There are a few nice features about this:
However, there are also a few problems:
If I took this much further, I think I’d end up with a very small-scale version of Amazon’s Elastic Transcoder, which on the face of it appears to work in a similar way… perhaps that’s where I got the idea from.
So… how has the plan gone? Well, I’ve got one box full of VHS tapes that have been digitised, and about another 10 tapes to go, which is pretty good I think. The process actually works quite well, except for the part where I have to cue up and start/stop the recording of the tapes manually… unfortunately I don’t think there’s really a way around that.
Now, using a Raspberry Pi as a DIY NAS is nothing new; you only need to google ‘Raspberry Pi NAS’ to see what I mean. I wanted something a bit different: something scalable and redundant. I’d heard about Gluster before and knew roughly what it could do, but had never really played with it, so this was a perfect opportunity. Sure, there are several posts around the internet of people doing the exact same thing, but I wanted to give it a go anyway.
This post is going to be a brief guide on what I set up and how you can replicate it. I’m not going to go into huge detail on any of the technologies I used, there’s plenty of resources that already do that.
This is what I ended up with:
To start with, you’ll need two Raspberry Pis. By the way, this will work on any Debian-based operating system; I’m using Raspbian Wheezy on a Raspberry Pi 1, but it’ll work just as well on Ubuntu on an x86 system. Also, I know Wheezy is quite old now, but it’s the only one that’ll easily fit on the 4GB SD cards I had to hand.
On each node you need to create a file system for Gluster to use. I used XFS on the USB sticks.
Install xfsprogs:
apt-get install xfsprogs
My USB disk appears as /dev/sda, so to format its first partition as XFS:
mkfs.xfs -i size=512 /dev/sda1
Make a directory to mount this on:
mkdir -p /data/brick1
Next, make sure this gets mounted at boot by adding the following to /etc/fstab:
/dev/sda1 /data/brick1 xfs defaults 1 2
Finally, mount it:
mount -a
Now, you may find you get an error at this point. I did, but I think that’s because I had updated the kernel just beforehand and hadn’t rebooted. If it fails with an error like unknown filesystem xfs, then reboot the node and try again.
You can check whether the volume is mounted by looking at the output of mount:
/dev/root on / type ext4 (rw,noatime,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,size=218416k,nr_inodes=54604,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=44540k,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=89060k)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
/dev/mmcblk0p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
/dev/sda1 on /data/brick1 type xfs (rw,relatime,attr2,inode64,noquota)
As you can see, at the bottom there’s a line reading /dev/sda1 on /data/brick1 type xfs.
Now that the nodes are ready, we can install Gluster:
apt-get install glusterfs-server
That’ll take a bit of time, but the service should start automatically. Then you need to probe each node from the other one to register them in Gluster.
From node1:
gluster peer probe node2
From node2:
gluster peer probe node1
In each case you should see something like this:
root@node1:/home/pi# gluster peer probe node2
Probe successful
Next, we need to make the Gluster volume.
On each node:
mkdir /data/brick1/gv0
and then on one of the nodes:
gluster volume create gv0 replica 2 node1:/data/brick1/gv0 node2:/data/brick1/gv0
followed by:
gluster volume start gv0
You should see success messages which means it’s time to test your new Gluster volume.
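Before moving on, a couple of optional sanity checks using the standard gluster CLI:

gluster peer status       # each node should report the other as connected
gluster volume info gv0   # should show Type: Replicate and Status: Started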
On one of the nodes, make a directory and mount the gluster volume onto it:
mkdir -p /mnt/gv0
mount -t glusterfs node1:/gv0 /mnt/gv0
That should succeed without errors. Then you can create a file in /mnt/gv0:
touch /mnt/gv0/testfile
and it should appear on both nodes in /data/brick1/gv0:
root@node1:~# ls /data/brick1/gv0/
testfile
root@node2:~# ls /data/brick1/gv0/
testfile
That’s Gluster done.
Next up, we can install Samba on our nodes to present a Windows file share.
I’m choosing to put Samba on the Gluster nodes and share the mounted volume /mnt/gv0. There are plenty of other ways to do this, and I suspect ‘best practice’ would be to have one or two additional machines to present the file shares, leaving the Gluster nodes to just do Gluster. But I only have two Pis spare at the moment…
First, we need to mount the Gluster volume on both nodes. Make a directory for it on each:
mkdir -p /mnt/gv0
Then add the following to /etc/fstab on node1:
node1:/gv0 /mnt/gv0 glusterfs defaults 0 0
and add this to /etc/fstab on node2:
node2:/gv0 /mnt/gv0 glusterfs defaults 0 0
Now, on both nodes install samba:
apt-get install samba
and edit the config file at /etc/samba/smb.conf.
In the global section you’ll want:
security = user
guest account = nobody
and then a share section that looks like this:
[gluster]
guest ok = yes
path = /mnt/gv0
read only = no
Next, make sure that /mnt/gv0 is writeable by Samba on both nodes. I opted for the lazy approach:
chmod 777 /mnt/gv0
and finally, restart the Samba service to make the config active:
/etc/init.d/samba restart
You can of course adjust all those settings to your own desires. The above will give you an anonymous writeable share; you may want more security than that.
Time for a quick bit of testing.
With Samba installed and configured, you should be able to browse to \\node1\ and \\node2\ from a Windows machine on your network and see a folder share called gluster. You should also be able to write to that share in both instances.
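If you’d rather test from a Linux box (or from the Pis themselves), smbclient can do a quick check. A sketch, assuming the node names resolve on your network:

smbclient -L node1 -N                  # list the shares anonymously
smbclient -N //node1/gluster -c 'ls'   # list the contents of the gluster share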
I did a quick bit of performance testing and found that a 100Mb file copied at about 20Mbit/sec. That is not quick by any stretch of the imagination, but as you can see from the screenshot below, my poor Raspberry Pi’s CPU was working flat-out.
I did another test where I accessed a folder directly over Samba instead of going through Gluster, and saw an improvement to 65Mbit/sec. Again, that maxed out the CPU, but at least Samba could use all of it instead of sharing with Gluster.
It would be interesting to see what the performance would be like using a Raspberry Pi 3… I have some spare ones at work at the moment…perhaps I’ll borrow them and update this post with the results…
Finally, our little project isn’t complete without some automatic failover.
At the moment, to access the Samba share we have to point at one of the Gluster nodes directly. If that node went offline, we’d have to manually switch to using the other one. Let’s fix that. Enter VRRP, or more specifically in this case, keepalived.
VRRP is a protocol that allows two or more devices to share a single IP. The IP is only active on one node at a time, but if that node should fail, another immediately brings the IP up. Keepalived is an application that implements this protocol.
Install it on both nodes:
apt-get install keepalived
I’m going to use 192.168.1.80 as my virtual IP, but you should use any IP in your subnet that isn’t part of your DHCP range.
Next, create the /etc/keepalived/keepalived.conf config file, and on the primary node enter this:
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass somerandompassword
    }
    virtual_ipaddress {
        192.168.1.80
    }
}
Then, on the second node, enter this:
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass somerandompassword
    }
    virtual_ipaddress {
        192.168.1.80
    }
}
Both files are very similar. A few points to note:
- priority should be lower on the slave node.
- virtual_router_id can be anything, but must be the same on both nodes.
- auth_pass should be a secure password, identical on both nodes.
- interface should refer to your network interface, eth0 in my case.
Next, start keepalived:
/etc/init.d/keepalived start
and then give it a try by browsing to \\192.168.1.80\ (or whatever IP you used) from a Windows machine. You should see your gluster share and your files. At this point you’ll probably want to assign a DNS name to that IP if you have the capability to do so.
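If you’re curious which node currently holds the virtual IP, you can look for it on the interface. Stopping keepalived on the active node should see the address move to the other one almost immediately:

ip addr show eth0 | grep 192.168.1.80   # only shows output on the active node
/etc/init.d/keepalived stop             # force a failover to the other node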
This seems like an excellent way to get yourself a DIY NAS which features both redundancy and effectively unlimited scaling capacity. Gluster can do more than just replication – you can stripe files across nodes which means you could have 4 nodes, with a replica count of 2 to ensure redundancy. Need a bit more space? Just add another node, or another USB hard drive to an existing one and create a new brick. I also notice that Gluster can do geo-replication. I haven’t looked into it, but that could present an opportunity to make an asynchronous offsite copy of your data.
On a Raspberry Pi 1 it’s not very fast, but if you don’t care about access speed then it’s a perfect use for them. Pi 3s will probably be much faster, and at about £30 they’re pretty cheap too. You might not get all the value-add features that a company like Western Digital will give you, but if you don’t care about that, then for the same amount of money you’ll get a much better NAS.
I used a few different sources when researching this:
I don’t mean that in the sense that it is untidy, or that there are cables trailing all around the house. I mean just the sheer number of devices I have connected and doing things.
So, I decided to write it down to see exactly how out of hand it really was. Plus…I’m a sysadmin and documenting your stuff is good practice, right?
Here we go then. This is what I have in my network:
The router is a standard unit supplied by my ISP. It gives me WiFi and NAT. A single connection is made from that to the switch which handles all other switching. DNS and DHCP are handled by the Mini ITX PC which runs dhcpd and BIND on Ubuntu.
The NAS is a Western Digital My Cloud which, to be honest, is a bit crap. I’ve had to enable SSH and disable most of the “value add” features they put on just to get a vaguely usable SMB share.
The Raspberry Pis each perform a different role: one acts as a print server; another has an XRF module from the wireless inventors kit and acts as a sensor gateway for a light and temperature sensor in my living room. The third Pi sits up in the attic with my model railway, where it controls a point motor via some relays. Eventually, when I get back to it, I’ll add more point motors and relays and control the entire layout.
Next, the VMs. There are a few of these. One runs EmonCMS, which stores and presents the sensor data collected by the Pi above. There’s also a Plex Media Server and a Puppet master. Finally, there’s a VM for running Docker containers, which are for a little project I have on the go for digitising VHS tapes. The tapes get recorded to mpg using a TV capture card in one of the PCs. Each file is stored temporarily by another VM and added to a RabbitMQ queue. Container instances then take a file from the queue and begin transcoding it to mp4 for long-term storage.
The remaining items are fairly straightforward: an Xbox One, a Wii and a Roku 2 make up the TV unit in the living room, while a Roku Streaming Stick and one of the SONOS Play:1s live in the bedroom. The other SONOS is in the kitchen.
If you’ll excuse the terrible diagram and handwriting, this is what it looks like if you draw it out.
All in all, that’s not a small number of devices for two people… and yet, I would happily add more. In most cases I don’t add these things because I need to. I do it because I enjoy it. I’m a sysadmin by day and I enjoy tinkering in my spare time. Yes, my home network is a little out of hand….but y’know what? On balance, I think I like it that way.
So, what actually happens? I have two Hyper-V clusters, each running Windows Server 2012 R2. One is a 5-node cluster; the other has only 2 nodes. Every now and then (usually once or twice a week) a VM will suddenly drop off the network. We’ll take a look and find the VM up and running, but its network showing no Internet access – the little yellow triangle of doom.
Further investigation at this point reveals that:
I have been speaking to Microsoft to try to resolve this issue, with no luck so far. We did attempt to start a netsh trace on both the VM and the host, which was unsuccessful: the netsh command just hangs in the VM and never completes.
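For reference, the kind of trace we were attempting uses the standard netsh trace syntax, something along these lines (the tracefile path is just an example):

netsh trace start capture=yes tracefile=C:\temp\vmnet.etl
netsh trace stop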
This happens to VMs on both clusters and is not confined to a single host. My initial speculation was that it was an issue with the virtual switch, and that a recent batch of Windows Updates was to blame. However, rolling those back had no effect.
Microsoft’s initial thought was that something on the VM itself was at fault. I wasn’t sure that could be the case, given it happened on multiple VMs seemingly at random, but some more recent developments (such as the issue running netsh) are making me reconsider.
Right now Microsoft think it could be a filter driver in the VM, in particular the one from our ESET AntiVirus software. It’s certainly plausible, so we have removed it for now and we’ll see what happens over the next week.
I’ll post updates here with how we get on…
2016-07-23 Update: It’s been a little over a week now, and since removing ESET AV from the server we have had no more network failures on that particular VM. I’m keeping an eye on it, but I suspect this will prove to be the culprit.