emersonprado.github.io

Efficient Vagrantfiles – Part 4 – Code and data integration

Emerson Prado - 02/08/2020

In the third part of this series, we saw the YAML file format and how it can store data. Now let's use it to finally make our environment easy to read and maintain.

  1. TL;DR
  2. Ruby and YAML
    1. Ruby to YAML conversion
    2. YAML to Ruby conversion
  3. Our Vagrantfile
  4. More settings
  5. More files
  6. Where do we go from here?

TL;DR

If you already know how to convert back and forth between Ruby objects and YAML files, you can skip the rest of this article and get going with the summary below.

In part two, we ended with a hash holding the data for all VMs, followed by code with a loop iterating over that hash. Let's use Ruby's dump method to create a YAML file, etc/vms.yaml, from that hash:

  mkdir etc
  ruby -e "
    require 'yaml'
    puts(YAML.dump({
      'vm_1' => {
        :box => 'ARTACK/debian-jessie',
        :ip => '192.168.1.2'
      },
      'vm_2' => {
        :box => 'ARTACK/debian-jessie',
        :ip => '192.168.1.3'
      },
      'vm_3' => {
        :box => 'centos/7'
      }
    }))
  " > etc/vms.yaml
  

In my experiments, I needed some extra VirtualBox settings. So I added them in the YAML file generated above.

  cat etc/vms.yaml
  ---
  vm_1:
    :box: ARTACK/debian-jessie    # From the dump command above
    :ram: 256                     # Added manually
    :ip: 192.168.1.2              # From the dump command above
    :custom:                      # Added manually
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  vm_2:
    :box: ARTACK/debian-jessie
    :ram: 256
    :ip: 192.168.1.3
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  vm_3:
    :box: centos/7
    :cpu: 2
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  

Then, in the Vagrantfile, let's load this file into a variable, which will have the same hash as before. Finally, the loop will use this variable, with the additional settings:

  require 'yaml'
  VMS = YAML.load_file('etc/vms.yaml')    # Load YAML file in variable

  Vagrant.configure("2") do |config|
    VMS.each do |name, confs|             # Loop from variable
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        vm.vm.box = confs[:box]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
        # Additional settings
        vm.vm.provider 'virtualbox' do |virtualbox|
          virtualbox.memory = confs[:ram] if confs.has_key?(:ram)
          virtualbox.cpus = confs[:cpu] if confs.has_key?(:cpu)
          confs[:custom].each do |custom, value|
            virtualbox.customize ['modifyvm', :id, custom, value]
          end if confs.has_key?(:custom)
        end
      end
    end
  end
  

You probably saw (kudos to you!) that the VMs file has almost the same data for the VMs that use the same box. We can divide it into two files, VMs and boxes:

Boxes data file - etc/boxes.yaml

  ---
  debian_jessie:
    :name: ARTACK/debian-jessie
    :ram: 256
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  centos_7:
    :name: centos/7
    :cpu: 2
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  

VMs data file - etc/vms.yaml

  ---
  vm_1:
    :box: debian_jessie   # VM settings include which box to use
    :ip: 192.168.1.2
  vm_2:
    :box: debian_jessie
    :ip: 192.168.1.3
  vm_3:
    :box: centos_7
  

Then, the Vagrantfile

  require 'yaml'
  BOXES = YAML.load_file('etc/boxes.yaml')        # Boxes variable
  VMS = YAML.load_file('etc/vms.yaml')            # VMs variable

  Vagrant.configure("2") do |config|
    # Loop from VMs variable
    VMS.each do |name, confs|
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        # Box settings from boxes variable
        box = BOXES[confs[:box]]
        vm.vm.box = box[:name]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
        vm.vm.provider 'virtualbox' do |virtualbox|
          virtualbox.memory = box[:ram] if box.has_key?(:ram)
          virtualbox.cpus = box[:cpu] if box.has_key?(:cpu)
          box[:custom].each do |custom, value|
            virtualbox.customize ['modifyvm', :id, custom, value]
          end if box.has_key?(:custom)
        end
      end
    end
  end
  

Now, we have 3 small and well organized files, each one with only data or only code, and all of them quite easy to understand and maintain. That means job done!

Complexity is the limit. You can have a single data file, or several that divide the data between them. Keep in mind the trade-off between conciseness and simplicity. Always remember that everything should be easy to understand and maintain by other people, including non-developers. Use your good sense and keep it up!

Ruby and YAML

In part 3, I said Ruby and YAML were close friends. In developer language, this means there's a specialized module which deals with YAML inside Ruby's standard library. In practical terms, it means you can use YAML conversion methods (and more) without installing anything besides Ruby itself. You just have to load the module before using its functions, with require 'yaml'.

Ruby to YAML conversion

To convert Ruby objects into YAML, we use the dump method, which returns a YAML string representing the object passed to it.

  ruby -e '
    require "yaml"
    puts(YAML.dump([
      3,
      "Array",
      "Items"
    ]))
  '
  ---
  - 3
  - Array
  - Items

  ruby -e '
    require "yaml"
    puts(YAML.dump({
      "First"  => "Hash element",
      "Second" => "Hash element",
      "Third"  => "Hash element"
    }))
  '
  ---
  First: Hash element
  Second: Hash element
  Third: Hash element
  

YAML to Ruby conversion

To go the other way, from YAML to Ruby objects, we use the load method, which loads a given YAML string into a Ruby object. I'll remove the indentation from the data sections because YAML treats indentation as structure and doesn't allow free indenting (side note: YAML is picky).

  ruby -e '
    require "yaml"
    YAML.load("
  ---
  - 1
  - 2
  - 3
    ").each do |item|
      puts("Item: #{item}")
    end
  '
  Item: 1
  Item: 2
  Item: 3

  ruby -e '
    require "yaml"
    YAML.load("
  ---
  1: One
  2: Two
  3: Three
    ").each do |key, value|
      puts("Key: #{key} - Value: #{value}")
    end
  '
  Key: 1 - Value: One
  Key: 2 - Value: Two
  Key: 3 - Value: Three
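
The leading colons in keys such as :box: in the data files come from this round trip: they are how dump writes Ruby symbol keys, and load turns them back into symbols. Here is a minimal sketch of that behaviour. The helper name round_trip is made up for illustration and is not part of the yaml module.

```ruby
require 'yaml'

# Hypothetical helper: dump an object to a YAML string and load it back.
def round_trip(object)
  YAML.load(YAML.dump(object))
end

# Symbol keys are written as ":box:" and come back as symbols.
with_symbols = round_trip({ 'vm_3' => { :box => 'centos/7' } })
puts with_symbols['vm_3'][:box]          # centos/7

# A hand-written key without the leading colon loads as a string instead,
# so a lookup by symbol finds nothing.
plain = YAML.load("vm_3:\n  box: centos/7\n")
puts plain['vm_3'][:box].inspect         # nil
puts plain['vm_3']['box']                # centos/7
```

So when editing the data files by hand, keep the key style consistent with what the code looks up.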
  

Our Vagrantfile

Now our mission is to convert the huge hanging hash from part 2 to a nice data-only file, and leave our Vagrantfile with only code.

Sure, the dump method can do the conversion. There's one difference, though: we have to write the result to a file, not to the screen, so we simply redirect the output. Let's call this file vms.yaml. For tidiness, let's create a directory for this config file. I can't think of a better name than etc.

  mkdir etc

  ruby -e "
    require 'yaml'
    puts(YAML.dump({
      'vm_1' => {
        :box => 'ARTACK/debian-jessie',
        :ip => '192.168.1.2'
      },
      'vm_2' => {
        :box => 'ARTACK/debian-jessie',
        :ip => '192.168.1.3'
      },
      'vm_3' => {
        :box => 'centos/7',
      }
    }))
  " > etc/vms.yaml

  cat etc/vms.yaml
  ---
  vm_1:
    :box: ARTACK/debian-jessie
    :ip: 192.168.1.2
  vm_2:
    :box: ARTACK/debian-jessie
    :ip: 192.168.1.3
  vm_3:
    :box: centos/7
  

Now, loading the object in the Vagrantfile works the same way, with the same difference: we have to read the string from the file, not type it. Easy: let's swap the load method for load_file.

  require 'yaml'
  VMS = YAML.load_file('etc/vms.yaml')    # Load YAML file in variable

  Vagrant.configure("2") do |config|
    VMS.each do |name, confs|             # Loop from variable
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        vm.vm.box = confs[:box]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
      end
    end
  end
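
If you'd rather not rely on shell redirection, the file can also be written from Ruby itself. The sketch below is only one possible way to do it: the helper name write_yaml is made up for this example, and a temporary directory stands in for the project directory so the snippet can run anywhere.

```ruby
require 'yaml'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: write a hash to a YAML file, creating the directory
# first (the equivalent of "mkdir etc" plus the redirected dump).
def write_yaml(path, data)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, YAML.dump(data))
end

vms = {
  'vm_1' => { :box => 'ARTACK/debian-jessie', :ip => '192.168.1.2' },
  'vm_3' => { :box => 'centos/7' }
}

# A temporary directory stands in for the project directory here.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'etc', 'vms.yaml')
  write_yaml(path, vms)
  loaded = YAML.load_file(path)   # Same call the Vagrantfile uses
  puts loaded == vms              # true
end
```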
  

More settings

Sure, a virtual machine is about more than a name and an IP. Maybe you need to change RAM and/or CPU settings, for example:

  require 'yaml'
  VMS = YAML.load_file('etc/vms.yaml')

  Vagrant.configure("2") do |config|
    VMS.each do |name, confs|
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        vm.vm.box = confs[:box]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
        vm.vm.provider 'virtualbox' do |virtualbox|
          virtualbox.memory = confs[:ram] if confs.has_key?(:ram)
          virtualbox.cpus = confs[:cpu] if confs.has_key?(:cpu)
        end
      end
    end
  end
  

Then, in the data file:

  ---
  vm_1:
    :box: ARTACK/debian-jessie
    :ram: 256
    :ip: 192.168.1.2
  vm_2:
    :box: ARTACK/debian-jessie
    :ram: 256
    :ip: 192.168.1.3
  vm_3:
    :box: centos/7
    :cpu: 2
  

In my case, I needed some VirtualBox custom settings. In Vagrant, this is done by passing a setting name and a value to virtualbox.customize, which forwards them to VBoxManage modifyvm:

  require 'yaml'
  VMS = YAML.load_file('etc/vms.yaml')

  Vagrant.configure("2") do |config|
    VMS.each do |name, confs|
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        vm.vm.box = confs[:box]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
        vm.vm.provider 'virtualbox' do |virtualbox|
          virtualbox.memory = confs[:ram] if confs.has_key?(:ram)
          virtualbox.cpus = confs[:cpu] if confs.has_key?(:cpu)
          confs[:custom].each do |custom, value|
            virtualbox.customize ['modifyvm', :id, custom, value]
          end if confs.has_key?(:custom)
        end
      end
    end
  end
  

That was the code, still independent of the settings themselves (no data in code!). Now, in the data file, let's disable USB and the VirtualBox Remote Desktop Extension (VRDE), which have already given me pain in some VMs. The custom settings need names and values, which means a hash as the value for the :custom key:

  ---
  vm_1:
    :box: ARTACK/debian-jessie
    :ram: 256
    :ip: 192.168.1.2
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  vm_2:
    :box: ARTACK/debian-jessie
    :ram: 256
    :ip: 192.168.1.3
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  vm_3:
    :box: centos/7
    :cpu: 2
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
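
To see what the loop actually hands to virtualbox.customize, here is a small sketch that builds the same argument arrays in plain Ruby, without Vagrant. The helper name customize_args is made up for illustration. It also shows why the values in the data file are quoted.

```ruby
require 'yaml'

# Hypothetical helper: build the argument arrays that the loop passes to
# virtualbox.customize. Returns an empty list when there is no :custom key.
def customize_args(confs)
  confs.fetch(:custom, {}).map do |custom, value|
    ['modifyvm', :id, custom, value]
  end
end

confs = YAML.load(<<~DATA)
  :box: centos/7
  :custom:
    '--usb': 'off'
    '--vrde': 'off'
DATA

customize_args(confs).each { |args| p args }
# ["modifyvm", :id, "--usb", "off"]
# ["modifyvm", :id, "--vrde", "off"]

# Without the quotes, Ruby's YAML parser reads off as a boolean,
# which is not the string that the setting expects.
p YAML.load("vrde: off")['vrde']      # false
p YAML.load("vrde: 'off'")['vrde']    # "off"
```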
  

More files

I heard you say "Hey, there's too much repetition in the data files!". That makes me proud of you.

Yes, there are settings that will repeat. All VMs from the same box use at least the same box name and version. Sometimes they also share the same resources and/or custom settings. Why repeat them all if we can have a data file for the boxes themselves?

Boxes data file

  ---
  debian_jessie:
    :name: ARTACK/debian-jessie
    :ram: 256
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  centos_7:
    :name: centos/7
    :cpu: 2
    :custom:
      '--usb': 'off'
      '--usbehci': 'off'
      '--usbxhci': 'off'
      '--vrde': 'off'
  

VMs data file

  ---
  vm_1:
    :box: debian_jessie
    :ip: 192.168.1.2
  vm_2:
    :box: debian_jessie
    :ip: 192.168.1.3
  vm_3:
    :box: centos_7
  

Then, the Vagrantfile

  require 'yaml'
  BOXES = YAML.load_file('etc/boxes.yaml')
  VMS = YAML.load_file('etc/vms.yaml')

  Vagrant.configure("2") do |config|
    VMS.each do |name, confs|
      config.vm.define name do |vm|
        config.vbguest.auto_update = false
        box = BOXES[confs[:box]]
        vm.vm.box = box[:name]
        vm.vm.network "private_network", ip: confs[:ip] if confs.has_key?(:ip)
        vm.vm.provider 'virtualbox' do |virtualbox|
          virtualbox.memory = box[:ram] if box.has_key?(:ram)
          virtualbox.cpus = box[:cpu] if box.has_key?(:cpu)
          box[:custom].each do |custom, value|
            virtualbox.customize ['modifyvm', :id, custom, value]
          end if box.has_key?(:custom)
        end
      end
    end
  end
  

There's no hard line telling which files you should have and which data should be in each one. Go for the best bet between flexibility and simplicity.
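
One thing the split introduces is a lookup between files. In the Vagrantfile above, BOXES[confs[:box]] returns nil when a VM names a box that isn't in the boxes file, and the next line then fails with a rather unhelpful error. A possible safeguard, sketched here with a made-up helper name (resolve_box) and a deliberately wrong box name, is to use fetch with a clearer message:

```ruby
# Hypothetical helper: find the box entry a VM refers to, or fail with a
# message that names both the VM and the missing box.
def resolve_box(boxes, vm_name, confs)
  boxes.fetch(confs[:box]) do |key|
    raise KeyError, "VM '#{vm_name}' refers to unknown box '#{key}'"
  end
end

boxes = { 'centos_7' => { :name => 'centos/7', :cpu => 2 } }
vms = {
  'vm_3' => { :box => 'centos_7' },
  'vm_4' => { :box => 'centos_8' }    # Wrong on purpose
}

vms.each do |name, confs|
  begin
    puts "#{name}: #{resolve_box(boxes, name, confs)[:name]}"
  rescue KeyError => e
    puts "Error: #{e.message}"
  end
end
# vm_3: centos/7
# Error: VM 'vm_4' refers to unknown box 'centos_8'
```

In a real Vagrantfile you would probably let the error stop the run instead of rescuing it; the rescue here only keeps the example going.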


Now we have 3 files of roughly 20 lines or fewer each, all of them quite readable, and that gets us 3 VMs with different OSs, RAM and CPU settings, IPs and virtualizer-specific stuff. And we can very easily change, extend and/or reuse them. Happy? Me too.

Where do we go from here?

What else can go in its own file and be consumed by the code? Maybe common hardware settings, network, some provisioning?
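
As one possible direction, the :custom block that repeats in every box could live in a set of shared defaults and be merged into each box. This is only a sketch under assumptions: the defaults hash and the helper name with_defaults are made up here, and the defaults could just as well be loaded from their own YAML file.

```ruby
# Hypothetical helper: box settings win over defaults; nested hashes such
# as :custom are merged key by key instead of being replaced.
def with_defaults(defaults, box)
  defaults.merge(box) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? old.merge(new) : new
  end
end

# Hypothetical shared settings, written inline to keep the example short.
defaults = {
  :custom => { '--usb' => 'off', '--usbehci' => 'off',
               '--usbxhci' => 'off', '--vrde' => 'off' }
}

centos = with_defaults(defaults, { :name => 'centos/7', :cpu => 2 })
puts centos[:cpu]                 # 2
puts centos[:custom]['--vrde']    # off

# A box can still override a single custom setting.
debian = with_defaults(defaults, { :name => 'ARTACK/debian-jessie',
                                   :custom => { '--vrde' => 'on' } })
puts debian[:custom]['--vrde']    # on
puts debian[:custom]['--usb']     # off
```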

Look for repeating patterns. We separated code and data in the first place when it became clear that the code was almost the same for all VMs. Then we created the boxes file when the VM settings also seemed repetitive. That's the way to go.

It's not a good idea, though, to place every single repetition pattern in a distinct file. In some cases, you can end up with too many files with too little content, and a maze of files and variables to develop in.

The most important thing to keep in mind about dividing data into separate files is that this should always make maintenance easier. Other people, sometimes not Ruby developers, and even you months later, should be able to (re)use your Vagrantfile. If you changed the data architecture and it seems more complicated, it probably is, even more so for those who didn't write it. Redo or undo.

Now you know why and how to separate data from code, hands on! Happy Vagrantfiles!