Managing NSX ALB with Terraform part 1: Overview

ALB has a great UI, but it also comes with a complete (if convoluted) API and Terraform provider. I’m a big fan of using Infrastructure-as-Code principles, and that’s exactly what I’ll dive into in this series.

Background and assumptions

We’re currently working through a datacenter renewal. One of the cornerstones of this renewal has been automating where we can and introducing IaC principles along the way. A substantial component of this renewal is the replacement of our F5 loadbalancers with NSX ALB. As Terraform was our platform of choice for managing NSX, it goes without saying that we use it for ALB as well.

This might become a pretty long series of posts, so in order to keep it somewhat readable, I’ll make a few assumptions:

  1. You have a working knowledge of the NSX Advanced Loadbalancer
  2. You have a working knowledge of Terraform
  3. You have some basic knowledge of how to use Swagger/OpenAPI

I might dive into these at some point, but there are plenty of resources out there to get you going if you’re starting from scratch; I’ve listed a few below.

ALB

Terraform

Manual stuff

Our general mindset is that we want to push everything we reasonably can into code.

Unfortunately, some configuration items are just too complex or unique to make that worth it. We did not bother with the items below:

  • Clouds: This is essentially a one-off for us, and it’s a fairly complex object (for NSX clouds) to put into code
  • vCenter objects: See above
  • SE Interface networks: We’ve got some legacy setups where we needed specific configuration. Again, this was a one-off so it was not worth it for us.

Structuring the repository

It’s generally a good idea to spend some time working out your repository structure and general way of working with Terraform. This might seem trivial, but given that a Terraform folder generally means a Terraform state file, it becomes a pretty big pain in the butt to change later. In our deployment, we have the following tree:

├── Compute
├── Network
   ├── Gateways
   ├── Loadbalancer
      ├── General
         ├── certificates.tf
         ├── main.tf
         ├── provider.tf
         ├── segroups.tf
         ├── variables.tf
      ├── TenantA
         ├── PRD
            ├── main.tf
            ├── __FQDN1__.tf
            ├── __FQDN2__.tf
            ├── provider.tf
            ├── variables.tf
         ├── UAT
         ├── TST
         ├── DEV
      ├── TenantB
   ├── Segments
├── Security

We’ve set up a single repository to define all the virtual infrastructure in our deployment. Loadbalancing is nested under the Network folder, and we’ve split it up into a few folders:

  • General: This is all the global configuration that should be available to multiple tenants. It contains configuration for certificates, IPAM profiles and SE groups (see the sketch after this list).
  • Tenant: We split our environment into several tenants. This gives a clear overview of each tenant’s configuration and reduces the potential impact in case things go wrong.
  • PRD/UAT/TST/DEV: Within each tenant, we’ve created folders for every environment. This further reduces the blast radius of things going wrong.
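
To give an idea of what lives in General: segroups.tf holds our SE group definitions. The sketch below is a heavily trimmed, hypothetical example against the vmware/avi provider (the names and sizing are made up, and most arguments are omitted), just to show the shape of it:

  # Look up the cloud the SE group lives in (cloud name is made up)
  data "avi_cloud" "nsx" {
    name = "nsx-cloud"
  }

  # A minimal SE group definition; the real ones carry a lot more settings
  resource "avi_serviceenginegroup" "tenant_a" {
    name      = "se-group-tenant-a"
    cloud_ref = data.avi_cloud.nsx.id
    ha_mode   = "HA_MODE_SHARED_PAIR"  # elastic HA active/active
    max_se    = 4
  }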

You might have noticed that quite a few files keep coming back; this is part of our workflow. We always define all variables in variables.tf, all providers and the backend in provider.tf, and general configuration in main.tf. All other files are optional and only created when we feel main.tf would become too large or unwieldy. In this case, we split out the configuration of our SE groups and certificates in the General folder, and we created a TF file for every individual application.
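
For reference, a minimal provider.tf for one of these folders could look something like the sketch below. It assumes the vmware/avi provider; the version pin is only an example, and the backend block is left as a placeholder since remote state configuration is specific to your environment:

  terraform {
    required_providers {
      avi = {
        source  = "vmware/avi"
        version = "~> 22.1"   # example pin, match it to your controller version
      }
    }
    # backend "..." {}        # your remote state backend goes here
  }

  # Credentials and endpoint come from variables.tf, values arrive via tfvars or the pipeline
  provider "avi" {
    avi_controller = var.avi_controller
    avi_username   = var.avi_username
    avi_password   = var.avi_password
    avi_tenant     = var.avi_tenant
    avi_version    = var.avi_api_version  # ALB API version to talk to
  }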

The structure above works for us, but it has an important implication to consider: we need to update files in every folder on a semi-regular basis, usually when we want to upgrade either a module or the provider version. The big downside here is of course the amount of overhead; the upside is that you can test new versions on a subset of your configuration instead of pushing them to production immediately. You can work around this by using something like Terragrunt. We did not bother, as our environment is small enough to live with the overhead.
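
To make that concrete: when a new provider release comes out, the idea is to bump the pin in a single low-impact folder first and only roll it out to the other folders once a plan and apply there look clean. The version number below is a placeholder:

  # Network/Loadbalancer/TenantA/DEV/provider.tf -- trial the newer release here first
  terraform {
    required_providers {
      avi = {
        source  = "vmware/avi"
        version = "30.1.1"   # placeholder for the release under test
      }
    }
  }
  # PRD/UAT/TST keep their existing pin until this folder checks out.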

Modules

We use modules for stuff we deploy on a (semi-)regular basis. This gives us quite a few benefits:

  • Pretty granular control over the settings we want to enforce
  • Shorter, more readable config files
  • A way for other teams to deploy configuration without needing to understand all the intricate details of the platform

In our deployment, we created modules for the following items:

  • Pool
  • VSVIP
  • Virtual Service
  • All of the above (see the sketch below)
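
The module internals are for the next post, but to give an idea of what an application file in a tenant folder ends up looking like: it roughly boils down to a single call to that combined module. The module path and inputs below are made up for illustration:

  # One application per file (e.g. __FQDN1__.tf); module path and inputs are hypothetical
  module "app1_prd" {
    source = "../../../../Modules/alb-application"   # wraps pool, VSVIP and virtual service

    name     = "app1-prd"
    fqdn     = "app1.example.com"
    port     = 443
    servers  = ["10.0.10.11", "10.0.10.12"]
    se_group = "se-group-tenant-a"
  }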

Conclusion

This post has been pretty light on technical content, but I hope it gives a clear overview of where we start from before I dive into the details. In the next post, I’ll go over the modules.