Firewall as Code
26 Sep 2024
Looking at different ways to turn Palo Alto configuration into code. My idea was that every setting could be defined in code and the change history stored in Git. I planned on using YAML files to define how the firewall should be configured.
Test Environment
I have a spare PA-220 at home. It’s a good little firewall, but commit times are horrendous. Instead of running the tests on my physical firewall, I settled on using containerlab. I downloaded a VM image from Palo Alto support and used vrnetlab to create a containerlab-compatible image.
I performed a quick test on both my physical PA-220 and the containerlab-based Palo Alto firewall: I created a new address object and then committed the change. The commit took 35 seconds on the containerlab firewall and 200 seconds on the PA-220. That's a huge difference! Tearing down and bringing up containerlab is also much easier than doing the same with physical devices.
Terraform
I’ve previously used Terraform to deploy infrastructure to AWS. I discovered that Palo Alto has an official Terraform provider. It was natural to start here.
I started with address objects. I created a YAML file with the objects I wanted and then looped through it using the panos_address_object resource. It worked well with just a few lines. However, I was surprised to find that the provider could not commit the changes directly. For actual commits, you need to compile a separate Go application; the documentation provides an example of how to make the changes and commit them.
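As a rough sketch (the file layout and YAML shape here are illustrative, not the exact code I used), the loop looked something like this:

```hcl
locals {
  # addresses.yaml: a list of {name, value} mappings
  addresses = yamldecode(file("${path.module}/addresses.yaml"))
}

resource "panos_address_object" "this" {
  for_each = { for a in local.addresses : a.name => a }

  name  = each.value.name
  value = each.value.value
}
```

With for_each keyed on the object name, adding an address to the YAML file creates exactly one new resource on the next apply.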
This wasn’t a significant issue for me, so I continued and created Terraform configurations for address groups, tags, interfaces, zones, and security rules. I made separate .tf files for each resource type. Similarly to address objects, I looped through the values defined in YAML files.
Issues with Terraform
As I mentioned before, the goal was to turn every piece of configuration into code. This is where I ran into a wall with Terraform: the provider simply didn't have resources for all the features I needed.
Nornir
After encountering limitations with Terraform, I started looking at Nornir. Instead of using Terraform’s HCL to define the infrastructure, in Nornir, we use Python.
Nornir itself is just a framework; it can’t connect to devices by itself. Instead, we use plugins. I chose to use the NAPALM plugin to interact with Palo Alto. This plugin provides ready-to-use Nornir tasks that call NAPALM methods. A task is a piece of code that implements functionality for a single host.
NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) abstracts the way you interact with network devices. There are quite a few community drivers for many vendors, including Palo Alto. So instead of programming vendor-specific ways to do things, we can use methods like load_replace_candidate() to load and replace a candidate config from a file. The requirement for using a method is that the vendor driver has to support it.
Nornir acts as a coordinator. We can use an inventory file or load information about hosts from external systems like Netbox. We can then filter those devices and run the defined tasks on them. It is very similar to Ansible.
Changing the Strategy
My strategy changed after the Terraform experiment. With Terraform, I had created each resource one by one. Instead, I decided to fetch the current running configuration from the firewall, modify it locally, and send it back. This approach offers several advantages:
- Flexibility: We can define any configuration we want without being limited by the provider’s resource support.
- Consistency: We can ensure that no configuration drift gets introduced, even if someone makes changes directly to the firewall.
- Gradual Automation: We can still automate settings gradually, adding more configurations over time.
I created a Jinja2 template for each resource type and used the same YAML files I had created for Terraform to render patches from the templates. Since Palo Alto's configuration is XML, I used XPath to find and remove the resources I wanted Nornir to manage from the exported configuration, and then applied the rendered patches at the same locations. Once modified, the configuration was imported into Palo Alto, loaded, and committed, and Nornir printed the diff reported by Palo Alto. I also added a flag for a dry-run option: instead of committing, it reverts the changes. In production, it's a good idea to do a dry run first to see what changes Palo Alto would make before actually applying them.
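The remove-and-patch mechanics can be sketched with Python's standard library alone. The XML below is a heavily trimmed, illustrative stand-in for a real PAN-OS configuration, and the "patch" is hand-written here rather than rendered from a Jinja2 template:

```python
import xml.etree.ElementTree as ET

# A trimmed stand-in for the exported running configuration.
running = ET.fromstring(
    "<config><devices><entry><vsys><entry>"
    "<address><entry name='old-host'><ip-netmask>10.0.0.1/32</ip-netmask></entry></address>"
    "</entry></vsys></entry></devices></config>"
)

# A patch as it would come out of a template rendered from YAML.
patch = ET.fromstring(
    "<address><entry name='web-1'><ip-netmask>10.0.0.10/32</ip-netmask></entry></address>"
)

# Locate the parent of the managed <address> section, drop the old
# section entirely, and graft the freshly rendered one in its place.
parent = running.find("./devices/entry/vsys/entry")
parent.remove(parent.find("address"))
parent.append(patch)

names = [e.get("name") for e in running.iter("entry") if e.get("name")]
print(names)  # ['web-1'] — only the managed objects from the patch remain
```

Removing the whole managed section before re-adding it is what prevents configuration drift: anything added by hand inside that section simply disappears on the next run.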
Issues with Nornir
I ran into some problems with Netmiko. The NAPALM Palo Alto driver uses the API for most operations, but some actions like config diff and commit are performed via SSH. For those SSH connections, Netmiko is used.
I noticed that the napalm_configure task wasn’t showing any diff even when there was one. It required some adjustments to get it working correctly. I had to modify how Netmiko handled the expected strings during SSH interactions.
I ran Nornir in a container, so I made these fixes while creating the container.
sed -i -e 's/send_command("show config diff")/send_command("show config diff", expect_string=r">")/g' /usr/local/lib/python3.9/site-packages/napalm_panos/panos.py
sed -i -e 's/expect_string="100%"/expect_string="#"/g' /usr/local/lib/python3.9/site-packages/netmiko/paloalto/paloalto_panos.py
Ansible
After Terraform, I went directly to Nornir since I felt it would give me complete control and I wouldn’t run into limitations. Initially, I skipped Ansible because I thought I would encounter the same issues as with Terraform. However, after creating the Nornir playbook, I started looking at Ansible. I found out that the same thing I did with Nornir could be accomplished with Ansible.
Using Ansible instead of Nornir in production might be a better fit since more people are familiar with it. Creating scripts is one thing, but you also need to maintain them. Ensuring your team has the resources to maintain the scripts is key.
Issues with Ansible
I encountered some challenges. For instance, I could not get the config diff from Palo Alto. I ended up creating a custom module to show the diff. The module did the same thing the NAPALM driver did: execute show config diff via SSH with Netmiko.
Another issue I ran into was with Ansible's --check flag, which can be used to dry-run a playbook. However, using it prevents certain modules, like panos_loadcfg, from loading the configuration. NAPALM implements its dry run by first taking a backup, loading the new config, checking the diff, and then reverting to the backup.
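That backup/load/diff/revert sequence is just an orchestration pattern. A minimal sketch, with hypothetical callables standing in for the real API or Netmiko calls, could look like:

```python
def dry_run_diff(backup, load_candidate, diff, revert):
    """Load a candidate config, capture the diff, then roll back.

    All four arguments are hypothetical callables wrapping whatever the
    platform actually uses (API requests, Netmiko commands, ...).
    """
    backup()          # snapshot the current running config
    load_candidate()  # load the proposed configuration
    changes = diff()  # capture what would change
    revert()          # restore the snapshot; nothing is committed
    return changes
```

The point is that the dry run is built from the same primitives as a real deployment, so a module that only supports loading can still be made check-safe by wrapping it this way.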
Conclusion
Every tool had its own problems. I really dislike that we can't rely solely on the API and still need SSH for certain tasks; that seems to be the source of many of the issues I ran into.
In this post, I talked about turning firewall configuration into code. The next step is to think about using this to actually automate and reduce the work needed for firewall configuration. Just turning configuration into code could be useful for letting users propose changes. For example, network engineers could be given read-only access to the firewalls: they could log in, see what they would like to change, modify the YAML files, and open a pull request. More senior engineers would review the proposed changes and approve them, after which a CI/CD pipeline would implement the change. With containerlab, you could even run every proposed change in a virtual lab before senior engineers review it, using automated tests to automatically reject a change that does not meet the standards.
However, this might not be optimal in large environments. Also, the people opening pull requests would need to be familiar with Palo Alto and networking. In the real world, changes, especially firewall rules, are often proposed by non-networking teams. Perhaps developers need a port open for their new application.
For automations to be useful and scalable, we would need more abstractions. Let’s say you wanted to create a new IPSec tunnel on a Palo Alto firewall. This would require you to create multiple resources. Just having configuration as code would not decrease the work. Instead, it might be useful to define a model for tunnels. The model could ask for the required information and then fill in the rest automatically.
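As a hedged illustration of what such a model might look like (every field and resource name below is made up for the example, not an actual PAN-OS schema), a small Python model could expand a few required inputs into the full set of resources:

```python
from dataclasses import dataclass


@dataclass
class TunnelModel:
    """Minimal inputs a user would provide for a new IPSec tunnel."""
    name: str
    peer_ip: str
    local_subnet: str
    remote_subnet: str

    def expand(self) -> dict:
        """Fill in the rest: derive every resource the tunnel needs."""
        return {
            "ike_gateway": {"name": f"ike-{self.name}", "peer": self.peer_ip},
            "tunnel_interface": {"name": f"tunnel-{self.name}"},
            "ipsec_tunnel": {"name": self.name, "gateway": f"ike-{self.name}"},
            "static_route": {"destination": self.remote_subnet,
                             "interface": f"tunnel-{self.name}"},
        }


resources = TunnelModel("branch1", "203.0.113.5",
                        "10.1.0.0/24", "10.2.0.0/24").expand()
print(sorted(resources))
```

The user supplies four values; the model derives the four interrelated resources (and their cross-references) that would otherwise have to be written by hand.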
Tools like Netbox or Nautobot could be used for modeling. This would provide a graphical interface instead of modifying YAML files by hand and decrease the work needed for the change.