Parsing firewall policies using Python

Sajid Bisnar Khan
7 min read · Jun 15, 2020

In this article, I want to share how I used Python to convert ridiculously unstructured firewall rules in text format into an Excel sheet. I wrote this code to solve a real-world problem I faced while auditing security policies, and it played an integral role in pulling off the project successfully. Without wasting more time, let's get the ball rolling.

Understanding the data structure

Let's start with a single firewall rule and extract key features from it

The 1st line tells us how many hits (7856) were observed on this rule since it was first configured. The 2nd line describes the policy. The 3rd line holds the name of the rule (ME71), which uniquely identifies this policy. In the 4th and 5th lines, the key information is the security zone of the source subnets (source-zone is Management) and of the destination subnets (destination-zone is EDN), respectively. Similarly, the 6th and 7th lines give the names of the source (EMS_SYS_FTP) and destination (NOC_SYS) address-books. Finally, the allowed services and ports (FTP) are listed in the 8th line, followed by the action to take (permit) in the 9th line. The source and destination address-book names represent IP subnets, which are stored in a separate text file. For the sake of completeness, let me copy-paste the address-book entries used in this rule.

To make complete sense of the rule, simply replace the address-book names (EMS_SYS_FTP and NOC_SYS) with the IPs. The end goal is to take all 1500 such rules and convert them into the comprehensible Excel format given below.

You may be thinking that, now that we know everything about the data, it's time to jump into the coding. Unfortunately, we are not there yet: even though the key features of each rule are the same, the format can vary slightly. Let me explain what I mean. Consider the policy below and compare it with the previous one.

You may have noticed that the description, destination-zone, destination-address, and service lines are missing from this rule. It is still a valid format, because values that are not explicitly defined take their defaults: None for description, and any for destination-zone, destination-address, and service.

Moreover, the rule can get even messier as it is also possible to have multiple lines of zones, address-books, and services. For example

In the above rule, the source-zones are Management and Billing, and the allowed services are FTP, TCP 20, and TCP 21. So the Excel version takes the following form.

It is also worth mentioning that a rule can reference IPs directly, without creating an address-book name. In that case, the above policy becomes

Finally, it is also possible that an address-book name is a group containing address-book objects and IP subnets. For example,

Congratulations if you managed to bear with me till this point, because it's time to get into the actual fun part!

Regular expressions in Python

First, we need a way to match patterns so that we can capture the key features from each line, and regular expressions are the standard tool for that. Simply import ‘re’ and you are good to go. Below is an example that captures ‘hits’. If you are not already familiar with regular expressions, there is plenty of material on the internet to learn from.
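As a minimal sketch of that idea, the snippet below pulls the hit count out of a single line. The line layout ("hits 7856") and the name ptrn_hits are assumptions made for illustration; real firewall output may be formatted differently, so the pattern would need adjusting.

```python
import re

# Hypothetical line layout; the real firewall output may differ.
line = "hits 7856"

# \d+ captures one or more digits that follow the word "hits".
ptrn_hits = r"hits\s+(\d+)"

# With one capture group, findall returns a list of the captured strings.
matches = re.findall(ptrn_hits, line)
print(matches)  # ['7856']
```

Note that the captured value is a string; convert it with int() if you need to sort or filter by hit count later.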

Similarly, we need a regex pattern for every kind of line we expect to see in the rules. Below is a list of regex patterns, one for each feature.
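A possible set of patterns might look like the sketch below. The keywords at the start of each line (for example "rule name" or "source-address") are assumptions about the rule text, not a vendor specification, and the dictionary name PATTERNS is hypothetical.

```python
import re

# Hypothetical patterns, one per feature. The line keywords are assumed;
# adjust them to match the actual firewall output.
PATTERNS = {
    "hits": r"^\s*hits\s+(\d+)",
    "description": r"^\s*description\s+(.+)",
    "name": r"^\s*rule name\s+(\S+)",
    "source-zone": r"^\s*source-zone\s+(\S+)",
    "destination-zone": r"^\s*destination-zone\s+(\S+)",
    "source-address": r"^\s*source-address\s+(.+)",
    "destination-address": r"^\s*destination-address\s+(.+)",
    "service": r"^\s*service\s+(.+)",
    "action": r"^\s*action\s+(\S+)",
}

# Quick check against a few assumed sample lines.
sample_lines = ["rule name ME71", "source-zone Management", "service FTP"]
captured = []
for line in sample_lines:
    for feature, ptrn in PATTERNS.items():
        found = re.findall(ptrn, line)
        if found:
            captured.append((feature, found[0]))
            print(feature, "->", found[0])
```

Anchoring each pattern with ^ keeps "service" from matching in the middle of an unrelated line such as a description.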

The ‘re’ module's findall function expects the regex pattern as its first argument and the string to search as its second argument, i.e. re.findall(ptrn, sentence), and returns a list of matches. All we need to do is go through each line of a rule, check which pattern matches, and store the matched value in a dictionary. The key of the dictionary is the feature name and the value is the matched text. For example, applying this method to the sentence “service FTP” using ptrn_service (the regex pattern for services) should store {'service': 'FTP'} in the dictionary. Phew! We are done with the first piece of the puzzle. Now let's hop on to the next step.
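Here is a small sketch of that matching step, under the assumption that each line starts with a keyword such as "service" or "action". The helper match_line and the two patterns are illustrative names, not the original code.

```python
import re

# Two assumed patterns; a real parser would have one per feature.
ptrn_service = r"^\s*service\s+(.+)"
ptrn_action = r"^\s*action\s+(\S+)"
patterns = {"service": ptrn_service, "action": ptrn_action}


def match_line(line, patterns):
    """Return (feature, value) for the first pattern that matches, else None."""
    for feature, ptrn in patterns.items():
        found = re.findall(ptrn, line)
        if found:
            return feature, found[0].strip()
    return None


rule = {}
for line in ["service FTP", "action permit", "unrelated text"]:
    hit = match_line(line, patterns)
    if hit:
        feature, value = hit
        rule[feature] = value

print(rule)  # {'service': 'FTP', 'action': 'permit'}
```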

Read text file and match pattern in each line

Now we need a way to read a text file in Python and capture key-value pairs from each line. The best way to do that is to open the file with the with keyword. This is the recommended approach because we don't need to worry about releasing the resources held by an open file: the context manager takes care of setup and teardown, which avoids resource leaks.

Have a look at the boilerplate code, which reads each line of a file and prints it on screen.
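A minimal version of such boilerplate could look like this. So that the sketch runs on its own, it first writes a tiny stand-in rules file; the file contents and the variable names are assumptions.

```python
import os
import tempfile

# Write a tiny stand-in rules file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("rule name ME71\nservice FTP\naction permit\n")
    rules_path = tmp.name

lines_seen = []
# The with statement closes the file automatically, even if an error occurs.
with open(rules_path) as rules_file:
    for line in rules_file:
        line = line.rstrip("\n")  # drop the trailing newline
        lines_seen.append(line)
        print(line)

os.remove(rules_path)  # clean up the temporary file
```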

In the above code, we need to replace the print(line) call with the actual parsing logic. To do that, we take the following steps

  1. Create a Python list and store the regex patterns created earlier in it

  2. Identify which pattern matches the current line and store the key-value pair in a dictionary

  3. If the line matches the action pattern, the rule is complete: merge the address-book name matches and the subnet-only matches into a single key-value pair, fill any missing key features with default values, append the dictionary to a list, and finally reset the dictionary

  4. Repeat steps 2 and 3 until the end of the rules text file

Code to store a list of dictionaries, where each dictionary is a firewall rule
Code to fill missing key features with default values
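The steps above can be sketched roughly as follows. This is not the original code: the line keywords, the sample rules (including the rule name ME72), and the names PATTERNS, MULTI_VALUED, DEFAULTS, and parse_rules are all assumptions. For brevity, a single pattern captures either an address-book name or a literal subnet, so no merge step is needed here.

```python
import re

# Assumed line keywords; adjust to the real rule text.
PATTERNS = {
    "hits": r"^\s*hits\s+(\d+)",
    "description": r"^\s*description\s+(.+)",
    "name": r"^\s*rule name\s+(\S+)",
    "source-zone": r"^\s*source-zone\s+(\S+)",
    "destination-zone": r"^\s*destination-zone\s+(\S+)",
    "source-address": r"^\s*source-address\s+(.+)",
    "destination-address": r"^\s*destination-address\s+(.+)",
    "service": r"^\s*service\s+(.+)",
    "action": r"^\s*action\s+(\S+)",
}
# Features that may appear on several lines within one rule.
MULTI_VALUED = {"source-zone", "destination-zone", "source-address",
                "destination-address", "service"}
# Default values stated in the article for features that are not defined.
DEFAULTS = {"description": "None", "destination-zone": "any",
            "destination-address": "any", "service": "any"}


def parse_rules(lines):
    """Turn rule lines into a list of dictionaries, one per firewall rule."""
    rules, current = [], {}
    for line in lines:
        for feature, ptrn in PATTERNS.items():
            found = re.findall(ptrn, line)
            if not found:
                continue
            value = found[0].strip()
            if feature in MULTI_VALUED:
                current.setdefault(feature, []).append(value)
            else:
                current[feature] = value
            if feature == "action":  # action is the last line of a rule
                rule = {k: ", ".join(v) if isinstance(v, list) else v
                        for k, v in current.items()}
                for key, default in DEFAULTS.items():
                    rule.setdefault(key, default)
                rules.append(rule)
                current = {}
            break  # at most one pattern per line
    return rules


SAMPLE = """\
hits 7856
rule name ME71
source-zone Management
source-zone Billing
source-address EMS_SYS_FTP
service FTP
service TCP 20
action permit
hits 12
rule name ME72
source-zone Management
source-address EMS_SYS_FTP
action permit
"""
rules = parse_rules(SAMPLE.splitlines())
for rule in rules:
    print(rule)
```

Treating the action line as the end of a rule follows step 3; if the real rule text ends differently, that trigger is the part to change.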

Wow! We are almost done. Up to this point, we have captured all the required data in a structured form, apart from the last two columns. Let's do that next to reach the end goal.

Capture actual IPs from address-book

Now we just need to capture source and destination address-book names and determine subnets configured against them. Update each dictionary entry in the list with corresponding subnets. To do that we need to perform the following steps

  1. Load the address-book text file into one big dictionary in which each key is an address-book name and each value is the list of subnets configured against it (I also converted subnet masks from the 255.255.255.255 format to /32 to make them look prettier)
  2. Capture the source and destination address-books from each rule dictionary. In a try block, look up each address-book name as a key in the dictionary from step 1 and store the resulting IPs in a list. If an exception occurs (the name is not in the address-book), store the name itself in the list instead. Finally, convert each entry (the data type is a list of lists) from a list to a string and update the rule dictionary
Extract subnets for address-books, part 1
Extract subnets for address-books, part 2
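A rough sketch of both steps is shown below. The address-book entries are made-up values from the documentation IP ranges, the (name, ip, mask) layout is an assumption about the address-book file, and to_cidr and resolve are hypothetical helper names. Nested address-book groups are not handled here.

```python
import ipaddress


def to_cidr(ip, mask):
    """Convert an IP and dotted netmask to CIDR form, e.g. /32."""
    return ipaddress.ip_network(f"{ip}/{mask}", strict=False).with_prefixlen


# Hypothetical address-book entries: (name, ip, mask).
entries = [
    ("EMS_SYS_FTP", "192.0.2.10", "255.255.255.255"),
    ("EMS_SYS_FTP", "192.0.2.11", "255.255.255.255"),
    ("NOC_SYS", "198.51.100.0", "255.255.255.0"),
]

# Step 1: one big dictionary, address-book name -> list of subnets.
address_book = {}
for name, ip, mask in entries:
    address_book.setdefault(name, []).append(to_cidr(ip, mask))


def resolve(names, address_book):
    """Step 2: replace address-book names with their subnets."""
    subnets = []
    for name in names:
        try:
            subnets.extend(address_book[name])
        except KeyError:
            # Not an address-book name (a literal IP or "any"): keep it as is.
            subnets.append(name)
    return ", ".join(subnets)


rule = {
    "name": "ME71",
    "source-address": "EMS_SYS_FTP",
    "destination-address": "NOC_SYS, 203.0.113.5/32",
}
for side in ("source", "destination"):
    names = [n.strip() for n in rule[f"{side}-address"].split(",")]
    rule[f"{side}-subnets"] = resolve(names, address_book)

print(rule["source-subnets"])       # 192.0.2.10/32, 192.0.2.11/32
print(rule["destination-subnets"])  # 198.51.100.0/24, 203.0.113.5/32
```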

Phew! We are done with the difficult part. The final output is a list of dictionaries in the format given below.

pandas has a built-in method, DataFrame.to_excel, that lets us write structured data to an Excel sheet.
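As a hedged sketch, the list of dictionaries can be turned into a DataFrame and written out as shown below. The sample rules and the helper names rules_to_dataframe and save_rules are assumptions; writing .xlsx files also requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def rules_to_dataframe(rules):
    """Each dictionary becomes a row; dictionary keys become columns."""
    return pd.DataFrame(rules)


def save_rules(rules, excel_path):
    """Write the rules to an Excel file (needs an engine such as openpyxl)."""
    rules_to_dataframe(rules).to_excel(excel_path, index=False)


# Made-up rules in the shape produced by the parsing step.
rules = [
    {"name": "ME71", "source-zone": "Management", "service": "FTP", "action": "permit"},
    {"name": "ME72", "source-zone": "Billing", "service": "any", "action": "permit"},
]
df = rules_to_dataframe(rules)
print(df.columns.tolist())  # ['name', 'source-zone', 'service', 'action']
```

Calling save_rules(rules, excel_path) with a path of your choice then produces the spreadsheet; index=False keeps the DataFrame's row numbers out of the sheet.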

And there you go! The data is saved to excel_path in Excel format. Needless to say, I did some minor manual work to make it look more presentable.

I hope you learned something new. Cheers!
