Terraform States
Before we move to the 2nd part of Terraform CLI, I think it is important to discuss Terraform state. This is one of the fundamental concepts to understand when learning to work with Terraform. We will make use of this commit for the example we have been using in this Terraform series. Feel free to clone the repo and follow along.
What are Terraform states?
As we know now, Terraform helps us manage cloud resources in the form of IaC and thus it benefits from several best practices used in software development. One of the benefits is version control - usually managed by Git which is also capable of remotely maintaining the repositories.
However, version control is not enough when it comes to infrastructure code development and management. In order to manage the infrastructure, Terraform needs to know which real-world infrastructure objects already exist and which resource in the configuration each of them belongs to. It needs to maintain a mapping of these real-world objects so that future operations can be carried out successfully.
Terraform maintains a database of mappings in the form of state files, either locally or remotely. This post focuses mostly on states - remote management of states will be covered in detail in a different post.
Why does Terraform need states?
The lifecycle of cloud resources managed by Terraform revolves around the creation, modification, and destruction of these resources. When we run terraform apply in a freshly configured/cloned IaC - real-world resources are created (if everything goes well). Now, if we want to modify some resources in the same configuration, we want to be sure appropriate resources are being modified/destroyed based on the changes in the configuration. Thus, the only way for Terraform to be correct in this regard is by referring to the configuration-to-real world bindings - which are stored in Terraform state.
States in action
Let us see the state files in action. Clone the example repository on your local disk and note the files. We can see we have 3 configuration files (main.tf, variables.tf, and provider.tf), 1 README.md file, 1 .gitignore file and 1 .terraform.lock.hcl file. Currently, we don't see any state file. Since this is a freshly cloned repository, initialize Terraform into its root directory.
cd tf-blog
terraform init
Note the changes to the root directory again. Initialization has added a .terraform directory and it contains a file that is not human-readable. However, by the name of the file you can understand it is the plugin for aws, which was downloaded by Terraform during initialization. Still, nothing about states.
Note: It is assumed that the AWS IAM credentials are already configured using AWS CLI on the local system. Terraform uses the same credentials to work with AWS APIs.
Let us plan our resources. Run terraform plan and keep an eye on the files in the directory. While Terraform is planning the resources, a couple of files appear, and then they disappear when the terraform plan output is printed to the console.
As we can see in the screenshot above, those files are terraform.tfstate and .terraform.tfstate.lock.info, and they are no longer present once the plan has finished.
Terraform plan output:
. . .
. . .
Plan: 3 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ instance_id = [
+ (known after apply),
+ (known after apply),
+ (known after apply),
]
------------------------------------------------------------------------
Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
What just happened?
Terraform analyses all the configurations included in *.tf files and evaluates the same against the state file, which does not exist. Since there is no state associated, it generates a plan to create everything that is included in the configuration.
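The comparison described above can be pictured with a small Python sketch. This is only a toy model and not how Terraform is implemented: the function name plan, the dictionaries standing in for the configuration and the state, and the resource address aws_instance.demo are all assumptions made for illustration.

```python
# Toy model of planning: compare the desired configuration with the recorded state.
# "config" and "state" are hypothetical dicts keyed by resource address.

def plan(config, state):
    """Return the addresses to add, change, and destroy."""
    to_add = sorted(addr for addr in config if addr not in state)
    to_change = sorted(
        addr for addr in config if addr in state and config[addr] != state[addr]
    )
    to_destroy = sorted(addr for addr in state if addr not in config)
    return {"add": to_add, "change": to_change, "destroy": to_destroy}


# Three instances in the configuration and no state yet (addresses are made up).
config = {f"aws_instance.demo[{i}]": {"instance_type": "t2.micro"} for i in range(3)}
empty_state = {}

result = plan(config, empty_state)
print(
    f"Plan: {len(result['add'])} to add, "
    f"{len(result['change'])} to change, "
    f"{len(result['destroy'])} to destroy."
)
```

With an empty state every configured address lands in the "add" list, which is the same shape as the plan summary printed earlier.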
The terraform.tfstate file is supposed to contain the state information, but since nothing has been applied, i.e. no real-world infrastructure is created, the state file is not saved during this planning. The .terraform.tfstate.lock.info file has to do with the locking mechanism, which we will go through in a while.
Optionally, you can save the output of the terraform plan command by specifying the -out parameter, and then pass the saved plan file to terraform apply.
Let us apply the configuration. Run terraform apply in the root directory.
Note that the same 2 files which we observed in the previous step are created and this time they did not disappear. This is because terraform apply has actually created resources and now the bindings are saved in terraform.tfstate.
It is very tempting to take a look at the terraform.tfstate file right now. But I would suggest parking the temptation for the moment and, instead of looking at the state file, destroying the resources. Run terraform destroy.
Terraform state file format
{
"version": 4,
"terraform_version": "0.14.3",
"serial": 8,
"lineage": "bfc6125c-e1b2-ab53-8935-380a317f441c",
"outputs": {},
"resources": []
}
Take a look at the terraform.tfstate file and it should look something like the above. This is the general Terraform state format, which is a JSON object with a fixed set of properties: the version of the state format, Terraform's version, serial, lineage, outputs, and resources.
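Lineage is a unique ID that Terraform assigns when a state is first created, and serial is a counter that goes up every time the state changes. Together they let Terraform detect when two state files are unrelated or when one is older than the other, so that one state is not overwritten by the wrong one.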
The most important sections for us right now are the outputs and resources. The reason why I call these sections and not just attributes is that they represent the major chunk of Terraform state and contain all the properties of the resources created.
Outputs, as the property name suggests, would contain all the output values requested in the configuration files (.tf). Resources are an array of resources created along with their properties.
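Since the state file is plain JSON, its top-level properties can be inspected with a few lines of Python. The sketch below is only an illustration: the function name summarize_state is made up, and the sample text simply repeats the empty state shown above.

```python
import json

# Sample state with the same top-level keys as the file shown above.
SAMPLE_STATE = """
{
  "version": 4,
  "terraform_version": "0.14.3",
  "serial": 8,
  "lineage": "bfc6125c-e1b2-ab53-8935-380a317f441c",
  "outputs": {},
  "resources": []
}
"""


def summarize_state(text):
    """Parse state JSON and return a short summary of its top-level properties."""
    state = json.loads(text)
    return {
        "format_version": state["version"],
        "terraform_version": state["terraform_version"],
        "serial": state["serial"],
        "lineage": state["lineage"],
        "output_names": sorted(state.get("outputs", {})),
        "resource_count": len(state.get("resources", [])),
    }


summary = summarize_state(SAMPLE_STATE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

To try it on a real file, read the contents of terraform.tfstate and pass the text to summarize_state. This only reads the file, which is safe; editing it by hand is a different matter.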
Note: Terraform also creates a backup of the state file (terraform.tfstate.backup) after destroy. As the name suggests, it should hold the last version of the state file (when the EC2 instances were created). In our case, it should have the contents of the original state file which we did not take a look at before destruction.
Looking at the empty TF state file helps understand the contents more easily as a lot of information is skipped. Let us apply the changes and then observe the terraform.tfstate file and the backup file. Run terraform apply.
The terraform.tfstate file now contains the details of all the outputs we specified in the configuration files, and the resources section contains an array of instances we created using Terraform configuration. The backup file now stores the previous empty version of the state file. Take your time to go through the contents of both files.
Significance of Terraform states
This way, state files help Terraform understand the changes in the configuration and determine which changes it needs to perform in the next terraform apply operation. States also help Terraform understand dependencies.
In the case of complex infrastructure deployment, running terraform plan could turn out to be very slow and might also breach API usage limits if Terraform had to discover all currently deployed resources directly from the provider. Maintaining states helps address this performance issue.
State lock
State locks are not of much significance while working with Terraform locally, since usually only one person is responsible for writing the configuration and executing it. However, things can get out of hand if multiple users try to run their changes simultaneously.
For this reason, Terraform makes use of a locking feature that disables modification of state files when terraform plan or apply is in progress. It is for this purpose we can observe the temporary existence of the .terraform.tfstate.lock.info file.
It becomes very important when a team collaborates on the same configuration using remote backends.
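The general idea of a lock around the state can be sketched in Python as below. This is not Terraform's actual locking code, only an illustration of mutual exclusion using a lock file: the function names and the "who" field written into the file are assumptions, and only the file name is taken from what we observed earlier.

```python
import json
import os
import tempfile

LOCK_NAME = ".terraform.tfstate.lock.info"  # file name observed earlier


def acquire_lock(directory, who):
    """Create the lock file atomically; return its path, or None if it is held."""
    path = os.path.join(directory, LOCK_NAME)
    try:
        # O_EXCL makes the creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        json.dump({"who": who}, handle)  # hypothetical lock metadata
    return path


def release_lock(path):
    """Remove the lock file so that someone else can take the lock."""
    os.remove(path)


with tempfile.TemporaryDirectory() as workdir:
    first = acquire_lock(workdir, "alice")
    second = acquire_lock(workdir, "bob")  # refused while alice holds the lock
    release_lock(first)
    third = acquire_lock(workdir, "bob")  # succeeds after the release
    print(first is not None, second is None, third is not None)
    release_lock(third)
```

The second attempt is refused because the lock file already exists, which is the same effect we want when two people run apply against the same state at the same time.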
Note: Although Terraform states are stored in JSON format, which is very human-readable - it is still advised that manual changes should be avoided.
Importing existing resources
By default, Terraform manages cloud resources that are created by itself. If Terraform is newly introduced in the pre-existing cloud environment, sometimes it makes sense to include existing resources under Terraform management.
In such cases, it is possible to import existing resources into Terraform state by using the terraform import command. Please note, state files should not be modified manually to include existing resources, as there is a lot of scope for human error and, if the state gets corrupted, the path to recovery is long.
At this point, terraform is not very mature when it comes to importing resources. Currently, it can import the resource and update the state file, but cannot write the corresponding configuration by itself. However, it is expected that this feature would be available in future versions.
To successfully import the resource, we first need to write the target configuration into our existing configuration files and then run terraform import command. Let us see the same in action.
Currently, our example contains 3 AWS EC2 instances. Go ahead and create a 4th EC2 instance manually by logging into AWS management console. We will pretend that this VM exists from the past and Terraform doesn't know about it and thus it will not be managed by Terraform.
Since we need to first write the configuration of this VM in the Terraform file, I have added the below code to the main.tf file.
resource "aws_instance" "demo_new" {
  provider      = aws.aws_west
  ami           = "ami-03130878b60947df3"
  instance_type = "t2.micro"
}
We are ready with the target configuration and the VM is running successfully. Now to get this VM under Terraform management, run the below command.
terraform import aws_instance.demo_new <instance ID>
The terraform import command takes 2 arguments - the target configuration identifier (aws_instance.demo_new) and the instance ID (to be copied from the AWS Management console). Terraform uses this information to map the configuration to the real-world virtual machine.
If successful, it should give the below output:
aws_instance.demo_new: Importing from ID "i-05ad4fa08384df633"...
aws_instance.demo_new: Import prepared!
Prepared aws_instance for import
aws_instance.demo_new: Refreshing state... [id=i-05ad4fa08384df633]
Import successful!
The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.
If you got this message, you have successfully brought this AWS EC2 instance under Terraform management. You can now destroy this VM by executing terraform destroy in the same terminal.
Sensitive Data
If we observe the state files, there is a lot of data about our EC2 instances printed in JSON format. This also includes sensitive data like private keys, IP addresses, etc. Similarly, various types of resources have sensitive data associated with them.
Sharing state files becomes a security threat in this case. It is always advised not to include state files in version control repositories, as doing so may expose sensitive data to whoever has access to the repository. If you look at the .gitignore file, we have already included the state file so that state files are not synced with remote repositories publicly.
Another approach to handle sensitive data is to use remote backends. Remote backends provide a secure way to handle sensitive data. They offer encryption at rest and the transit of the same happens over TLS. We will cover backends in a later post.
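To get a feel for how much sensitive data a state file may hold, one could walk its JSON and list attribute names that look sensitive. The sketch below is an illustration only: the list of key fragments, the function name, and the sample fragment (including the example_db resource type) are made up and not taken from a real state file.

```python
import json

# Key fragments treated as sensitive here are an assumption for illustration.
SENSITIVE_HINTS = ("password", "private_key", "secret", "token")


def find_sensitive_paths(node, path=""):
    """Walk parsed JSON and return the paths whose key name looks sensitive."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if any(hint in key.lower() for hint in SENSITIVE_HINTS):
                found.append(child)
            found.extend(find_sensitive_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_sensitive_paths(value, f"{path}[{index}]"))
    return found


# Made-up fragment shaped like a state file; not taken from a real one.
sample = json.loads("""
{
  "resources": [
    {"type": "example_db", "instances": [
      {"attributes": {"name": "demo", "password": "hunter2"}}
    ]}
  ]
}
""")

print(find_sensitive_paths(sample))
```

A name-based check like this will miss plenty of things, so it is no substitute for keeping state files out of version control.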
Workspaces
Workspaces are to state what branches are to code. As mentioned earlier in this post, version control systems that manage the code are not enough for working with Terraform IaC. We have learned quite a bit about state in the previous sections. One question we may have is - if we make some code (config) changes, is there a way to test them without disturbing the existing infrastructure? The answer is workspaces.
By default, Terraform works with the default workspace which is analogous to the main (master) branch of Git. Any changes to the configuration will reflect in the current set of real-world infrastructure. Meaning, even if the code change happens on a different branch of VCS, the same state would be affected. This can cause a lot of confusion and errors.
It makes sense to have a different “workspace” to reflect changes made on a different branch of IaC. Using workspaces, Terraform allows us to have a completely “new environment” with a new set of resources. Terraform uses the same configuration included in the .tf files but uses a different workspace. This is great for testing the modifications made to configurations.
Let us use the same example to do some hands-on with workspaces.
In the root directory, identify the current workspace you are in by running the below command.
terraform workspace show
Output:
default
Let us create a new workspace:
terraform workspace new myWs
To switch from one workspace to another use the below command:
terraform workspace select <name of the workspace>
Currently, we have 2 workspaces (default and myWs).
Inspect the state in both workspaces by running the below command:
terraform show
Assuming both the workspaces are empty, let us select one of the workspaces to run terraform apply. We go with the default workspace and run terraform apply.
Once terraform apply is successful, run terraform show again to check the current state in the default workspace. You should see 4 resources being created.
Select the other workspace, myWs, and show its state.
terraform workspace select myWs
terraform show
It should show no resources. As we can see, the default workspace has 4 resources created from the same configuration, but myWs workspace has nothing. If you run terraform apply now, it would create 4 new EC2 instances associated with myWs workspace.
Verify in AWS management console that in all there are 8 EC2 instances created.
If we take a look at our directory structure, Terraform has created a new directory terraform.tfstate.d, and within the same, it has created a myWs directory to store states associated with this workspace. Similarly, if you create more workspaces, states of all those workspaces will be stored in appropriate sub-directories.
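The directory layout described above can be summed up in a small Python helper. This is a sketch that assumes local state with the default file names; the function name local_state_path is made up for illustration.

```python
from pathlib import PurePosixPath


def local_state_path(workspace):
    """Return the relative path of the local state file for a workspace."""
    if workspace == "default":
        # The default workspace keeps its state in the root directory.
        return PurePosixPath("terraform.tfstate")
    # Every other workspace gets its own sub-directory.
    return PurePosixPath("terraform.tfstate.d") / workspace / "terraform.tfstate"


for name in ("default", "myWs"):
    print(f"{name}: {local_state_path(name)}")
```

Each workspace reads and writes its own state file, which is why applying in myWs does not touch the resources tracked by the default workspace.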
Thus workspaces provide a great way to test configuration changes, especially in complex projects. Before we wrap this up, do not forget to destroy resources in all the workspaces by running terraform destroy.
Note: Workspaces are a great way to isolate infrastructure tests, but it is not a good idea to treat workspaces as “different environments”. To understand this we need to have an understanding of modules and remote backends, which we would cover in upcoming posts.