Terraform Data Sources
Data sources are one of the important concepts in Terraform. They let you work with values that originate somewhere else: other modules, cloud providers, or even the local machine.
There are situations where the Terraform code needs to query a “fresh” set of values to be used while applying the configuration. These queries are based on certain criteria, which means any value that fulfills the criteria is good enough for the infrastructure configuration to be created.
Data sources are similar to resources but not the same. Like resources, they are made available by providers and documented in the Terraform Registry. Data sources are essentially a special, read-only kind of resource: they do not create, update, or delete real-world infrastructure objects, but they help build them by supplying values to the resource configuration.
Below is an example of how an AWS data block is written in the configuration. Terraform provides a dedicated data block for this purpose. The first label is the data source type defined by the provider. As with resources, the type name starts with the provider name, followed by an underscore and the name of the object being queried (here, aws and ami). The second label is the local name of the data source. The local name is valid only within the configuration or module where the data source is declared.
data "aws_ami" "myAmi" {
  owners      = ["099720109477"]
  most_recent = true

  filter {
    name   = "description"
    values = ["Canonical, Ubuntu, 20.04 LTS*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
The local name should be chosen so that the combination <data_source_type>.<local_name> is unique within the current module. Prefixed with data., this combination forms the address of the data source (for example, data.aws_ami.myAmi), which other resources in the configuration use to refer to it.
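To illustrate how the type and the local name combine into an address, here is a minimal sketch. It assumes the aws provider is already configured. The two data sources and the output names are chosen only for illustration and are not part of our ongoing example.

```hcl
# Both blocks use the local name "current". This is allowed because the
# data source types differ, so each type/name combination stays unique.
data "aws_caller_identity" "current" {}

data "aws_partition" "current" {}

# Other blocks refer to a data source as data.<type>.<local name>.<attribute>.
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "partition" {
  value = data.aws_partition.current.partition
}
```

Declaring a second data "aws_caller_identity" "current" block in the same module would be rejected, because that combination is already taken.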
Every data source has its own set of arguments, some of which are required. Just like resources, providers offer a large number of data sources. It is impractical to remember them all, so it is highly recommended to consult the documentation on the Terraform Registry. For instance, this is the documentation for the azurerm_managed_disk data source.
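As a rough sketch of what such a block might look like, the azurerm_managed_disk data source takes the name of an existing disk and the name of its resource group. The values below are placeholders, and the azurerm provider is assumed to be configured elsewhere.

```hcl
# Look up an existing managed disk. Both values are hypothetical placeholders.
data "azurerm_managed_disk" "existing" {
  name                = "example-datadisk"
  resource_group_name = "example-resources"
}

# Expose the ID of the disk that was found.
output "managed_disk_id" {
  value = data.azurerm_managed_disk.existing.id
}
```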
When a Terraform configuration that uses data sources is planned by running terraform plan, Terraform queries the appropriate cloud providers through API calls to fetch the values. Any local-only data sources are evaluated as well. All of this happens early in the planning phase, before Terraform computes the changes to deploy.
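A local-only data source computes its result without calling a remote API. One example is aws_iam_policy_document, which only renders a JSON policy document from its arguments. The sketch below is illustrative; the bucket name is a placeholder.

```hcl
# Renders an IAM policy as JSON. No AWS object is created or queried.
data "aws_iam_policy_document" "read_only" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-bucket/*"] # hypothetical bucket
  }
}

# The rendered document is available through the json attribute.
output "policy_json" {
  value = data.aws_iam_policy_document.read_only.json
}
```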
Data sources may also depend on values that are only known after the configuration is applied, which forms an implicit dependency. Data sources also support the depends_on meta-argument, which forms an explicit dependency. In such cases, the data source is read during the apply phase, once the values or objects it depends on are available.
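The two kinds of dependency can be sketched as follows. The VPC, the subnet, the CIDR ranges, and the Project tag are assumptions made for this illustration and are not part of our ongoing example.

```hcl
resource "aws_vpc" "demo" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "demo" {
  vpc_id     = aws_vpc.demo.id
  cidr_block = "10.0.1.0/24"

  tags = {
    Project = "demo"
  }
}

# Implicit dependency: vpc_id references aws_vpc.demo.id, a value that is
# unknown until the VPC exists, so the read waits for the VPC.
data "aws_route_tables" "demo" {
  vpc_id = aws_vpc.demo.id
}

# Explicit dependency: nothing in this block references aws_subnet.demo,
# so depends_on tells Terraform to read it only after the subnet is handled.
data "aws_subnets" "tagged" {
  tags = {
    Project = "demo"
  }

  depends_on = [aws_subnet.demo]
}
```

On the first run, the plan would typically show both data sources as to be read during apply, because the objects they depend on do not exist yet.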
Terraform comes with its own set of data sources and resources. To expose them, Terraform itself acts as a provider. Unlike other providers, the Terraform provider is bundled with the Terraform installation. Some of the data sources supported by Terraform can be found here.
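One data source offered by this built-in provider is terraform_remote_state, which reads the root module outputs stored in another configuration's state. The following is a minimal sketch. The state file path and the vpc_id output are hypothetical and assume a separate configuration that uses the local backend.

```hcl
# Read the outputs of another configuration from its state file.
data "terraform_remote_state" "network" {
  backend = "local"

  config = {
    path = "../network/terraform.tfstate" # hypothetical path
  }
}

# Root module outputs of that configuration are exposed under "outputs".
output "network_vpc_id" {
  value = data.terraform_remote_state.network.outputs.vpc_id # hypothetical output
}
```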
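provider, count, for_each, and depends_on are the meta-arguments supported in data sources. The lifecycle customization settings available to resources are not supported for data sources, although Terraform 1.2 and later accept precondition and postcondition checks inside a lifecycle block.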
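Here is a hedged sketch of two of these meta-arguments in use. The provider alias, the region, the local names, and the name pattern are assumptions made for illustration only.

```hcl
# An additional provider configuration for a second region.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

# provider: run this lookup against the aliased provider configuration.
data "aws_ami" "ubuntu_west" {
  provider    = aws.west
  owners      = ["099720109477"]
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
}

# for_each: one lookup per architecture, keyed by the architecture name.
data "aws_ami" "ubuntu_by_arch" {
  for_each    = toset(["x86_64", "arm64"])
  owners      = ["099720109477"]
  most_recent = true

  filter {
    name   = "description"
    values = ["Canonical, Ubuntu, 20.04 LTS*"]
  }

  filter {
    name   = "architecture"
    values = [each.key]
  }
}
```

An individual instance is then addressed by its key, for example data.aws_ami.ubuntu_by_arch["arm64"].id.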
Let us try to make use of data sources in our ongoing example. We will build on this commit.
Currently, in our example, we are creating two EC2 instances. Each instance is created from the image (AMI) whose ID is provided in the variable ami.
In a way, this is hardcoded, and I would like to make it more dynamic by using a data source. Let us introduce the data source block below in the main.tf file.
data "aws_ami" "myAmi" {
  owners      = ["099720109477"]
  most_recent = true

  filter {
    name   = "description"
    values = ["Canonical, Ubuntu, 20.04 LTS*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "image-type"
    values = ["machine"]
  }

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201026"]
  }
}
Here we are using the aws_ami data source provided by the aws provider. I have given it the local name myAmi. The first argument indicates the owner of the AMI. In the AWS Management Console, navigate to the EC2 service, where you can find a list of AMIs to choose from. Several parameters are available to set your criteria.
In my case, I set the below criteria using the filters:
- All images are owned by “099720109477”. This value could also be an owner alias such as amazon or aws-marketplace to keep it simple, but specifying the account number narrows the results further.
- The description filter tells Terraform to include images whose description begins with the “Canonical, Ubuntu, 20.04 LTS” string.
- The architecture filter specifies the processor architecture of the images, in this case x86_64.
- The image-type filter specifies the type of images to be “machine”.
- The name filter specifies the name given to the image by its owner when the image was built.
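Note that the name filter above pins one specific build, so the lookup keeps resolving to the same image. If the goal is to pick up newer builds automatically, a variant along the following lines might be used. This is a sketch and not part of the example repository; the local name ubuntu_latest and the wildcard pattern are assumptions.

```hcl
data "aws_ami" "ubuntu_latest" {
  owners      = ["099720109477"]
  most_recent = true # when several images match, keep only the newest

  filter {
    name   = "name"
    # The trailing wildcard matches any build date instead of one fixed build.
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
```

The trade-off is that the selected AMI can change between runs, which in turn can cause Terraform to propose replacing instances that were built from an older image.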
By specifying this data source, we ask Terraform to look up the AMI that satisfies the given criteria and make its attributes available to the configuration. If more than one image matches, most_recent = true selects the newest one. The lookup happens while the terraform plan command runs. Before we run the plan, let us also change the ami argument in our resource blocks to make use of this data source, as below.
ami = data.aws_ami.myAmi.id
The aws_ami data source returns the Amazon Machine Image that satisfies the given criteria, and its ID can be accessed through the .id attribute. This is as specified in the Terraform Registry.
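For context, here is roughly how that line sits inside a resource block, along with a few other attributes the data source exposes. The sketch relies on the data "aws_ami" "myAmi" block from main.tf. The resource name example and the instance type are assumptions, since the full resource blocks are not reproduced in this post.

```hcl
resource "aws_instance" "example" {
  ami           = data.aws_ami.myAmi.id # resolved from the data source
  instance_type = "t2.micro"            # assumed value for illustration
}

# Other attributes of the selected image can be inspected as outputs.
output "selected_ami_name" {
  value = data.aws_ami.myAmi.name
}

output "selected_ami_created" {
  value = data.aws_ami.myAmi.creation_date
}
```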
Let us run the terraform plan command and observe the output.
. . .
Terraform will perform the following actions:

  # aws_instance.demo_vm_1 will be created
  + resource "aws_instance" "demo_vm_1" {
      + ami = "ami-00831fc7c1e3ddc60"
      + arn = (known after apply)
      + associate_public_ip_address = (known after apply)
. . .
. . .
  # aws_instance.demo_vm_2 will be created
  + resource "aws_instance" "demo_vm_2" {
      + ami = "ami-00831fc7c1e3ddc60"
      + arn = (known after apply)
      + associate_public_ip_address = (known after apply)
. . .
The plan output says it is going to use ami-00831fc7c1e3ddc60 for both VMs. Thus we have successfully used a data source to specify our AMI value dynamically instead of hardcoding it.
As you go deeper, there are quite a lot of use cases where data sources come in handy.
I hope this article helped you get started with Terraform data sources. In the next post, we will talk about remote backends.