Predictive Hacks

How to Start Running Apache Airflow in Docker


The simplest and fastest way to start Airflow is to run it with the CeleryExecutor in Docker. We assume that you have a basic understanding of Docker and that you have already installed the Docker Community Edition (CE) and Docker Compose on your computer.

Create an Airflow Folder

It is convenient to create an Airflow directory that will hold your folders, such as dags. So, open your terminal and run:

mkdir airflow-docker
cd airflow-docker

I created a folder called airflow-docker.

Download the docker-compose.yaml

To deploy Airflow on Docker Compose, you need to fetch the docker-compose.yaml file. Let's download it with curl:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml'

Some directories in the container are mounted, which means that their contents are synchronized between your computer and the container.

  • ./dags – you can put your DAG files here.
  • ./logs – contains logs from task execution and scheduler.
  • ./plugins – you can put your custom plugins here.
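Before starting the services on Linux, it is also worth creating these folders yourself and storing your host user id in a .env file, so that files written by the containers are owned by your user rather than by root. A minimal sketch (AIRFLOW_UID and AIRFLOW_GID are the variables the compose file reads):

```shell
# Create the folders that docker-compose.yaml mounts into the containers
mkdir -p ./dags ./logs ./plugins

# Record your host user id so files created inside the containers
# end up owned by you instead of root
printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > .env
```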

This file contains several service definitions:

  • airflow-scheduler – The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
  • airflow-webserver – The webserver, available at http://localhost:8080.
  • airflow-worker – The worker that executes the tasks given by the scheduler.
  • airflow-init – The initialization service.
  • flower – The Flower app for monitoring the environment, available at http://localhost:5555.
  • postgres – The database.
  • redis – The broker that forwards messages from the scheduler to the workers.

Initialize the Environment

To initialize the environment, run the following command:

docker-compose up airflow-init

After initialization is complete, you should see a message similar to the following:

airflow-init_1 exited with code 0

Run Airflow

You can now start Airflow by running:

docker-compose up

To make sure that the containers are running, open a new terminal and run:

docker ps

Accessing the Web Interface

Once the cluster has started up, you can log in to the web interface and try to run some tasks. The webserver is available at http://localhost:8080. The default account has the login airflow and the password airflow.
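You can also check from the command line that the webserver is up before logging in: Airflow exposes a /health endpoint that reports the status of the scheduler and the metadatabase. A small sketch (the echo fallback just keeps the command from failing while the containers are still starting):

```shell
# Query the webserver's health endpoint; when everything is up it prints
# a small JSON document with the metadatabase and scheduler status
curl -s http://localhost:8080/health || echo "webserver not reachable yet"
```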


How to interact with the Airflow Command Line

After running docker ps, we get the names and IDs of the running containers. If we want to interact with a particular container, we can run docker exec <container_name> <command>. Let's get the Airflow version:

docker exec airflow-docker_airflow-webserver_1 airflow version
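The same pattern works for any Airflow CLI command. A sketch with two more examples (the container name is the one docker-compose typically generates for a project folder called airflow-docker — confirm yours with docker ps; the docker info guard simply skips the calls when Docker is not running):

```shell
# Container name assumed from the airflow-docker project folder;
# check the output of `docker ps` for the actual name on your machine
CONTAINER=airflow-docker_airflow-webserver_1

if docker info >/dev/null 2>&1; then
  docker exec "$CONTAINER" airflow dags list  # DAGs known to Airflow
  docker exec "$CONTAINER" airflow info       # environment summary
else
  echo "Docker does not appear to be running"
fi
```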

Notice that your airflow-docker folder should now contain the docker-compose.yaml file along with the mounted dags, logs and plugins folders.

Cleaning Up

To stop and delete the containers, delete the volumes with database data, and remove the downloaded images, run:

docker-compose down --volumes --rmi all

Are you ready to run your first DAG?

If you feel ready to run your first DAG, you can have a look at our walk-through tutorial.
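As a quick preview, a DAG is just a Python file dropped into the mounted dags folder; the scheduler picks it up and it appears in the web UI. The sketch below writes a minimal one-task DAG (the file name, dag_id and task are all illustrative choices, not anything Airflow requires):

```shell
# Write a minimal example DAG into the mounted dags folder
mkdir -p ./dags
cat > ./dags/hello_dag.py <<'EOF'
# Minimal one-task DAG: prints a greeting when triggered manually
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # no schedule; trigger it from the web UI
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo Hello, Airflow!",
    )
EOF
```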

