Perform diagnostics on your Terraform Enterprise deployment
This topic provides instructions on how to perform diagnostic tasks to identify and resolve errors with your Terraform Enterprise deployment.
Run a health check
Terraform Enterprise provides a /_health_check endpoint on the instance. If Terraform Enterprise is up, the health check returns a 200 OK.
The /_health_check endpoint operates in two modes:
- Full check
- Minimal check
With a full check, the service attempts to verify the status of internal components and PostgreSQL. With a minimal check, the service returns 200 OK automatically after a previously successful check.
The endpoint's default behavior is to perform a full check during startup of the instance, and minimal checks after Terraform Enterprise is active and running.
To force a full check, include the additional query parameter ?full=1. This parameter causes every call to make requests to internal components and PostgreSQL, increasing system load and latency. Use it sparingly.
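For example, assuming the instance is reachable at tfe.example.com (a placeholder hostname), you can query the endpoint with curl; add -k if the instance uses a certificate your client does not trust. The second command forces a full check.
$ curl https://tfe.example.com/_health_check
$ curl "https://tfe.example.com/_health_check?full=1"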
Generate a support bundle
A support bundle is a collection of logs and other information about your installation that you can then send to HashiCorp Support for further troubleshooting.
You can generate a support bundle using the tfectl support bundle command. Refer to Support bundle in the CLI reference for additional information.
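If you run Terraform Enterprise in Docker, you can invoke the command inside the container from the host. The sketch below assumes a container named terraform-enterprise; substitute your own container name.
$ docker exec -it terraform-enterprise tfectl support bundle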
Support bundle contents
Support bundles contain the following information:
- Logs for all Terraform Enterprise services.
- License information for the installation.
- The Terraform Enterprise environment configuration with redacted secrets.
- Additional diagnostic information about the container, such as the contents of /etc/hosts, disk and memory usage, and network configuration.
Logs are available within the bundle under the host directory. All other information is available within the results.json file located at the root of the bundle.
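As an illustration, the commands below sketch how you might unpack and inspect a bundle. They assume the bundle was written as a gzipped tar archive named tfe-support-bundle.tar.gz; the actual filename and format produced by tfectl may differ, so adjust accordingly.
$ mkdir tfe-support-bundle && tar -xzf tfe-support-bundle.tar.gz -C tfe-support-bundle
$ ls tfe-support-bundle/host/
$ cat tfe-support-bundle/results.json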
Check service status
To check the status of the services, execute the following command within the Terraform Enterprise container.
$ supervisorctl status
Terraform Enterprise lists all services and their status in the console. Refer to the Terraform Enterprise services reference for information about each service.
$ supervisorctl status
logs                            RUNNING   pid 39, uptime 1:38:49
postgres                        RUNNING   pid 103, uptime 1:38:48
redis                           RUNNING   pid 77, uptime 1:38:49
tfe:archivist                   RUNNING   pid 199, uptime 1:38:46
tfe:atlas                       RUNNING   pid 200, uptime 1:38:46
tfe:atlas-ui                    RUNNING   pid 201, uptime 1:38:46
tfe:backup-restore              RUNNING   pid 203, uptime 1:38:46
tfe:licensing                   RUNNING   pid 205, uptime 1:38:46
tfe:metrics                     RUNNING   pid 211, uptime 1:38:46
tfe:nginx                       RUNNING   pid 215, uptime 1:38:46
tfe:outbound-http-proxy         RUNNING   pid 220, uptime 1:38:46
tfe:sidekiq                     RUNNING   pid 238, uptime 1:38:46
tfe:slug-ingress                RUNNING   pid 248, uptime 1:38:46
tfe:task-worker                 RUNNING   pid 257, uptime 1:38:46
tfe:terraform-registry-api      RUNNING   pid 265, uptime 1:38:46
tfe:terraform-registry-worker   RUNNING   pid 280, uptime 1:38:46
tfe:terraform-state-parser      RUNNING   pid 291, uptime 1:38:46
tfe:tfe-health-check            RUNNING   pid 298, uptime 1:38:46
tfe:vault                       RUNNING   pid 309, uptime 1:38:46
tfe-next                        RUNNING   pid 40, uptime 1:38:49
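You can also run the check from the host without opening a shell in the container. The example below assumes a Docker-based deployment with a container named terraform-enterprise.
$ docker exec terraform-enterprise supervisorctl status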
Inspect logs
To inspect the logs for a particular service, execute the following command within the Terraform Enterprise container, where SERVICE_NAME is the name of a Terraform Enterprise service.
$ cat /var/log/terraform-enterprise/SERVICE_NAME.log
For example, we can see why tfe:licensing exited.
$ cat /var/log/terraform-enterprise/licensing.log
{"@level":"info","@message":"initializing database","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.379084Z"}
{"@level":"error","@message":"error opening database connection","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.399064Z","error":"failed to connect to `host=/var/run/postgresql user=terraform-enterprise database=`: server error (FATAL: role \"terraform-enterprise\" does not exist (SQLSTATE 28000))"}
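To follow a log in real time instead of printing it once, you can use tail within the container. The service name below is only an example; substitute the log file for the service you are investigating.
$ tail -f /var/log/terraform-enterprise/atlas.log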
Run Kubernetes in debug mode
Terraform Enterprise dispatches plans and applies via jobs when running inside Kubernetes. These jobs are removed immediately after they finish, which can make it difficult to determine whether a job failed due to cluster-specific errors.
To make troubleshooting easier in these scenarios, you can keep the Kubernetes jobs alive for a limited period of time, after which the cluster garbage collects them. To enable this, provide the following environment variables to the deployment, either via the env.variables entry in the values.yaml override (see the sketch after the list below), or via the ConfigMap attached to the deployment that holds all of the environment variables.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_ENABLED. Boolean flag to enable debug mode; set to true.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_JOBS_TTL. (Optional) Time in seconds after which the jobs are deleted; defaults to 86400.
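For example, a minimal values.yaml override sketch that sets both variables might look like the following. The one-hour TTL is only an illustration, and depending on how your chart consumes env.variables you may need to supply the values as strings, as shown here.
env:
  variables:
    TFE_RUN_PIPELINE_KUBERNETES_DEBUG_ENABLED: "true"
    TFE_RUN_PIPELINE_KUBERNETES_DEBUG_JOBS_TTL: "3600"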