Perform diagnostics on your Terraform Enterprise deployment
This topic provides instructions on how to perform diagnostic tasks to identify and resolve errors with your Terraform Enterprise deployment.
Run a health check
Terraform Enterprise provides a /_health_check endpoint on the instance. If Terraform Enterprise is up, the health check returns a 200 OK.
The /_health_check endpoint operates in two modes:
- Full check
- Minimal check
With a full check, the service attempts to verify the status of internal components and PostgreSQL. With a minimal check, the service returns 200 OK automatically after a previously successful check.
The endpoint's default behavior is to perform a full check during startup of the instance, and minimal checks after Terraform Enterprise is active and running.
To force a full check, include the additional query parameter ?full=1. This parameter causes every call to make requests to internal components and PostgreSQL, increasing system load and latency. Use it sparingly.
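For example, assuming the instance is reachable at tfe.example.com (a placeholder hostname), you can query the endpoint with curl; add -k if the instance uses a certificate your client does not trust. The second command forces a full check.
$ curl https://tfe.example.com/_health_check
$ curl "https://tfe.example.com/_health_check?full=1"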
Generate a support bundle
A support bundle is a collection of logs and other information about your installation that you can then send to HashiCorp Support for further troubleshooting.
You can generate a support bundle using the tfectl support bundle command. Refer to Support bundle in the CLI reference for additional information.
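If you run Terraform Enterprise in Docker, you can invoke the command inside the container from the host. The sketch below assumes a container named terraform-enterprise; substitute your own container name.
$ docker exec -it terraform-enterprise tfectl support bundle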
Support bundle contents
Support bundles contain the following information:
- Logs for all Terraform Enterprise services.
- License information for the installation.
- The Terraform Enterprise environment configuration with redacted secrets.
- Additional diagnostic information about the container, such as the contents of /etc/hosts, disk and memory usage, and network configuration.
Logs are available within the bundle under the host directory. All other information is available within the results.json file located at the root of the bundle.
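As an illustration, the commands below sketch how you might unpack and inspect a bundle. They assume the bundle was written as a gzipped tar archive named tfe-support-bundle.tar.gz; the actual filename and format produced by tfectl may differ, so adjust accordingly.
$ mkdir tfe-support-bundle && tar -xzf tfe-support-bundle.tar.gz -C tfe-support-bundle
$ ls tfe-support-bundle/host/
$ cat tfe-support-bundle/results.json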
Check service status
To check the status of the services, execute the following command within the Terraform Enterprise container.
$ supervisorctl status
Terraform Enterprise lists all services and their status in the console. Refer to the Terraform Enterprise services reference for information about each service.
$ supervisorctl status
logs                            RUNNING   pid 39, uptime 1:38:49
postgres                        RUNNING   pid 103, uptime 1:38:48
redis                           RUNNING   pid 77, uptime 1:38:49
tfe:archivist                   RUNNING   pid 199, uptime 1:38:46
tfe:atlas                       RUNNING   pid 200, uptime 1:38:46
tfe:atlas-ui                    RUNNING   pid 201, uptime 1:38:46
tfe:backup-restore              RUNNING   pid 203, uptime 1:38:46
tfe:licensing                   RUNNING   pid 205, uptime 1:38:46
tfe:metrics                     RUNNING   pid 211, uptime 1:38:46
tfe:nginx                       RUNNING   pid 215, uptime 1:38:46
tfe:outbound-http-proxy         RUNNING   pid 220, uptime 1:38:46
tfe:sidekiq                     RUNNING   pid 238, uptime 1:38:46
tfe:slug-ingress                RUNNING   pid 248, uptime 1:38:46
tfe:task-worker                 RUNNING   pid 257, uptime 1:38:46
tfe:terraform-registry-api      RUNNING   pid 265, uptime 1:38:46
tfe:terraform-registry-worker   RUNNING   pid 280, uptime 1:38:46
tfe:terraform-state-parser      RUNNING   pid 291, uptime 1:38:46
tfe:tfe-health-check            RUNNING   pid 298, uptime 1:38:46
tfe:vault                       RUNNING   pid 309, uptime 1:38:46
tfe-next                        RUNNING   pid 40, uptime 1:38:49
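You can also run the check from the host without opening a shell in the container. The example below assumes a Docker-based deployment with a container named terraform-enterprise.
$ docker exec terraform-enterprise supervisorctl status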
Inspect logs
To inspect the logs for a particular service, execute the following command within the Terraform Enterprise container, where SERVICE_NAME is the name of a Terraform Enterprise service.
$ cat /var/log/terraform-enterprise/SERVICE_NAME.log
For example, we can see why tfe:licensing exited.
$ cat /var/log/terraform-enterprise/licensing.log
{"@level":"info","@message":"initializing database","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.379084Z"}
{"@level":"error","@message":"error opening database connection","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.399064Z","error":"failed to connect to `host=/var/run/postgresql user=terraform-enterprise database=`: server error (FATAL: role \"terraform-enterprise\" does not exist (SQLSTATE 28000))"}
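To follow a log in real time instead of printing it once, you can use tail within the container. The service name below is only an example; substitute the log file for the service you are investigating.
$ tail -f /var/log/terraform-enterprise/atlas.log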
Run Kubernetes in debug mode
Terraform Enterprise dispatches plans and applies via jobs when running inside Kubernetes. These jobs are removed immediately after they finish, which can make it difficult to determine whether a job failed due to cluster-specific errors.
To make troubleshooting easier in these scenarios, you can keep the Kubernetes jobs alive for a limited period of time, after which the cluster garbage collects them. To enable this, provide the following environment variables to the deployment, either via the env.variables entry in the values.yaml override (see the sketch after the list below), or via the ConfigMap attached to the deployment that holds all of the environment variables.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_ENABLED. Boolean flag to enable debug mode; set to true.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_JOBS_TTL. (Optional) Time in seconds after which the jobs are deleted; defaults to 86400.
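For example, a minimal values.yaml override sketch that sets both variables might look like the following. The one-hour TTL is only an illustration, and depending on how your chart consumes env.variables you may need to supply the values as strings, as shown here.
env:
  variables:
    TFE_RUN_PIPELINE_KUBERNETES_DEBUG_ENABLED: "true"
    TFE_RUN_PIPELINE_KUBERNETES_DEBUG_JOBS_TTL: "3600"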