Completing the LGTM Stack with Grafana Tempo

2026-04-16

Why Tracing

After getting Grafana, Mimir, Loki, and Alloy running on my servers, I had metrics and logs covered — but I was still missing the third pillar of observability: traces. When a slow request came in, I could see the spike in Mimir and the log lines in Loki, but I couldn't easily tell which part of the request was slow.

Adding Grafana Tempo was the natural next step. It completes the full LGTM stack (Loki, Grafana, Tempo, Mimir) and — combined with OpenTelemetry — gives me end-to-end traces correlated directly with logs.


Installing Tempo on the Monitoring Server

I followed the same pattern as the rest of the stack: a Docker container on the monitoring server with a named volume.

1. Create the directories and set permissions:

sudo mkdir -p /srv/tempo/{config,data}
sudo chown -R 10001:10001 /srv/tempo
docker volume create tempo-data

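I use 10001 as the UID for the same reason as with Mimir and Loki: it's the default user inside these Grafana Labs containers. The trace data itself is stored in the tempo-data named volume (mounted at /var/tempo in step 3) rather than under /srv/tempo/data.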

2. Create the Tempo config at /srv/tempo/config/tempo.yaml:

stream_over_http_enabled: true

server:
  http_listen_port: 3200
  grpc_listen_port: 9095
  log_level: warn

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

ingester:
  max_block_duration: 5m

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/blocks
    wal:
      path: /var/tempo/wal

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: https://mimir.example.com/api/v1/push
        send_exemplars: true
        basic_auth:
          username: username
          password: password

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
    ingestion:
      max_traces_per_user: 0
    compaction:
      block_retention: 168h   # 7 days

The metrics_generator section is what makes Tempo really powerful — it automatically generates RED metrics (request rate, error rate, duration) from your traces and pushes them into Mimir, so you get Prometheus-style metrics without any extra instrumentation.
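To make the idea concrete, here is a rough Python sketch of the kind of aggregation a span-metrics processor performs. It is not Tempo's implementation: the Span record, the bucket boundaries, and the red_metrics helper are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical, simplified span record; real spans carry many more fields.
@dataclass
class Span:
    service: str
    kind: str        # e.g. "SPAN_KIND_SERVER"
    status: str      # e.g. "STATUS_CODE_OK" or "STATUS_CODE_ERROR"
    duration_s: float


# Illustrative bucket upper bounds in seconds (not Tempo's defaults).
BUCKETS = [0.1, 0.5, 1.0, 2.0, float("inf")]


def red_metrics(spans):
    """Aggregate spans into call counts, error counts, and latency histograms."""
    calls = Counter()
    errors = Counter()
    latency = {}
    for span in spans:
        key = (span.service, span.kind)
        calls[key] += 1
        if span.status == "STATUS_CODE_ERROR":
            errors[key] += 1
        hist = latency.setdefault(key, [0] * len(BUCKETS))
        # Prometheus-style cumulative buckets: a span is counted in every
        # bucket whose upper bound is at least its duration.
        for i, upper in enumerate(BUCKETS):
            if span.duration_s <= upper:
                hist[i] += 1
    return calls, errors, latency


spans = [
    Span("shop", "SPAN_KIND_SERVER", "STATUS_CODE_OK", 0.08),
    Span("shop", "SPAN_KIND_SERVER", "STATUS_CODE_ERROR", 1.40),
    Span("shop", "SPAN_KIND_SERVER", "STATUS_CODE_OK", 0.30),
]
calls, errors, latency = red_metrics(spans)
print(calls, errors, latency)
```

The three outputs correspond to the R, E, and D of RED: a call counter, an error counter, and a duration histogram, each labelled by service and span kind.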

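3. Start the container. The ports are published on 127.0.0.1 only, so from outside the server Tempo is reachable solely through the Nginx proxy set up in the next section:

docker run -d \
  --name tempo \
  --restart unless-stopped \
  -p 127.0.0.1:3200:3200 \
  -p 127.0.0.1:4317:4317 \
  -p 127.0.0.1:4318:4318 \
  -v /srv/tempo/config/tempo.yaml:/etc/tempo/tempo.yaml:ro \
  -v tempo-data:/var/tempo \
  grafana/tempo:2.10.4 \
  -config.file=/etc/tempo/tempo.yaml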

I'm pinning to a specific version here because at the time of writing, Grafana was actively changing the config structure and the latest image had breaking changes. Worth checking the release notes before upgrading.

Verify it's running with: docker ps | grep tempo
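docker ps only confirms that the container exists. Tempo also serves a /ready endpoint on its HTTP port, so a slightly stronger check is to poll that. The sketch below assumes port 3200 is reachable on 127.0.0.1 of the monitoring server; the tempo_ready helper and its opener parameter are names invented for this example.

```python
import urllib.error
import urllib.request


def tempo_ready(base_url="http://127.0.0.1:3200", opener=urllib.request.urlopen):
    """Return True if GET <base_url>/ready answers with HTTP 200."""
    try:
        with opener(f"{base_url}/ready", timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer such as 503.
        return False
```

Calling tempo_ready() on the monitoring server returns True or False. Tempo may report not ready for a short while after startup, so a False right after docker run is not necessarily a problem.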


Exposing Tempo via Nginx

Tempo needs two different entry points: an HTTPS endpoint for Grafana to query traces, and a gRPC port for Alloy to ship traces from the app server.

1. Set up basic authentication:

sudo htpasswd -c /etc/nginx/.htpasswd-tempo username
sudo chmod 640 /etc/nginx/.htpasswd-tempo
sudo chown root:www-data /etc/nginx/.htpasswd-tempo

2. Generate the SSL certificate:

sudo certbot certonly --webroot -w /var/www/letsencrypt -d tempo.example.com

3. Create the Nginx config at /etc/nginx/sites-available/tempo.conf:

# HTTP → HTTPS redirect
server {
  listen 80;
  server_name tempo.example.com;
  return 301 https://$host$request_uri;
}

# HTTPS: Grafana query traffic (HTTP/1.1 REST)
server {
  listen 443 ssl http2;
  server_name tempo.example.com;

  ssl_certificate     /etc/letsencrypt/live/tempo.example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/tempo.example.com/privkey.pem;
  ssl_session_timeout 1d;
  ssl_session_cache shared:SSL:10m;
  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_prefer_server_ciphers on;

  client_max_body_size 20m;

  auth_basic "Tempo";
  auth_basic_user_file /etc/nginx/.htpasswd-tempo;

  location / {
    proxy_pass http://127.0.0.1:3200;
    proxy_http_version 1.1;

    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_read_timeout 300;
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
  }
}

# gRPC port 4327: Alloy trace ingestion
server {
  listen 4327 ssl http2;
  server_name tempo.example.com;

  ssl_certificate     /etc/letsencrypt/live/tempo.example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/tempo.example.com/privkey.pem;
  ssl_session_timeout 1d;
  ssl_session_cache shared:SSL:10m;
  ssl_protocols TLSv1.2 TLSv1.3;

  # IP allowlist — only your app servers should push traces
  allow YOUR_IP_ADDRESS;
  deny all;

  location / {
    grpc_pass grpc://127.0.0.1:4317;
    grpc_set_header Host $host;
    grpc_read_timeout 300s;
    grpc_send_timeout 300s;
  }
}

Port 4327 is just an arbitrary port I chose for the gRPC ingestion endpoint. Since I only have a small number of servers that push traces, I protect it with an IP allowlist rather than basic auth — gRPC and basic auth have some rough edges that are easier to avoid.

4. Enable the config:

sudo ln -s /etc/nginx/sites-available/tempo.conf /etc/nginx/sites-enabled/tempo.conf
sudo nginx -t && sudo systemctl reload nginx

Configuring Alloy on the App Server

Alloy already runs on my app server from the previous setup. I just needed to add the tracing pipeline to its config.

1. Open /etc/alloy/config.alloy and add the following to the bottom:

// Tracing: receive OTLP from Laravel and forward to Tempo

otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "127.0.0.1:4317"
  }
  http {
    endpoint = "127.0.0.1:4318"
  }

  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    metrics = [otelcol.exporter.prometheus.mimir.input]
    logs    = [otelcol.exporter.loki.default.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo.example.com:4327"
    tls {
      insecure = false
    }
  }
}

otelcol.exporter.prometheus "mimir" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

This sets up Alloy to receive OTLP data from the Laravel app on 127.0.0.1:4317/4318 and fan it out — traces go to Tempo, metrics to Mimir, and logs to Loki.

2. Restart Alloy:

sudo systemctl restart alloy
sudo systemctl status alloy
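Before touching the Laravel app, the pipeline can be smoke-tested by posting a single hand-built span to Alloy's OTLP HTTP receiver. This is a minimal sketch using the standard OTLP/HTTP JSON encoding and the /v1/traces path; the service name, span name, and both helper functions are placeholders chosen for this example.

```python
import json
import secrets
import time
import urllib.request


def build_test_span(service_name="manual-test", span_name="alloy-smoke-test"):
    """Build a minimal OTLP/HTTP JSON payload containing one server span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 50_000_000  # pretend the span took 50 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    # OTLP JSON encodes 64-bit integers as strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_span(payload, endpoint="http://127.0.0.1:4318/v1/traces"):
    """POST the payload to the local OTLP HTTP receiver and return the status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Run on the app server, send_span(build_test_span()) should be accepted by Alloy, and the generated trace ID can then be looked up in Grafana once Tempo is added as a data source.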

Laravel and PHP OpenTelemetry Setup

PHP doesn't have OpenTelemetry built in, so we need to install the extension and wire up a Laravel package.

1. Install the PHP OpenTelemetry extension:

sudo apt-get install php8.4-opentelemetry

This installs and enables the extension automatically. Adjust the version number to match your PHP installation.

2. Add a local .env override before running composer:

OTEL_SDK_DISABLED=true

Without this, the next composer require will throw an error on your local machine, because there is no OTLP endpoint (Alloy) listening there.

3. Install the Laravel OpenTelemetry package:

composer require keepsuit/laravel-opentelemetry

4. Publish the config:

php artisan vendor:publish --provider="Keepsuit\LaravelOpenTelemetry\LaravelOpenTelemetryServiceProvider" --tag="opentelemetry-config"

This creates config/opentelemetry.php where you can fine-tune what gets instrumented.

5. Add the OTLP log channel to config/logging.php:

'channels' => [
    'stack' => [
        'driver' => 'stack',
        'channels' => ['daily', 'otlp'],
    ],

    'otlp' => [
        'driver' => 'monolog',
        'handler' => \Keepsuit\LaravelOpenTelemetry\Support\OpenTelemetryMonologHandler::class,
        'level' => 'debug',
    ],
],

This is what ties traces to logs — every log line gets the current trace ID attached, so you can jump from a log entry directly to the full trace in Grafana.
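The mechanism is easier to picture with a toy version. The following Python sketch is not how the Laravel package works internally; it only illustrates the general pattern of stamping every log record with the trace ID that is active at the time. The current_trace_id variable and the TraceIdFilter class are hypothetical names.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID; a real OpenTelemetry SDK
# tracks the current span context itself.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record passing through."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("correlation-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Pretend a request with this trace ID is being handled.
current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.info("order created")
output = stream.getvalue()
print(output, end="")
```

Because the ID travels with the log line, the log backend can offer a link that opens the matching trace.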

6. Add the production .env variables:

OTEL_SDK_DISABLED=false

OTEL_SERVICE_NAME=your-app-name

# Point to Alloy running locally on the app server
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

OTEL_PROPAGATORS=tracecontext,baggage

OTEL_TRACES_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
OTEL_LOGS_EXPORTER=otlp

# Include authenticated user ID on all traces and logs
OTEL_USER_CONTEXT=true

# Keep errors and slow traces, drop the rest
OTEL_TRACES_TAIL_SAMPLING_ENABLED=true
OTEL_TRACES_TAIL_SAMPLING_SLOW_TRACES_THRESHOLD_MS=2000

Tail sampling is worth enabling early — it keeps every errored trace and anything slower than 2 seconds, while dropping routine fast requests. This keeps storage usage reasonable without losing the traces you actually care about.
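The "tail" in tail sampling means the keep-or-drop decision is made after the whole trace has finished, when its outcome and duration are known. Here is a rough sketch of that decision rule, assuming the same 2000 ms threshold as above. The Span record and keep_trace function are invented for illustration and are not the package's actual code.

```python
from dataclasses import dataclass


# Hypothetical, simplified span record.
@dataclass
class Span:
    name: str
    start_ms: float
    end_ms: float
    is_error: bool = False


def keep_trace(spans, slow_threshold_ms=2000):
    """Decide whether a finished trace should be exported."""
    if not spans:
        return False
    # Rule 1: any errored span keeps the whole trace.
    if any(span.is_error for span in spans):
        return True
    # Rule 2: keep the trace if it took longer than the threshold end to end.
    started = min(span.start_ms for span in spans)
    ended = max(span.end_ms for span in spans)
    return (ended - started) > slow_threshold_ms


fast = [Span("GET /health", 0, 12)]
failed = [Span("POST /checkout", 0, 180),
          Span("payment.charge", 20, 170, is_error=True)]
slow = [Span("GET /reports", 0, 2600), Span("db.query", 50, 2500)]
decisions = [keep_trace(fast), keep_trace(failed), keep_trace(slow)]
print(decisions)
```

A head sampler, by contrast, has to decide when the request starts, before it can know whether the request will fail or turn out slow.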


Adding Tempo as a Data Source in Grafana

  • Go to Connections → Data Sources → Add new → Tempo
  • Set the URL to https://tempo.example.com and enable Basic Auth with your credentials
  • Under Trace to logs, select your Loki data source and add the tags service.name and host.name — this is what powers the "jump to logs" button on a trace
  • Under Trace to metrics, select your Mimir data source
  • Hit Save & Test

Dashboards

With everything connected, here's what I built in Grafana to keep an eye on things.

Slow Requests

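This is the most immediately useful view: a table of traces whose HTTP request spans average over 300ms, each with a direct link to the full trace. TraceQL evaluates avg(duration) per trace rather than per route, so the table lists slow individual requests, not routes that are slow on average.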

  • Data source: Tempo
  • Visualization: Table
  • Query:
{ name =~ "GET .*|POST .*|PUT .*|DELETE .*" } | avg(duration) > 300ms

[Figure: Slow requests table]

Clicking the trace ID opens the full span waterfall, which makes it immediately obvious whether the time is spent in a database query, an external HTTP call, or somewhere in the application logic.

[Figure: Trace detail view]
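To spell out how the TraceQL query in this section selects traces: as far as I understand the semantics, the filter picks the matching spans inside each trace, and avg(duration) is then computed over those spans per trace. The Python sketch below mimics that behaviour on made-up data. It assumes the regex has to match the whole span name, and the slow_traces helper is hypothetical.

```python
import re
from statistics import mean

# Same pattern as in the TraceQL query, applied to the full span name.
ROUTE_SPAN = re.compile(r"GET .*|POST .*|PUT .*|DELETE .*")


def slow_traces(traces, threshold_ms=300):
    """traces maps a trace ID to a list of (span name, duration in ms)."""
    result = {}
    for trace_id, spans in traces.items():
        matching = [duration for name, duration in spans
                    if ROUTE_SPAN.fullmatch(name)]
        # The average is taken per trace, over the matching spans only.
        if matching and mean(matching) > threshold_ms:
            result[trace_id] = mean(matching)
    return result


traces = {
    "a": [("GET /orders", 450), ("db.query", 400)],
    "b": [("GET /orders", 120)],
    "c": [("POST /checkout", 200), ("GET /internal/price", 500)],
}
selected = slow_traces(traces)
print(selected)
```

Trace "b" hits the same route as trace "a" but is dropped, which shows that the threshold applies to individual traces rather than to a route's overall average.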

Request Rate

  • Data source: Mimir
  • Visualization: Time series
  • Query:
sum(rate(traces_spanmetrics_calls_total{
  span_kind="SPAN_KIND_SERVER"
}[5m]))

traces_spanmetrics_calls_total is a counter generated automatically by Tempo's metrics generator. It counts spans, and the span_kind="SPAN_KIND_SERVER" filter narrows it to server-side spans (incoming requests), which lets you calculate request rate, error rate, and other RED metrics without any extra instrumentation in your app.

[Figure: Request rate graph]

Error Rate

  • Data source: Mimir
  • Visualization: Time series
  • Query:
sum(rate(traces_spanmetrics_calls_total{
  status_code="STATUS_CODE_ERROR",
  span_kind="SPAN_KIND_SERVER"
}[5m]))
/
sum(rate(traces_spanmetrics_calls_total{
  span_kind="SPAN_KIND_SERVER"
}[5m]))
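The arithmetic behind this query is simple: take the per-second increase of the error counter and divide it by the per-second increase of the total counter. A simplified Python sketch with made-up counter samples follows. It ignores the extrapolation and counter-reset handling that Prometheus-style rate() performs, and both function names are invented for the example.

```python
import math


def rate(first, last, window_s):
    """Per-second increase of a counter between two samples."""
    return (last - first) / window_s


def error_ratio(errors_first, errors_last, total_first, total_last, window_s=300):
    """Fraction of requests that failed within the window."""
    total_rate = rate(total_first, total_last, window_s)
    if total_rate == 0:
        return math.nan  # no traffic, so the ratio is undefined
    return rate(errors_first, errors_last, window_s) / total_rate


# Made-up samples taken 5 minutes apart: 15 new errors out of 600 new requests.
ratio = error_ratio(errors_first=40, errors_last=55,
                    total_first=10_000, total_last=10_600)
print(f"{ratio:.3f}")
```

The result is a fraction between 0 and 1, so setting the panel unit to "Percent (0.0-1.0)" in Grafana displays it as a percentage.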

Latency Percentiles

I added four queries on one panel — P50, P90, P95, and P99 — to get a clear picture of the tail latency.

  • Data source: Mimir
  • Visualization: Time series
  • Query (replace 0.95 with 0.99, 0.90, and 0.50 for the other series):
histogram_quantile(
  0.95,
  sum by (le) (
    rate(traces_spanmetrics_latency_bucket{
      span_kind="SPAN_KIND_SERVER"
    }[5m])
  )
)

P99 is the one to watch for intermittent issues: it is the latency that only the slowest 1% of requests exceed, and it often surfaces problems that only appear under specific conditions or for specific users.

[Figure: Latency percentiles dashboard]
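It helps to know that histogram_quantile does not see individual request durations, only cumulative bucket counts, and it estimates the percentile by interpolating linearly inside the bucket where the target rank falls. The sketch below is a simplified Python re-implementation of that idea and not the Prometheus or Mimir source; the bucket boundaries and counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper bound in seconds, cumulative count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(upper):
        # The quantile lies in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / (cumulative - below)


buckets = [(0.1, 60), (0.5, 90), (1.0, 97), (2.0, 99), (math.inf, 100)]
for q in (0.50, 0.90, 0.95, 0.99):
    print(q, histogram_quantile(q, buckets))
```

One consequence is that the reported percentiles can only be as precise as the bucket boundaries, so a P99 that sits exactly on a bucket edge usually says more about the histogram layout than about the application.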


Wrapping Up

With Tempo in place, the LGTM stack is complete. Metrics, logs, and traces are all flowing into Grafana and cross-linked — I can jump from a log line to its trace, or from a slow-request alert to the exact span that caused it. It took a bit of wiring, but everything in this stack plays nicely together once it's configured.

Thank you for reading, and let's connect!
