
Troubleshooting

Common issues and resolution steps for the Oracle Cloud K3s cluster.


1. Oracle Instance Reclamation

Issue: The instance is flagged for reclamation due to sustained low CPU utilization (Oracle reclaims idle Always Free instances).

Resolution: Verify that the stress-ng or urandom keep-alive loop is running. See Always Free Policy.
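If the keep-alive load has died, it can be restarted by hand. A minimal sketch of the urandom variant (stress-ng is the alternative; the function name keepalive and the 64M read size are illustrative), niced so it yields to real workloads:

```shell
keepalive() {
  # One low-priority CPU worker reading from /dev/urandom in a loop.
  # stdout/stderr are redirected so callers using $(keepalive) are not
  # blocked by the inherited pipe.
  nice -n 19 sh -c 'while :; do head -c 64M /dev/urandom >/dev/null; done' >/dev/null 2>&1 &
  echo $!
}

# Usage: start one worker, keep its PID so it can be stopped later.
# pid=$(keepalive)
# kill "$pid"
```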


2. Storage Issues (Longhorn)

Issue: Pods stuck in ContainerCreating with volume errors.

Resolution:

  1. Check the Longhorn UI (Port 30082).
  2. Check whether the volume is "Attached" but "Unhealthy".
  3. Restart the longhorn-manager pods if synchronization is lost.
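The checks above can be run from the CLI as well; a sketch, assuming Longhorn is installed in its default longhorn-system namespace (replace the pod/namespace placeholders):

```shell
# Look for FailedAttachVolume / FailedMount in the stuck pod's events.
kubectl describe pod <pod> -n <ns>

# Inspect volume state as Longhorn sees it (attached/detached, robustness).
kubectl -n longhorn-system get volumes.longhorn.io

# Restart the managers if volume state stays out of sync.
kubectl -n longhorn-system rollout restart daemonset/longhorn-manager
```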

3. Database Connectivity

Issue: Apps cannot connect to PostgreSQL.

Resolution:

  1. Check if the pg pod in the db namespace is running.
  2. Verify fsGroup: 999 is set in the deployment.
  3. Check the internal DNS: nslookup pg-svc.db.
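The three checks above, as commands. The deployment name pg is taken from the pod name and is an assumption; adjust to the actual manifests:

```shell
# 1. Pod status in the db namespace.
kubectl -n db get pods

# 2. Confirm fsGroup: 999 on the deployment (name "pg" assumed).
kubectl -n db get deploy pg \
  -o jsonpath='{.spec.template.spec.securityContext.fsGroup}'

# 3. Resolve the service from inside the cluster.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  -- nslookup pg-svc.db
```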

4. Monitoring Stack Storage Migration (Longhorn to Local-Path)

Transitioning the monitoring stack from Longhorn to local-path storage to eliminate overhead.

1. Overview

Longhorn provides distributed block storage and introduces unnecessary I/O and CPU overhead for ephemeral time-series data on k3s. The monitoring stack is therefore migrated to local-path-provisioner, which uses direct, host-bound directories (/var/lib/rancher/k3s/storage/). This gives native NVMe/block-device speed with minimal CPU overhead.
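For reference, a claim bound to the k3s-bundled provisioner looks like the following (name, namespace, and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data         # illustrative name
  namespace: monitoring
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path  # provisioner bundled with k3s
  resources:
    requests:
      storage: 2Gi              # not enforced by local-path; capacity is
                                # managed at the application level
```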

2. Implementation Procedure

A. Phase 1: Removing Immutable Resources

Kubernetes forbids mutating the storageClassName of an existing PersistentVolumeClaim (PVC), and the volumeClaimTemplates of a StatefulSet are immutable after creation.

  1. Delete existing StatefulSets and PVCs to allow recreation with the new storageClassName.
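A sketch of the deletion step; the StatefulSet names are assumptions, adjust to the actual release:

```shell
# Remove the StatefulSets so they can be recreated with the new
# storageClassName (names assumed).
kubectl -n monitoring delete statefulset prometheus alertmanager

# Remove the old Longhorn-backed claims in the namespace.
kubectl -n monitoring delete pvc --all
```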

B. Phase 2: Resolving Finalizer Deadlock

Deletion of PVCs may deadlock if the kubernetes.io/pvc-protection finalizer waits for a Longhorn detachment signal that never arrives.

  1. Remove finalizers via the API to bypass Longhorn:
    kubectl get pvc -n monitoring -o name \
      | xargs -r -I{} kubectl patch {} -n monitoring --type=merge \
          -p '{"metadata":{"finalizers":null}}'


C. Phase 3: Restarting Local-Path Provisioner

Pods may remain in Pending state if the local-path-provisioner pod loses network connectivity (dial tcp ... no route to host) during rapid resource changes.

  1. Delete the provisioner pod to force a network reset:
    kubectl delete pod -n kube-system -l app=local-path-provisioner
    
  2. Verify the provisioner reconnects and binds the pending PVCs.
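Step 2 can be checked with (label selector assumed to match the k3s deployment):

```shell
# Provisioner pod should return to Running after the restart.
kubectl -n kube-system get pods -l app=local-path-provisioner

# All claims in the namespace should reach STATUS Bound.
kubectl -n monitoring get pvc

# Recent provisioner logs should show successful provisioning, not
# "no route to host" dial errors.
kubectl -n kube-system logs -l app=local-path-provisioner --tail=20
```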

3. Configuration Standards

Storage capacity is managed at the application level, because local-path volumes do not enforce filesystem quotas:

  • Prometheus: Application-level retention is capped at 1.5GiB or 24 hours, whichever limit is reached first; the TSDB prunes old blocks to stay within these limits.
  • Grafana: Configuration uses a lightweight SQLite database (<50MB).
  • Alertmanager: Stores only transient notification routing data.
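The Prometheus limits above translate into the standard TSDB retention flags; a fragment of the container args (exact placement depends on how the deployment passes arguments):

```yaml
# Prometheus container args (fragment)
- --storage.tsdb.retention.size=1536MB   # 1.5GiB cap
- --storage.tsdb.retention.time=24h      # whichever limit is hit first wins
```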

4. Verification

The monitoring stack now operates with no distributed-storage overhead, writing directly to the Oracle VM disk.
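This can be confirmed from the CLI:

```shell
# Every monitoring PVC should report the local-path storage class.
kubectl -n monitoring get pvc \
  -o custom-columns=NAME:.metadata.name,SC:.spec.storageClassName

# Volume data should appear as plain directories on the host.
ls /var/lib/rancher/k3s/storage/
```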


Changelog

Date          Change
Feb 13, 2026  Added Monitoring Stack Migration report.
Feb 09, 2026  Added PostgreSQL connection checks and Longhorn UI details.
Jan 20, 2026  Initial placeholder created.