20 Oct 2025
The alert sound is burned into my brain now. That specific PagerDuty tone that means something is really wrong. Not “a pod restarted” wrong. Not “latency spike” wrong. The kind of wrong that makes your stomach drop before you even look at your phone.
Late Sunday night. I’d finally convinced myself to stop checking Slack every five minutes and actually relax. Big mistake.
More …
16 Oct 2025
Running an API gateway in Kubernetes isn’t straightforward. Most documentation glosses over real issues like the etcd image registry problem after the VMware acquisition, CRD-based configuration patterns, and plugin troubleshooting. This guide covers deploying APISIX with local chart customization to handle these issues, implementing traffic management patterns (rate limiting, circuit breaker, caching) through Kubernetes CRDs
More …
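For a concrete sense of the CRD-based traffic management the excerpt above refers to, here is a minimal sketch of a rate-limited route, assuming the ApisixRoute CRD shipped with apisix-ingress-controller (API version v2). The hostname, namespace, and backend Service are illustrative, and the exact schema varies between controller versions:

```yaml
# Sketch: an ApisixRoute that proxies /orders/* to a backend Service
# and applies APISIX's limit-count plugin for rate limiting.
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: orders-route            # illustrative name
  namespace: demo               # illustrative namespace
spec:
  http:
  - name: orders-rule
    match:
      hosts:
      - api.example.com
      paths:
      - /orders/*
    backends:
    - serviceName: orders-svc   # assumed backend Service
      servicePort: 8080
    plugins:
    - name: limit-count         # APISIX rate-limiting plugin
      enable: true
      config:
        count: 100              # allow 100 requests...
        time_window: 60         # ...per 60-second window
        rejected_code: 429      # respond 429 once the limit is hit
        key: remote_addr        # limit per client IP
```

Circuit breaking and caching follow the same shape: APISIX’s api-breaker and proxy-cache plugins attach to a route through the same plugins list.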
16 Sep 2025
The Challenge
Running Keycloak in production is notoriously challenging. Common pain points include session loss during scaling, complex external cache configuration, and keeping sessions persistent across multiple replicas while maintaining high availability. Traditional approaches often require external Infinispan clusters or Redis, adding operational complexity and potential failure points.
Solution Overview
Instead of managing external caching systems, we can leverage Keycloak’s built-in clustering capabilities with Kubernetes-native service discovery. This approach uses JGroups with DNS-based discovery through headless services,
More …
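As a rough sketch of what that DNS-based discovery looks like in practice, assuming the Quarkus-based Keycloak distribution and its built-in `kubernetes` cache stack (names, namespace, and image tag here are illustrative):

```yaml
# Headless Service: with clusterIP set to None, a DNS lookup of the
# service name returns the individual Keycloak pod IPs, which JGroups
# DNS_PING uses to discover cluster members.
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless          # illustrative name
spec:
  clusterIP: None
  selector:
    app: keycloak
  ports:
  - name: jgroups
    port: 7800                     # JGroups transport port
---
# Clustering-related parts of the Keycloak StatefulSet.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
spec:
  serviceName: keycloak-headless
  replicas: 3
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
      - name: keycloak
        image: quay.io/keycloak/keycloak:26.0   # illustrative version
        # The "kubernetes" cache stack wires the embedded Infinispan/JGroups
        # transport to DNS_PING discovery.
        args: ["start", "--cache=ispn", "--cache-stack=kubernetes"]
        env:
        - name: JAVA_OPTS_APPEND
          # Point DNS_PING at the headless Service above
          # (adjust the namespace to wherever the Service lives).
          value: "-Djgroups.dns.query=keycloak-headless.default.svc.cluster.local"
        ports:
        - name: http
          containerPort: 8080
        - name: jgroups
          containerPort: 7800
```

Scaling replicas up or down updates the headless Service’s DNS records automatically, which is what lets the embedded caches re-form the cluster without an external cache tier.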
01 Sep 2025
Yesterday’s daily standup was supposed to be 15 minutes. It turned into a 2-hour debugging session instead. A feature we’d already demoed to the client last week suddenly wasn’t showing up in production. The app team kept saying “it works fine in staging,” while DevOps insisted “infrastructure looks good on our end.” Meanwhile, the client kept asking when it would go live.
What made it frustrating was that all our monitoring was green. Database connections healthy, API response times normal, zero error rate. But somehow the new feature just wasn’t there. No error logs, no exceptions, nothing crashed.
More …
24 Aug 2025
Your service is down. Again. Third time this week, and it’s only Tuesday.
The alerts started flooding in around lunch. Connection timeouts to ActiveMQ, database pool exhaustion, pods stuck in CrashLoopBackOff. Your team scrambles to investigate, but here’s the weird part: every individual component looks healthy. ActiveMQ is running fine, database performance is normal, pods would start successfully if they could just get past the connection phase.
You check the obvious suspects. Network? Fine. DNS? Working. Firewall rules? All good. Load balancer health checks? Passing. So what’s killing your perfectly healthy infrastructure?
More …