Audiense - SRE / Platform Engineering

(4 years 5 months)

SRE and Platform Engineering for a social intelligence platform processing large-scale social data. Enterprise clients across multiple industries. Remote-first team, trunk-based development, TDD, and pair programming as daily practice.

The Context

The platform ingests and processes social data at a scale where infrastructure decisions matter. Pipelines need to run reliably across large volumes of records. Costs grow fast if left unchecked. Compliance requirements add constraints to every architectural choice.

The Work

SRE

Infrastructure and GitOps. All infrastructure managed through Terraform with GitOps workflows. Evolution from EC2 to ECS to EKS – data processing workloads on EKS for flexibility, frontend services on ECS. Autoscaling with Cluster Autoscaler, Karpenter, and KEDA depending on the workload.
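To make the event-driven autoscaling concrete, here is a minimal sketch of the arithmetic behind queue-based scaling of the kind KEDA drives through the Kubernetes Horizontal Pod Autoscaler: replicas are roughly the queue backlog divided by a per-replica target, clamped to configured bounds. The function name, parameters, and numbers are hypothetical and not taken from the platform's configuration; real autoscalers also apply tolerances and stabilization windows that this sketch omits.

```python
import math


def desired_replicas(
    queue_length: int,
    target_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 10,
) -> int:
    """Simplified queue-driven scaling: ceil(backlog / target), clamped.

    All names and defaults are illustrative assumptions.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured bounds so a burst cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # An empty queue scales to zero; a large backlog is capped at max_replicas.
    for backlog in (0, 120, 10_000):
        print(backlog, desired_replicas(backlog, target_per_replica=50, max_replicas=20))
```

Scaling to zero on an empty queue is what makes this model attractive for bursty data processing, while the upper bound keeps a backlog spike from turning into a cost spike.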

Data pipelines. Infrastructure behind ETL/ELT pipelines processing large volumes of records daily. Gradual migration from Jenkins-scheduled cron jobs to event-driven Lambdas (EventBridge) and Argo Workflows for newer workloads. Distributed processing through Redpanda (Kafka-compatible, managed under a bring-your-own-cloud model), with SQS for coordination. Operational tooling in Python (boto3, FastAPI, pandas) for automation and metrics analysis.

Data infrastructure. Migration to AWS managed services – search, document, relational, and analytics workloads – with zero downtime. Backup strategies with tested RPO targets.
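A tested RPO target comes down to a simple check: the newest restorable backup of each data store must be younger than that store's recovery point objective. The sketch below illustrates the check under assumed inputs; the store names, targets, and the `rpo_breaches` helper are hypothetical, and in practice the timestamps would come from the backup service's API rather than a hand-built dictionary.

```python
from datetime import datetime, timedelta, timezone


def rpo_breaches(
    last_backup_at: dict[str, datetime],
    rpo: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Return the stores whose newest backup is older than their RPO.

    A store with an RPO target but no recorded backup counts as a breach.
    """
    breaches = []
    for store, target in rpo.items():
        last = last_backup_at.get(store)
        if last is None or now - last > target:
            breaches.append(store)
    return sorted(breaches)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical stores and targets, for illustration only.
    targets = {"relational": timedelta(hours=1), "search": timedelta(hours=24)}
    backups = {
        "relational": now - timedelta(hours=2),
        "search": now - timedelta(hours=3),
    }
    print(rpo_breaches(backups, targets, now))
```

A check like this only verifies backup age. Confirming that a backup actually restores still needs a periodic restore drill.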

Observability. Amazon Managed Prometheus and Grafana for metrics and dashboards, CloudWatch for logs, metrics, and alarms across the full infrastructure. Proactive alerting on infrastructure health, data quality, and pipeline throughput.

Compliance and security. GDPR compliance for European user data – data retention, privacy controls, audit trails. SOC 2 Type II controls with automated evidence collection. SIEM-based security monitoring. AWS Foundational Technical Review. Collaboration with the DPO and external auditors.

FinOps. Cost and usage analysis for visibility and forecasting. Costs attributed by service and workload, so the team knows where to invest and where to cut. Commitment-based discounts and negotiated agreements. Compute optimization across container orchestration platforms. Significant savings through usage optimization and commitment management, with no loss of capacity.
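As a rough illustration of what attributing costs by service and workload involves, the sketch below groups billing line items by a chosen dimension and computes how much spend is covered by commitments. The record fields (`service`, `workload`, `cost`, `covered`) and the figures are hypothetical; real billing exports have a much richer schema, and this is not the tooling used on the platform.

```python
from collections import defaultdict


def cost_by(items: list[dict], key: str) -> dict[str, float]:
    """Sum the 'cost' field of each line item, grouped by the given key."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        # Untagged spend is kept visible rather than silently dropped.
        totals[item.get(key, "untagged")] += item["cost"]
    return dict(totals)


def commitment_coverage(items: list[dict]) -> float:
    """Fraction of total spend covered by commitment-based discounts."""
    total = sum(item["cost"] for item in items)
    if total == 0:
        return 0.0
    covered = sum(item["cost"] for item in items if item.get("covered"))
    return covered / total


if __name__ == "__main__":
    # Hypothetical line items, for illustration only.
    line_items = [
        {"service": "EKS", "workload": "pipelines", "cost": 120.0, "covered": True},
        {"service": "EKS", "workload": "api", "cost": 80.0, "covered": False},
        {"service": "ECS", "workload": "frontend", "cost": 50.0, "covered": True},
    ]
    print(cost_by(line_items, "service"))
    print(round(commitment_coverage(line_items), 2))
```

Keeping an explicit "untagged" bucket is a deliberate choice here: the size of that bucket shows how much spend still cannot be attributed to an owner.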

Platform Engineering

Developer self-service tooling. A Slack bot for repository scaffolding, templates, and common operations. An internal desktop tool for secrets management, database connections, environment configuration, and laptop setup – rough around the edges, but it removed bottlenecks.

Reusable CI/CD pipelines. GitHub Actions workflows covering the full CI/CD lifecycle – build, test, dependency and security scanning, deploy, release – as reusable YAML consumed directly from team repositories. Teams own their pipelines.

Infrastructure abstractions. Reusable Terraform modules that hide AWS complexity. Developers interact with simple interfaces; the modules handle the wiring underneath.

Observability self-service. Grafana for dashboards the teams build themselves, CloudWatch for infrastructure and managed service metrics, distributed tracing for application instrumentation (owned by the application teams), and error tracking.

Internal documentation. Technical tutorials, runbooks, and log analysis guides so developers can troubleshoot without escalating.

Lessons Learned

  • Infrastructure as code is not optional at scale. Terraform and GitOps turned infrastructure changes from risky manual operations into reviewable, repeatable, reversible code.
  • Managed services free up small teams. Every service you don’t operate yourself is capacity your team spends on what actually matters.
  • Self-service is the whole point. If developers need you to deploy, connect to a database, or check a dashboard – that’s a platform gap, not a support request.
  • FinOps needs engineering, not just dashboards. Systematic cost analysis and commitment optimization cut costs more than any single architectural change.