Audiense - SRE / Platform Engineering

SRE and Platform Engineering for a social intelligence platform processing large-scale social data across major social networks. Enterprise clients across multiple industries.

Remote-first team with a strong distributed culture. TDD, trunk-based development, and pair programming as daily practice.

The Context

The platform ingests and processes social data at a scale where every infrastructure decision has compounding consequences. Pipelines need to run reliably across large volumes of records. Costs grow fast if left unchecked. Compliance requirements add constraints to every architectural choice. And the data keeps growing.

What I Worked On

SRE

Infrastructure and GitOps. All infrastructure changes through Terraform with GitOps workflows. Multi-region AWS architecture and failover strategies. Automation for work that used to require manual intervention: deployments, scaling, incident response.

Data pipelines. Infrastructure behind ETL/ELT pipelines processing large volumes of records daily. Workflow orchestration replacing fragile cron jobs. Distributed processing through message queues. Operational tooling in Python (boto3, FastAPI, pandas) for automation and metrics analysis.
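To illustrate what an orchestrator adds over cron, here is a minimal sketch, not the tooling actually used: tasks run in dependency order and transient failures are retried. A cron schedule can only assume that the upstream job finished in time. The task names, the `run_pipeline` helper, and the retry count are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Run tasks in dependency order, retrying each up to max_attempts times.

    deps maps a task name to the set of tasks that must finish before it.
    Returns the names of the completed tasks, in execution order.
    """
    order = list(TopologicalSorter(deps).static_order())
    completed: List[str] = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                # Re-raise once the retry budget is spent so the failure is visible.
                if attempt == max_attempts:
                    raise
    return completed


# Hypothetical three-step pipeline: extract -> transform -> load.
steps = {"extract": lambda: None, "transform": lambda: None, "load": lambda: None}
graph = {"transform": {"extract"}, "load": {"transform"}}
print(run_pipeline(steps, graph))
```

A real orchestrator adds scheduling, backoff, and persisted state on top of this, but explicit dependencies plus bounded retries are the core of what makes it less fragile than cron.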

Search infrastructure. Search clusters for data engineers building semantic queries across massive datasets. Keeping them fast and stable under production load.

Databases. Migration to managed services with zero downtime. Document database clusters optimized for social graph queries. Specialized indexes for advanced query patterns. Caching clusters for real-time performance. Backup strategies with aggressive RPO targets.
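An RPO target can be checked mechanically. The sketch below is an illustration under assumed inputs, not the actual backup tooling. It takes a list of backup timestamps and computes the largest window of data that could be lost, counting the time since the most recent backup. The function names and the 15-minute target are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_gap(backup_times: Sequence[datetime], now: datetime) -> timedelta:
    """Longest interval between consecutive backups, including latest backup to now."""
    if not backup_times:
        raise ValueError("no backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times: Sequence[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no gap exceeds the RPO; False if there are no backups at all."""
    if not backup_times:
        return False
    return worst_gap(backup_times, now) <= rpo


# Hypothetical example: backups every 15 minutes, checked 10 minutes after the last one.
start = datetime(2024, 1, 1, 0, 0)
backups = [start + timedelta(minutes=15 * i) for i in range(3)]
print(meets_rpo(backups, start + timedelta(minutes=40), rpo=timedelta(minutes=15)))
```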

Observability. Prometheus and Grafana monitoring with proactive alerting. Tracked infrastructure health, data quality metrics, and pipeline throughput. Cloud cost as an operational metric, not an afterthought.

Compliance and security. GDPR compliance for European user data – data retention, privacy controls, audit trails. SOC 2 Type II controls and evidence collection. Cloud provider compliance reviews. Worked directly with the DPO and external auditors.

FinOps. Cost and usage analysis as the primary tool for visibility and forecasting. Commitment-based discounts and negotiated agreements. Compute optimization across container orchestration platforms. The result: significant cost reduction without sacrificing performance or reliability.
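The arithmetic behind sizing a commitment-based discount can be sketched as follows. This is a simplified model with made-up numbers, not the actual analysis. It assumes usage is measured in on-demand-equivalent units per hour, the committed amount is paid every hour at a flat discount whether used or not, and anything above it is billed at the full on-demand rate. The function names and the 25% discount are assumptions.

```python
from typing import Sequence


def total_cost(hourly_usage: Sequence[float], commitment: float, discount: float) -> float:
    """Total cost in on-demand-equivalent units for a given hourly commitment."""
    # The commitment is billed every hour at the discounted rate, used or not.
    committed = commitment * (1.0 - discount) * len(hourly_usage)
    # Usage above the commitment falls back to the on-demand rate (1.0 per unit).
    overflow = sum(max(0.0, used - commitment) for used in hourly_usage)
    return committed + overflow


def best_commitment(hourly_usage: Sequence[float], discount: float) -> float:
    """Commitment level that minimizes total cost over the observed usage."""
    # Cost is piecewise linear and convex in the commitment, so the minimum
    # sits at zero or at one of the observed usage levels.
    candidates = sorted({0.0, *hourly_usage})
    return min(candidates, key=lambda c: total_cost(hourly_usage, c, discount))


# Made-up usage: a steady baseline of 10 units with one spike to 20.
usage = [10.0, 10.0, 10.0, 20.0]
commit = best_commitment(usage, discount=0.25)
print(commit, total_cost(usage, commit, discount=0.25))
```

In this toy series the cheapest option is to commit to the baseline and let the spike run on demand; committing to the peak would pay for idle capacity in three of the four hours.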

Platform Engineering

Developer self-service tooling. A Slack bot (sysbot) for repository scaffolding, templates, and common operations – developers don’t wait on anyone. An internal desktop tool for secrets management, database connections, environment configuration, and laptop setup – rough around the edges, but easy to extend and it removed bottlenecks.

Reusable CI/CD pipelines. GitHub Actions workflows covering the full CI/CD lifecycle – build, test, security scanning, deploy, release – as reusable YAML consumed directly from team repositories. Teams own their pipelines.

Infrastructure abstractions. Reusable Terraform modules that hide AWS complexity. Developers interact with simple interfaces; the modules handle the wiring. Goal: nobody needs to understand VPCs to ship a feature.

Observability self-service. Grafana for application dashboards the teams build themselves, CloudWatch for infrastructure and managed service metrics, Rollbar for application-level error tracking. Teams own their own visibility.

Internal documentation. Technical tutorials, runbooks, and log analysis guides so developers can troubleshoot without escalating. The best platform work is the support ticket that never gets filed.

Lessons Learned

  1. Infrastructure as code is not optional at scale. Terraform and GitOps turned infrastructure changes from risky manual operations into reviewable, repeatable, reversible code. Without them, a platform this size drifts into chaos.
  2. Managed services let small teams punch above their weight. Every service you don’t operate yourself is capacity your team spends on what actually matters. The migration cost pays for itself in operational headroom.
  3. The best platform work makes you unnecessary. If developers need you to deploy, connect to a database, or check a dashboard – that’s a platform failure. Self-service isn’t a feature, it’s the whole point.
  4. FinOps needs engineering, not just dashboards. Systematic cost analysis and commitment optimization cut costs more than any single architectural change.