AI Engineering Lead

Deuna

Software Engineering, Data Science

San Francisco, CA, USA

Posted on May 23, 2026

About the Role

Athia is DEUNA's AI-powered payment intelligence platform — moving from early ML experimentation to the critical infrastructure behind billions of dollars in annual transaction volume. We are looking for a hands-on Engineering Lead who can own the full technical stack: from model development and data pipelines to production payment orchestration, cloud/on-prem deployments, and real-time observability.

This is not a coordination role. You will build, ship, and own. You will be the technical authority that bridges AI/ML systems with our core payments stack, leading both the platform engineering and the modeling lifecycle end-to-end.

    Core Responsibilities

    1 · AI/ML Model Ownership

    • Design, train, and fine-tune ML models for payment optimization use cases — including authorization rate improvement, dynamic routing, cost minimization, and fraud signal detection.

    • Select and apply the right frameworks (PyTorch, TensorFlow, scikit-learn) per model type and latency budget.

    • Own the model lifecycle: experimentation → offline evaluation → shadow deployment → A/B testing → production promotion.

    • Monitor and remediate model drift, data distribution shifts, and performance degradation proactively.

    • Define evaluation metrics that map directly to business KPIs (approval rate lift, GMV impact, provider cost).

    2 · Data Pipelines & Feature Engineering

    • Architect and build optimized data pipelines to collect, clean, and preprocess high-volume transaction data for model training and inference.

    • Design feature stores and real-time feature serving layers that keep inference latency within payments SLA requirements (<100 ms).

    • Establish data quality standards, schema validation, and lineage tracking across the ML data stack.

    • Partner with the Data Engineering team to ensure training data reflects the full distribution of providers, regions, and merchant types in our network.

    3 · Production Deployment & Payments Stack Integration

    • Integrate ML model outputs into DEUNA's live payment routing and orchestration layer with zero tolerance for latency regressions or silent errors.

    • Develop and own the inference service layer in Go and Python, ensuring thread-safe, performant, and fault-tolerant operation under peak transaction load.

    • Lead the design of hybrid deployment architectures: cloud-native (AWS/GCP) and on-premise client environments, including secure bi-directional data synchronization.

    • Build and maintain RESTful and gRPC APIs that expose Athia capabilities to the broader DEUNA platform and external partners.

    4 · Observability, Monitoring & Incident Response

    • Own the full observability stack for Athia: real-time dashboards, alerting thresholds, anomaly detection, and post-incident reviews.

    • Implement model-specific monitoring (prediction distributions, confidence scores, provider error rates) alongside standard infrastructure metrics.

    • Create a fast feedback loop with the Operations team to detect and remediate routing degradation or GMV impact within SLA.

    • Define on-call runbooks and escalation paths that are clear, tested, and kept up to date.

    5 · Scalability, Resiliency & Engineering Leadership

    • Provide architectural guidance to scale Athia to handle 10M+ monthly transactions across concurrent global partner launches.

    • Lead and mentor engineers through architecture reviews, code reviews, technical planning, and day-to-day execution.

    • Drive engineering best practices: testing strategy (unit, integration, shadow), CI/CD pipelines, documentation standards, and security compliance.

    • Translate business and product goals into concrete technical roadmaps with realistic timelines and clear dependency mapping.

    Requirements

    Backend & Infrastructure

    • Go (Golang): production-grade services

    • Python: ML pipelines, model serving, tooling

    • RESTful APIs and gRPC

    • Distributed systems and event-driven architecture

    • CI/CD, Docker, Kubernetes

    • Cloud platforms (AWS or GCP)

    • Hybrid / on-prem deployment patterns

    AI / ML Stack

    • PyTorch or TensorFlow: training and fine-tuning

    • scikit-learn, XGBoost, or comparable tabular ML tooling

    • MLflow, Weights & Biases, or equivalent

    • Feature engineering & feature stores

    • Model monitoring & drift detection

    • A/B testing and shadow deployment

    • Low-latency inference architectures

    Frontend & Full-Stack

    • React and Next.js

    • TypeScript

    • Component design systems

    • API integration patterns

    Observability & Data

    • Prometheus, Grafana, or Datadog

    • Structured logging & distributed tracing

    • SQL and analytical query patterns

    • Data pipeline tooling (Airflow, dbt, etc.)

    Experience

    • 6+ years in software engineering with strong backend foundations.

    • 2+ years in a Tech Lead or Staff Engineer role owning a production platform end-to-end.

    • Demonstrated experience shipping ML/AI systems to production, not just research or notebooks.

    • Background in payments, fintech, or high-transaction environments strongly preferred.

    • Experience with on-premise deployment or hybrid infrastructure for enterprise clients is a plus.

    • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.