Now that we have our memory centre, our operational mindsets, and our engineering standards, how do we actually execute a complex task without the AI losing its way?
In a typical AI workflow, a developer might copy-paste a Jira ticket and ask, "Code this." The result is often a hallucinated guess that misses business logic or ignores repository patterns and standards. To solve this, we move to a Context-Driven workflow in which the AI processes the requirement through a structured pipeline rather than a single prompt.
The Input: A Jira ticket
To test this process, use a standard, slightly under-specified Jira ticket. If you're following along with this research, you can use this sample:
Visualising the Workflow
This is the transition from a "Chat" to a "Workflow". Instead of a single prompt, we treat the requirement as an input to a structured process.
```mermaid
graph LR
%% Input
Ticket((Jira Ticket)) -- "1. Request" --> Arch
%% Phase 1
subgraph Phase1 ["Phase 1: Planning"]
Arch["<b>Architect Role</b><br/>(Audit & Plan)"]
end
%% Handover
Arch -- "2. Proposal & Diagrams" --> Dev
%% Phase 2
subgraph Phase2 ["Phase 2: Execution"]
Dev["<b>Developer Role</b><br/>(Code & Patch)"]
end
%% Output
Dev -- "3. Writes" --> Code[("📦 src/<br/>(Code)")]
%% Knowledge Base
subgraph Repo ["The Source of Truth"]
Memory["📂 ai/context/"]
Standards["📂 ai/standards/"]
end
%% Grounding Lines
Memory -.->|Grounding| Arch
Standards -.->|Enforce| Dev
%% Context Patch (feedback loop)
Dev -- "4. Context Patch" --> Memory
%% Styling
style Ticket fill:#f5f5f5,stroke:#9e9e9e
style Repo fill:#f0f7ff,stroke:#0052cc,stroke-dasharray: 5 5
style Phase1 fill:#fff,stroke:#01579b
style Phase2 fill:#fff,stroke:#2e7d32
style Code fill:#fff7e6,stroke:#ffa940
```
The Architect’s Blueprint
Instead of jumping to code, we first invoke the architect role. Its job is to audit the ticket against our ai/context/ files and identify exactly how this should be built.
The Workflow Sequence
```mermaid
sequenceDiagram
participant User as Jira Ticket
participant Arch as Architect Role
participant Context as ai/context
participant Dev as Developer Role
participant PR as Production PR
User->>Arch: Feed Requirement
Context-->>Arch: Provide Invariants & Standards
Note over Arch: Validates against Business Logic
Arch->>Dev: Technical Blueprint (Markdown)
Context-->>Dev: Provide Style & Safety Rules
Note over Dev: Writes Standard-Aligned Code
Dev->>PR: Generates Implementation
```
The Architectural Review in Action
With the process established, the architect performs the actual audit. This is where the Stateless AI becomes a Context-Aware Architect.
User
Architect, review PAY-402. Output a plan for the Developer.
AI
Architectural Review: PAY-402 — Partial Refund Support
1. Problem Statement
PAY-402 requests a service method for partial refunds with three invariants: amount cap, observability, and idempotency.
Cross-referencing the codebase against DOMAIN_MODEL.md and ENGINEERING_STANDARDS.md reveals that the core implementation already exists in PaymentService.create_refund().
However, the review surfaced one critical design flaw and two gaps.
2. Context Validation — Invariant Checklist
| DOMAIN_MODEL Invariant | Status | Location |
|---|---|---|
| Refund cannot exceed `amount_captured` | Pass | `services.py#L46-L55` — aggregate of non-failed refunds checked |
| Refund must have unique `trace_id` before Stripe call | Pass | `services.py#L57` — `uuid4()` generated pre-call |
| SUM of non-failed refunds ≤ `amount_captured` | Pass | `services.py#L46-L49` — `.exclude(status=FAILED)` |
| PaymentIntent must be Succeeded to allow refund | Pass | `services.py#L40-L43` |
| Succeeded is terminal and cannot revert | Pass | No code path mutates PI status on refund |

| ENGINEERING_STANDARDS Rule | Status | Location |
|---|---|---|
| Business logic in `services.py` | Pass | All logic in `PaymentService` |
| `@transaction.atomic` for multi-model updates | Flagged | See Finding #1 below |
| OpenTelemetry span per service method | Pass | `services.py#L25` |
| `trace_id` in logs | Pass | All `logger.info`/`error` calls include it |
| Never log PII | Pass | No PAN/CVV/secrets logged |
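The amount-cap invariant at the heart of this checklist can be expressed as a small pure function. This is an illustrative sketch, not the repository's actual code: the `Refund` dataclass and `validate_refund_amount` are hypothetical stand-ins for the Django model and the service-layer check.

```python
from dataclasses import dataclass

FAILED = "failed"

@dataclass
class Refund:
    amount: int   # minor units (e.g. cents)
    status: str

def validate_refund_amount(amount_captured: int, refunds: list[Refund], requested: int) -> None:
    """Reject a refund if non-failed refunds plus the request would exceed the captured amount."""
    already_refunded = sum(r.amount for r in refunds if r.status != FAILED)
    if already_refunded + requested > amount_captured:
        raise ValueError("Refund would exceed amount_captured")
```

Note that failed refunds are excluded from the sum, mirroring the `.exclude(status=FAILED)` query the audit found in `services.py`.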
3. Findings
Finding 1 — CRITICAL: Stripe HTTP call inside transaction.atomic()
The Stripe Refund.create() call at services.py#L77 executes inside the transaction.atomic() block that holds a SELECT FOR UPDATE row lock on the PaymentIntent.
Risks:
- DB connection exhaustion: Row lock held for the full Stripe round-trip (200ms–30s). Under load, this starves the connection pool.
- Phantom refund on timeout: If the HTTP call times out, `StripeError` is raised, the DB transaction rolls back (Refund row deleted), but Stripe may have actually processed the refund. On the next user retry, a new `trace_id` is generated → the idempotency key offers no protection → double refund.
Required fix:
Two-phase approach — create the Refund row and release the lock first, then call Stripe outside the transaction, then update status in a second atomic block.
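The required fix can be sketched framework-agnostically. Here `db_transaction` and `gateway` are hypothetical stand-ins for `transaction.atomic()` and the Stripe client, and the in-memory dicts stand in for ORM rows; the point is that the external call sits between two short transactions, so no row lock is held during the HTTP round-trip.

```python
import uuid
from contextlib import contextmanager

@contextmanager
def db_transaction():
    # Stand-in for transaction.atomic(): a real implementation would begin/commit here.
    yield

def create_refund(pi: dict, amount: int, gateway):
    # Phase 1: lock, validate, persist. The lock is released at commit.
    with db_transaction():
        already = sum(r["amount"] for r in pi["refunds"] if r["status"] != "failed")
        if already + amount > pi["amount_captured"]:
            raise ValueError("amount cap exceeded")
        refund = {"amount": amount, "status": "created", "trace_id": str(uuid.uuid4())}
        pi["refunds"].append(refund)

    # Phase 2: external call with NO row lock held.
    try:
        stripe_id = gateway(amount, idempotency_key=refund["trace_id"])
    except Exception:
        # Definitive failure: persist FAILED instead of deleting the row,
        # so the trace_id survives for idempotent retries.
        with db_transaction():
            refund["status"] = "failed"
        raise

    # Phase 3: a second short transaction records the gateway result.
    with db_transaction():
        refund["status"] = "pending"
        refund["stripe_refund_id"] = stripe_id
    return refund
```

Because the Refund row (and its `trace_id`) outlives a gateway failure, a retry reuses the same idempotency key instead of minting a new one, which is what closes the double-refund window.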
Finding 2 — GAP: Webhook handler incomplete
Per IMPLEMENTATION_STATE.md, the webhook handler is ~50% complete. The view at views.py#L78 only listens for charge.refund.updated but not charge.refunded (the initial refund success event from Stripe). This means refunds that succeed synchronously may never transition from PENDING to SUCCEEDED in our system.
Finding 3 — GAP: No concurrent-request test
Tests validate sequential partial refunds but no test exercises two simultaneous refund requests racing for the same PaymentIntent, which is the scenario the select_for_update() lock is meant to protect.
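The shape of the missing race test can be sketched in plain Python. This is a simulation under stated assumptions: `FakePaymentService` is hypothetical, and a `threading.Lock` plays the role of the `select_for_update()` row lock, so validation and insert happen atomically per request.

```python
import threading

class FakePaymentService:
    """In-memory stand-in; the lock models SELECT FOR UPDATE on the PaymentIntent row."""
    def __init__(self, amount_captured: int):
        self.amount_captured = amount_captured
        self.refunds = []
        self._row_lock = threading.Lock()

    def create_refund(self, amount: int) -> None:
        with self._row_lock:  # validation and insert both happen under the lock
            already = sum(self.refunds)
            if already + amount > self.amount_captured:
                raise ValueError("amount cap exceeded")
            self.refunds.append(amount)

def race(service, amount, errors):
    try:
        service.create_refund(amount)
    except ValueError as exc:
        errors.append(exc)

service = FakePaymentService(amount_captured=10000)
errors = []
threads = [threading.Thread(target=race, args=(service, 6000, errors)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one 6000 refund wins; the other must hit the cap.
```

Without the lock around both the read and the write, both threads could observe `already == 0` and both succeed, which is precisely the race the real test should prove impossible.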
4. Visual Logic — Proposed Two-Phase Refund Flow
```mermaid
sequenceDiagram
participant V as View
participant S as PaymentService
participant DB as PostgreSQL
participant St as Stripe API
V->>S: create_refund(pi_id, amount)
rect rgb(230, 245, 255)
note right of S: Phase 1 — Lock, validate, persist
S->>DB: BEGIN + SELECT FOR UPDATE (PaymentIntent)
DB-->>S: PaymentIntent row (locked)
S->>DB: SUM(non-failed refunds)
DB-->>S: already_refunded
S->>S: Validate amount cap
S->>DB: INSERT Refund (status=CREATED)
S->>DB: COMMIT (lock released)
end
rect rgb(255, 245, 230)
note right of S: Phase 2 — External call (no lock held)
S->>St: Refund.create(idempotency_key=trace_id)
St-->>S: stripe_refund_id
end
rect rgb(230, 255, 230)
note right of S: Phase 3 — Update status
S->>DB: BEGIN + SELECT FOR UPDATE (Refund)
S->>DB: UPDATE status=PENDING, stripe_refund_id
S->>DB: COMMIT
end
S-->>V: Refund object
```
Note: In practice, the first pass won’t always catch everything — the win is that the workflow forces an audit against your invariants before any code ships.
5. Developer Plan
The Developer should execute these tasks in order:
| Step | Task | Files | Acceptance Criteria |
|---|---|---|---|
| 1 | Refactor `create_refund` to two-phase | `services.py` | Stripe call executes outside `transaction.atomic()`. On Stripe failure, Refund row transitions to `FAILED` instead of being deleted. Row lock duration ≤ query time only. |
| 2 | Handle Stripe timeout / ambiguous failure | `services.py` | If `StripeError` is a connection/timeout error, mark Refund as `PENDING` (not `FAILED`) and enqueue a Celery reconciliation task. |
| 3 | Add `charge.refunded` to webhook handler | `views.py` | `handled_event_types` includes `charge.refunded`. Webhook correctly routes to `process_refund_webhook`. |
| 4 | Add concurrent refund test | `tests.py` | Test uses threading or `select_for_update` simulation to prove the amount cap holds under concurrent requests. |
| 5 | Update existing tests for two-phase | `tests.py` | `test_stripe_error_rolls_back_refund` updated: Refund row now persists with `FAILED` status instead of being deleted. `test_stripe_rollback_leaves_full_balance_available` updated to match. |
| 6 | Context Patch | See below | Update `ai/context/IMPLEMENTATION_STATE.md` |
6. Context Patch
ai/context/IMPLEMENTATION_STATE.md — proposed update after Developer completes work:
```markdown
# Implementation State: Refund Feature

## Current Status
- Service Layer: PaymentService.create_refund() refactored to two-phase (lock → call → update)
- Data Layer: Refund model migrated and active
- Webhooks: `charge.refunded` + `charge.refund.updated` both handled
- Idempotency: trace_id used as Stripe idempotency_key; timeout paths enqueue reconciliation

## Resolved
- PAY-402: Partial refund support (amount cap, observability, idempotency)
- Stripe call moved outside transaction.atomic() to prevent lock contention

## Next Steps
1. Implement Celery reconciliation task for ambiguous Stripe timeouts
2. Add integration tests with Stripe sandbox (blocked on QA API keys)
3. Update observability spans for webhook failures
```
Operational Constraint
Per the Architect role: no implementation code is produced at this stage. The Developer should take this plan and implement against the DEVELOPER.md role constraints, adhering to STYLE_GUIDE.md and BOUNDARIES.md.
The Developer’s Implementation
Now, the developer role takes over. It isn't working from a vague ticket anymore; it’s working from a high-fidelity blueprint that already respects your business logic.
User
Developer, implement the plan for PAY-402.
AI
PAY-402 Implementation Summary
Files Modified
payments/services.py — Two-Phase Refund
Refactored create_refund to eliminate the critical flaw of holding a DB row lock during the Stripe HTTP call.
| Phase | Scope | What Happens |
|---|---|---|
| 1 — Lock & Validate | `transaction.atomic()` | `SELECT FOR UPDATE` on PaymentIntent, validate amount cap, INSERT Refund (`status=CREATED`), commit (lock released) |
| 2 — External Call | No lock held | `stripe.Refund.create()` with `trace_id` as idempotency key |
| 3 — Update Status | `transaction.atomic()` | `SELECT FOR UPDATE` on Refund, set `status=PENDING` + `stripe_refund_id`, commit |
Error handling:
- `APIConnectionError` (ambiguous) → Refund stays `CREATED`, enqueues `reconcile_refund_with_stripe` Celery task.
- Other `StripeError` (definitive) → Refund marked `FAILED` (persisted, not deleted), raises `RefundGatewayError`.
payments/tasks.py — Reconciliation Task
New `reconcile_refund_with_stripe` task (max 5 retries, 120s delay):
- Re-issues the Stripe call using the same idempotency key (`trace_id`), so Stripe deduplicates.
- Connection error → retries.
- Definitive Stripe error → marks refund `FAILED`.
- Success → maps Stripe's status to internal status and persists.
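The safety of this task rests on one property: retrying with the original `trace_id` is harmless because the gateway deduplicates by idempotency key. A minimal simulation of that property, with `FakeStripe` and `reconcile` as illustrative names rather than the repository's actual Celery code:

```python
class FakeStripe:
    """Simulates a gateway that deduplicates by idempotency key and fails transiently."""
    def __init__(self, fail_times: int):
        self.fail_times = fail_times
        self.seen = {}   # idempotency_key -> refund id
        self.calls = 0

    def refund(self, amount: int, idempotency_key: str) -> str:
        self.calls += 1
        if self.calls <= self.fail_times:
            raise ConnectionError("timeout")
        # Deduplicate: the same key always maps to the same refund id.
        return self.seen.setdefault(idempotency_key, f"re_{len(self.seen) + 1}")

def reconcile(gateway, refund: dict, max_retries: int = 5) -> dict:
    """Re-issue the call with the original trace_id; retry only on connection errors."""
    for _ in range(max_retries):
        try:
            refund["stripe_refund_id"] = gateway.refund(refund["amount"], refund["trace_id"])
            refund["status"] = "pending"
            return refund
        except ConnectionError:
            continue  # transient: retry (Celery would re-enqueue with a delay)
    refund["status"] = "failed"  # retries exhausted
    return refund
```

Even if an earlier attempt actually went through on the gateway's side, the deduplicated response means reconciliation converges on the same refund rather than creating a second one.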
payments/views.py — Webhook Gap Closed
Added "charge.refunded" to handled_event_types alongside "charge.refund.updated", ensuring refunds that succeed synchronously on Stripe's side are captured by our webhook handler.
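Closing the gap amounts to one more entry in the event dispatch table. A sketch of the pattern, assuming a dict-based router; `dispatch` and the handler body are hypothetical, only the event type strings come from Stripe's webhook vocabulary:

```python
def process_refund_webhook(event: dict) -> str:
    # Stand-in for the real handler; returns the type it acted on for illustration.
    return event["type"]

# Both the initial success event and later transitions route to the same handler.
HANDLED_EVENT_TYPES = {
    "charge.refunded": process_refund_webhook,        # newly added: synchronous success
    "charge.refund.updated": process_refund_webhook,  # already handled: later updates
}

def dispatch(event: dict):
    handler = HANDLED_EVENT_TYPES.get(event["type"])
    if handler is None:
        return None  # unhandled events are acknowledged and ignored
    return handler(event)
```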
payments/tests.py — Updated & New Tests
Updated tests:
| Test | Change |
|---|---|
| `test_stripe_error_marks_refund_as_failed` | Asserts refund persists with `FAILED` status (previously asserted deletion) |
| `test_failed_stripe_call_leaves_full_balance_available` | Confirms `FAILED` refund exists and full balance remains available for retry |
New test classes:
| Class | Coverage |
|---|---|
| `AmbiguousFailureTests` | `APIConnectionError` enqueues reconciliation task; ambiguous refund counts toward amount cap |
| `ConcurrentRefundTests` (`TransactionTestCase`) | Two threads race to refund 6000 against a 10000 cap — exactly one succeeds, one raises `RefundAmountExceedsCaptured` |
`ai/context/IMPLEMENTATION_STATE.md` — Context Patch
Updated to reflect resolved status of PAY-402, two-phase architecture, and completed webhook coverage.
Because the developer has the standards in its context, it automatically:
- Wraps the logic in `@transaction.atomic`.
- Initialises an OpenTelemetry span.
- Places the logic in `services.py`, keeping the architecture standard-aligned.
The "Save Game" Effect: Managing State over Time
Engineering rarely happens in a single, uninterrupted flow. A complex feature might span three days and fifty context switches. By using the implementation state as a mandatory output, we create a contextual heartbeat.
How the sync works in practice:
- The Persistent Patch: The architect formalises the findings in `ai/context/IMPLEMENTATION_STATE.md`.
- The Handover: When you return to the task days later, you simply feed the `ai/` directory back to the AI.
- Continuity: The architect reads the state file, sees that the "Stripe lock" was flagged as a resolved risk, and ensures the developer role maintains that specific two-phase logic.
The pipeline picks up exactly where it left off. The AI's memory is no longer tied to a chat window; it is tied to the repository.
The Result: A Standard-Aligned PR
The final output isn’t just working code — it’s code that looks like it was written by a senior software engineer. It respects the business invariants and technical standards by default, freeing the developer from having to manually police the same architectural rules for every sub-task.
The Jira-to-Code workflow turns the AI from an unpredictable chatbot into a reliable compiler for requirements. It ensures that the Architecture and the Development are separated, leading to fewer bugs and significantly less architectural debt.
Join the Conversation
I’m sharing this research as a series on LinkedIn to gather feedback:
- How much time does your team spend fixing AI-generated code that ignored your standards?
- Could a blueprint step help catch logic errors earlier in your workflow?
- What’s your team’s equivalent of “Save Game” for long-running work — and does it survive tool changes?
Next Step: Patching and Persistence
We have the engine running and the factory producing code. But how do we stop the system from context decay as it grows? In our next post, we’ll look at patching and persistence and how a senior engineer uses context patches to ensure the AI's memory evolves alongside the codebase, creating a self-updating source of truth.