Now that we have our memory centre, our operational mindsets, and our engineering standards, how do we actually execute a complex task without the AI losing its way?
In a typical AI workflow, a developer might copy-paste a Jira ticket and ask, "Code this." The result is often a hallucinated guess that misses business logic or ignores repository patterns and standards. To solve this, we move to a Context-Driven workflow in which the AI processes the requirement through a structured pipeline rather than a single prompt.
The Input: A Jira ticket
To test this process, use a standard, slightly under-specified Jira ticket. If you're following along with this research, you can use this sample:
Visualising the Workflow
This is the transition from a "Chat" to a "Workflow". Instead of a single prompt, we treat the requirement as an input to a structured process.
```mermaid
graph LR
%% Input
Ticket((Jira Ticket)) -- "1. Request" --> Arch
%% Phase 1
subgraph Phase1 ["Phase 1: Planning"]
Arch["<b>Architect Role</b><br/>(Audit & Plan)"]
end
%% Handover
Arch -- "2. Proposal & Diagrams" --> Dev
%% Phase 2
subgraph Phase2 ["Phase 2: Execution"]
Dev["<b>Developer Role</b><br/>(Code & Patch)"]
end
%% Output
Dev -- "3. Writes" --> Code[("📦 src/<br/>(Code)")]
%% Knowledge Base
subgraph Repo ["The Source of Truth"]
Memory["📂 ai/context/"]
Standards["📂 ai/standards/"]
end
%% Grounding Lines
Memory -.->|Grounding| Arch
Standards -.->|Enforce| Dev
%% Context Patch (feedback loop)
Dev -- "4. Context Patch" --> Memory
%% Styling
style Ticket fill:#f5f5f5,stroke:#9e9e9e
style Repo fill:#f0f7ff,stroke:#0052cc,stroke-dasharray: 5 5
style Phase1 fill:#fff,stroke:#01579b
style Phase2 fill:#fff,stroke:#2e7d32
style Code fill:#fff7e6,stroke:#ffa940
```
The Architect’s Blueprint
Instead of jumping to code, we first invoke the architect role. Its job is to audit the ticket against our ai/context/ files and identify exactly how this should be built.
The Workflow Sequence
```mermaid
sequenceDiagram
participant User as Jira Ticket
participant Arch as Architect Role
participant Context as ai/context
participant Dev as Developer Role
participant PR as Production PR
User->>Arch: Feed Requirement
Context-->>Arch: Provide Invariants & Standards
Note over Arch: Validates against Business Logic
Arch->>Dev: Technical Blueprint (Markdown)
Context-->>Dev: Provide Style & Safety Rules
Note over Dev: Writes Standard-Aligned Code
Dev->>PR: Generates Implementation
```
The Architectural Review in Action
With the process established, the architect performs the actual audit. This is where the Stateless AI becomes a Context-Aware Architect.
User
Architect, review PAY-402. Output a plan for the Developer.
AI
Architectural Review: PAY-402 — Partial Refund Support
1. Problem Statement
PAY-402 requests a service method for partial refunds with three invariants: amount cap, observability, and idempotency.
Cross-referencing the codebase against DOMAIN_MODEL.md and ENGINEERING_STANDARDS.md reveals that the core implementation already exists in PaymentService.create_refund().
However, the review surfaced one critical design flaw and two gaps.
2. Context Validation — Invariant Checklist
| DOMAIN_MODEL Invariant | Status | Location |
|---|---|---|
| Refund cannot exceed `amount_captured` | Pass | `services.py#L46-L55` — aggregate of non-failed refunds checked |
| Refund must have unique `trace_id` before Stripe call | Pass | `services.py#L57` — `uuid4()` generated pre-call |
| SUM of non-failed refunds ≤ `amount_captured` | Pass | `services.py#L46-L49` — `.exclude(status=FAILED)` |
| PaymentIntent must be Succeeded to allow refund | Pass | `services.py#L40-L43` |
| Succeeded is terminal and cannot revert | Pass | No code path mutates PI status on refund |

| ENGINEERING_STANDARDS Rule | Status | Location |
|---|---|---|
| Business logic in `services.py` | Pass | All logic in `PaymentService` |
| `@transaction.atomic` for multi-model updates | Flagged | See Finding #1 below |
| OpenTelemetry span per service method | Pass | `services.py#L25` |
| `trace_id` in logs | Pass | All `logger.info`/`error` calls include it |
| Never log PII | Pass | No PAN/CVV/secrets logged |
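The amount-cap invariant at the heart of this checklist can be expressed as a small pure function. This is an illustrative sketch, not the repository's actual code: the `Refund` dataclass and `validate_refund_amount` are hypothetical stand-ins for the Django model and the service-layer check.

```python
from dataclasses import dataclass

FAILED = "failed"

@dataclass
class Refund:
    amount: int   # minor units (e.g. cents)
    status: str

def validate_refund_amount(amount_captured: int, refunds: list[Refund], requested: int) -> None:
    """Reject a refund if non-failed refunds plus the request would exceed the captured amount."""
    already_refunded = sum(r.amount for r in refunds if r.status != FAILED)
    if already_refunded + requested > amount_captured:
        raise ValueError("Refund would exceed amount_captured")
```

Note that failed refunds are excluded from the sum, mirroring the `.exclude(status=FAILED)` query the audit found in `services.py`.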
3. Findings
Finding 1 — CRITICAL: Stripe HTTP call inside transaction.atomic()
The Stripe Refund.create() call at services.py#L77 executes inside the transaction.atomic() block that holds a SELECT FOR UPDATE row lock on the PaymentIntent.
Risks:
- DB connection exhaustion: Row lock held for the full Stripe round-trip (200ms–30s). Under load, this starves the connection pool.
- Phantom refund on timeout: If the HTTP call times out, `StripeError` is raised, the DB transaction rolls back (Refund row deleted), but Stripe may have actually processed the refund. On the next user retry, a new `trace_id` is generated → the idempotency key offers no protection → double refund.
Required fix:
Two-phase approach — create the Refund row and release the lock first, then call Stripe outside the transaction, then update status in a second atomic block.
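The required fix can be sketched framework-agnostically. Here `db_transaction` and `gateway` are hypothetical stand-ins for `transaction.atomic()` and the Stripe client, and the in-memory dicts stand in for ORM rows; the point is that the external call sits between two short transactions, so no row lock is held during the HTTP round-trip.

```python
import uuid
from contextlib import contextmanager

@contextmanager
def db_transaction():
    # Stand-in for transaction.atomic(): a real implementation would begin/commit here.
    yield

def create_refund(pi: dict, amount: int, gateway):
    # Phase 1: lock, validate, persist. The lock is released at commit.
    with db_transaction():
        already = sum(r["amount"] for r in pi["refunds"] if r["status"] != "failed")
        if already + amount > pi["amount_captured"]:
            raise ValueError("amount cap exceeded")
        refund = {"amount": amount, "status": "created", "trace_id": str(uuid.uuid4())}
        pi["refunds"].append(refund)

    # Phase 2: external call with NO row lock held.
    try:
        stripe_id = gateway(amount, idempotency_key=refund["trace_id"])
    except Exception:
        # Definitive failure: persist FAILED instead of deleting the row,
        # so the trace_id survives for idempotent retries.
        with db_transaction():
            refund["status"] = "failed"
        raise

    # Phase 3: a second short transaction records the gateway result.
    with db_transaction():
        refund["status"] = "pending"
        refund["stripe_refund_id"] = stripe_id
    return refund
```

Because the Refund row (and its `trace_id`) outlives a gateway failure, a retry reuses the same idempotency key instead of minting a new one, which is what closes the double-refund window.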
Finding 2 — GAP: Webhook handler incomplete
Per IMPLEMENTATION_STATE.md, the webhook handler is ~50% complete. The view at views.py#L78 only listens for charge.refund.updated but not charge.refunded (the initial refund success event from Stripe). This means refunds that succeed synchronously may never transition from PENDING to SUCCEEDED in our system.
Finding 3 — GAP: No concurrent-request test
Tests validate sequential partial refunds but no test exercises two simultaneous refund requests racing for the same PaymentIntent, which is the scenario the select_for_update() lock is meant to protect.
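The shape of the missing race test can be sketched in plain Python. This is a simulation under stated assumptions: `FakePaymentService` is hypothetical, and a `threading.Lock` plays the role of the `select_for_update()` row lock, so validation and insert happen atomically per request.

```python
import threading

class FakePaymentService:
    """In-memory stand-in; the lock models SELECT FOR UPDATE on the PaymentIntent row."""
    def __init__(self, amount_captured: int):
        self.amount_captured = amount_captured
        self.refunds = []
        self._row_lock = threading.Lock()

    def create_refund(self, amount: int) -> None:
        with self._row_lock:  # validation and insert both happen under the lock
            already = sum(self.refunds)
            if already + amount > self.amount_captured:
                raise ValueError("amount cap exceeded")
            self.refunds.append(amount)

def race(service, amount, errors):
    try:
        service.create_refund(amount)
    except ValueError as exc:
        errors.append(exc)

service = FakePaymentService(amount_captured=10000)
errors = []
threads = [threading.Thread(target=race, args=(service, 6000, errors)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one 6000 refund wins; the other must hit the cap.
```

Without the lock around both the read and the write, both threads could observe `already == 0` and both succeed, which is precisely the race the real test should prove impossible.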
4. Visual Logic — Proposed Two-Phase Refund Flow
```mermaid
sequenceDiagram
participant V as View
participant S as PaymentService
participant DB as PostgreSQL
participant St as Stripe API
V->>S: create_refund(pi_id, amount)
rect rgb(230, 245, 255)
note right of S: Phase 1 — Lock, validate, persist
S->>DB: BEGIN + SELECT FOR UPDATE (PaymentIntent)
DB-->>S: PaymentIntent row (locked)
S->>DB: SUM(non-failed refunds)
DB-->>S: already_refunded
S->>S: Validate amount cap
S->>DB: INSERT Refund (status=CREATED)
S->>DB: COMMIT (lock released)
end
rect rgb(255, 245, 230)
note right of S: Phase 2 — External call (no lock held)
S->>St: Refund.create(idempotency_key=trace_id)
St-->>S: stripe_refund_id
end
rect rgb(230, 255, 230)
note right of S: Phase 3 — Update status
S->>DB: BEGIN + SELECT FOR UPDATE (Refund)
S->>DB: UPDATE status=PENDING, stripe_refund_id
S->>DB: COMMIT
end
S-->>V: Refund object
```
Note: In practice, the first pass won’t always catch everything — the win is that the workflow forces an audit against your invariants before any code ships.
5. Developer Plan
The Developer should execute these tasks in order:
| Step | Task | Files | Acceptance Criteria |
|---|---|---|---|
| 1 | Refactor `create_refund` to two-phase | `services.py` | Stripe call executes outside `transaction.atomic()`. On Stripe failure, Refund row transitions to `FAILED` instead of being deleted. Row lock duration ≤ query time only. |
| 2 | Handle Stripe timeout / ambiguous failure | `services.py` | If `StripeError` is a connection/timeout error, mark Refund as `PENDING` (not `FAILED`) and enqueue a Celery reconciliation task. |
| 3 | Add `charge.refunded` to webhook handler | `views.py` | `handled_event_types` includes `charge.refunded`. Webhook correctly routes to `process_refund_webhook`. |
| 4 | Add concurrent refund test | `tests.py` | Test uses threading or `select_for_update` simulation to prove the amount cap holds under concurrent requests. |
| 5 | Update existing tests for two-phase | `tests.py` | `test_stripe_error_rolls_back_refund` updated: Refund row now persists with `FAILED` status instead of being deleted. `test_stripe_rollback_leaves_full_balance_available` updated to match. |
| 6 | Context Patch | See below | Update `ai/context/IMPLEMENTATION_STATE.md` |
6. Context Patch
ai/context/IMPLEMENTATION_STATE.md — proposed update after Developer completes work:
```markdown
# Implementation State: Refund Feature

## Current Status
- Service Layer: PaymentService.create_refund() refactored to two-phase (lock → call → update)
- Data Layer: Refund model migrated and active
- Webhooks: `charge.refunded` + `charge.refund.updated` both handled
- Idempotency: trace_id used as Stripe idempotency_key; timeout paths enqueue reconciliation

## Resolved
- PAY-402: Partial refund support (amount cap, observability, idempotency)
- Stripe call moved outside transaction.atomic() to prevent lock contention

## Next Steps
1. Implement Celery reconciliation task for ambiguous Stripe timeouts
2. Add integration tests with Stripe sandbox (blocked on QA API keys)
3. Update observability spans for webhook failures
```
Operational Constraint
Per the Architect role: no implementation code is produced at this stage. The Developer should take this plan and implement against the DEVELOPER.md role constraints, adhering to STYLE_GUIDE.md and BOUNDARIES.md.
The Developer’s Implementation
Now, the developer role takes over. It isn't working from a vague ticket anymore; it’s working from a high-fidelity blueprint that already respects your business logic.
User
Developer, implement the plan for PAY-402.
AI
PAY-402 Implementation Summary
Files Modified
payments/services.py — Two-Phase Refund
Refactored create_refund to eliminate the critical flaw of holding a DB row lock during the Stripe HTTP call.
| Phase | Scope | What Happens |
|---|---|---|
| 1 — Lock & Validate | `transaction.atomic()` | `SELECT FOR UPDATE` on PaymentIntent, validate amount cap, INSERT Refund (`status=CREATED`), commit (lock released) |
| 2 — External Call | No lock held | `stripe.Refund.create()` with `trace_id` as idempotency key |
| 3 — Update Status | `transaction.atomic()` | `SELECT FOR UPDATE` on Refund, set `status=PENDING` + `stripe_refund_id`, commit |
Error handling:
- `APIConnectionError` (ambiguous) → Refund stays `CREATED`, enqueues `reconcile_refund_with_stripe` Celery task.
- Other `StripeError` (definitive) → Refund marked `FAILED` (persisted, not deleted), raises `RefundGatewayError`.
payments/tasks.py — Reconciliation Task
New `reconcile_refund_with_stripe` task (max 5 retries, 120s delay):
- Re-issues the Stripe call using the same idempotency key (`trace_id`), so Stripe deduplicates.
- Connection error → retries.
- Definitive Stripe error → marks refund `FAILED`.
- Success → maps Stripe's status to internal status and persists.
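The safety of this task rests on one property: retrying with the original `trace_id` is harmless because the gateway deduplicates by idempotency key. A minimal simulation of that property, with `FakeStripe` and `reconcile` as illustrative names rather than the repository's actual Celery code:

```python
class FakeStripe:
    """Simulates a gateway that deduplicates by idempotency key and fails transiently."""
    def __init__(self, fail_times: int):
        self.fail_times = fail_times
        self.seen = {}   # idempotency_key -> refund id
        self.calls = 0

    def refund(self, amount: int, idempotency_key: str) -> str:
        self.calls += 1
        if self.calls <= self.fail_times:
            raise ConnectionError("timeout")
        # Deduplicate: the same key always maps to the same refund id.
        return self.seen.setdefault(idempotency_key, f"re_{len(self.seen) + 1}")

def reconcile(gateway, refund: dict, max_retries: int = 5) -> dict:
    """Re-issue the call with the original trace_id; retry only on connection errors."""
    for _ in range(max_retries):
        try:
            refund["stripe_refund_id"] = gateway.refund(refund["amount"], refund["trace_id"])
            refund["status"] = "pending"
            return refund
        except ConnectionError:
            continue  # transient: retry (Celery would re-enqueue with a delay)
    refund["status"] = "failed"  # retries exhausted
    return refund
```

Even if an earlier attempt actually went through on the gateway's side, the deduplicated response means reconciliation converges on the same refund rather than creating a second one.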
payments/views.py — Webhook Gap Closed
Added "charge.refunded" to handled_event_types alongside "charge.refund.updated", ensuring refunds that succeed synchronously on Stripe's side are captured by our webhook handler.
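Closing the gap amounts to one more entry in the event dispatch table. A sketch of the pattern, assuming a dict-based router; `dispatch` and the handler body are hypothetical, only the event type strings come from Stripe's webhook vocabulary:

```python
def process_refund_webhook(event: dict) -> str:
    # Stand-in for the real handler; returns the type it acted on for illustration.
    return event["type"]

# Both the initial success event and later transitions route to the same handler.
HANDLED_EVENT_TYPES = {
    "charge.refunded": process_refund_webhook,        # newly added: synchronous success
    "charge.refund.updated": process_refund_webhook,  # already handled: later updates
}

def dispatch(event: dict):
    handler = HANDLED_EVENT_TYPES.get(event["type"])
    if handler is None:
        return None  # unhandled events are acknowledged and ignored
    return handler(event)
```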
payments/tests.py — Updated & New Tests
Updated tests:
| Test | Change |
|---|---|
| `test_stripe_error_marks_refund_as_failed` | Asserts refund persists with `FAILED` status (previously asserted deletion) |
| `test_failed_stripe_call_leaves_full_balance_available` | Confirms `FAILED` refund exists and full balance remains available for retry |
New test classes:
| Class | Coverage |
|---|---|
| `AmbiguousFailureTests` | `APIConnectionError` enqueues reconciliation task; ambiguous refund counts toward amount cap |
| `ConcurrentRefundTests` (`TransactionTestCase`) | Two threads race to refund 6000 against a 10000 cap — exactly one succeeds, one raises `RefundAmountExceedsCaptured` |
`ai/context/IMPLEMENTATION_STATE.md` — Context Patch
Updated to reflect resolved status of PAY-402, two-phase architecture, and completed webhook coverage.
Because the developer has the standards in its context, it automatically:
- Wraps the logic in `@transaction.atomic`.
- Initialises an OpenTelemetry span.
- Places the logic in `services.py`, keeping the architecture standard-aligned.
The "Save Game" Effect: Managing State over Time
Engineering rarely happens in a single, uninterrupted flow. A complex feature might span three days and fifty context switches. By using the implementation state as a mandatory output, we create a contextual heartbeat.
How the sync works in practice:
- The Persistent Patch: The architect formalises the findings in `ai/context/IMPLEMENTATION_STATE.md`.
- The Handover: When you return to the task days later, you simply feed the `ai/` directory back to the AI.
- Continuity: The architect reads the state file, sees that the "Stripe lock" was flagged as a resolved risk, and ensures the developer role maintains that specific two-phase logic.
The pipeline picks up exactly where it left off. The AI's memory is no longer tied to a chat window; it is tied to the repository.
The Result: A Standard-Aligned PR
The final output isn’t just working code — it’s code that looks like it was written by a senior software engineer. It respects the business invariants and technical standards by default, freeing the developer from having to manually police the same architectural rules for every sub-task.
The Jira-to-Code workflow turns the AI from an unpredictable chatbot into a reliable compiler for requirements. It ensures that the Architecture and the Development are separated, leading to fewer bugs and significantly less architectural debt.
Join the Conversation
I’m sharing this research as a series on LinkedIn to gather feedback:
- How much time does your team spend fixing AI-generated code that ignored your standards?
- Could a blueprint step help catch logic errors earlier in your workflow?
- What’s your team’s equivalent of “Save Game” for long-running work — and does it survive tool changes?
Next Step: Patching and Persistence
We have the engine running and the factory producing code. But how do we stop the system from context decay as it grows? In our next post, we’ll look at patching and persistence and how a senior engineer uses context patches to ensure the AI's memory evolves alongside the codebase, creating a self-updating source of truth.