AML Shield
An AI-powered anti-money laundering compliance platform built for BaFin-regulated German financial institutions. A conditional ReAct agent (Reason + Act) combines Claude with an XGBoost classifier to deliver explainable transaction risk decisions grounded in live regulatory citations — from German GwG to FATF Recommendations to EU 6AMLD — and produces BaFin-format SAR reports with full, immutable audit trails.
Legal Grounding
Every decision AML Shield produces is anchored to a specific statutory obligation. The platform embeds regulatory citations directly into the agent's reasoning chain rather than treating compliance as a post-hoc annotation. The primary framework is German law (GwG), supplemented by FATF standards and EU directives where they impose a higher obligation.
| Instrument | Provision | How it is applied in AML Shield |
|---|---|---|
| GwG §10 Abs. 3 | Customer due diligence for transactions ≥ €10,000 | Triggers is_above_ctr feature in the ML model; regulatory checker raises a MEDIUM-severity flag |
| GwG §43 Abs. 1 | Obligation to report suspicious transactions to the BaFin FIU | The primary trigger for SAR_REQUIRED decisions; report is submitted via the goAML portal |
| GwG §47 Abs. 5 | Tipping-off prohibition — informing a customer of a SAR is a criminal offence | Explicit warning printed on every SAR output; UI prevents any customer-facing message referencing the case |
| GwG §17 | Criminal liability for under-reporting (Strafbarkeit) | Justifies the conservative default bias in the system prompt: "when uncertain between tiers, escalate higher" |
| FATF Rec. 16 | Wire transfer transparency — originator and beneficiary info must accompany cross-border payments | Missing/incomplete IBAN or counterparty data raises the risk score and triggers a regulatory flag |
| FATF Rec. 19 | Enhanced due diligence for transactions involving blacklisted jurisdictions | is_high_risk_country feature; Rule E: sanctions fast-track overrides all other processing |
| FATF Rec. 20 | Suspicious transactions must be reported regardless of amount | Overrides the €10,000 threshold — suspicious patterns below CTR are still escalated |
| EU 6AMLD Art. 18 | Expanded predicate offences and enhanced corporate liability | SAR narrative includes 6AMLD transposition language when predicate offence indicators are present |
| EU Reg. 2015/847 | Wire Transfer Regulation — information accompanying transfers of funds | Cross-border payment scrutiny layer; flags transfers lacking compliant originator records |
| EU MiCA 2023/1114 | Crypto-asset service provider obligations | Applied by the regulatory checker when transaction_type = crypto_exchange |
Money Laundering Patterns
The rule engine and ML features jointly encode four primary typologies. Each maps to one or more statutory obligations.
Structuring (Smurfing)
Transactions in the €8,500–€9,999 band are flagged by the is_near_ctr_threshold feature — deliberate positioning just below the €10,000 Cash Transaction Reporting threshold to avoid GwG §10 Abs. 3 scrutiny. This is the second-highest SHAP contributor in the reference case (0.614).
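The two threshold features named above can be sketched as follows. This is a minimal illustration, not the contents of models/features.py: the constant names are hypothetical, and treating any amount strictly below €10,000 (for example €9,999.50) as still inside the band is an assumption.

```python
CTR_THRESHOLD_EUR = 10_000   # threshold cited above for GwG §10 Abs. 3
NEAR_CTR_LOWER_EUR = 8_500   # lower edge of the structuring band (from the text)


def is_above_ctr(amount_eur: float) -> int:
    """Return 1 if the amount meets or exceeds the EUR 10,000 threshold, else 0."""
    return int(amount_eur >= CTR_THRESHOLD_EUR)


def is_near_ctr_threshold(amount_eur: float) -> int:
    """Return 1 if the amount sits in the band just below the threshold, else 0."""
    return int(NEAR_CTR_LOWER_EUR <= amount_eur < CTR_THRESHOLD_EUR)
```

With these definitions the €9,750 reference transaction sets is_near_ctr_threshold to 1 and is_above_ctr to 0.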
High-Risk Jurisdiction Exposure
Country codes are extracted from BIC/IBAN and matched against two lists derived from EU Delegated Regulation 2016/1675:
- IR · KP · MM · SY · YE · AF: Iran, North Korea, Myanmar, Syria, Yemen, Afghanistan. Triggers the Rule E sanctions fast-track. Highest SHAP weight: 0.821.
- PK · TR · ML · VN · MZ · TZ · JO: Pakistan, Turkey, Mali, Vietnam, Mozambique, Tanzania, Jordan. Elevated risk weight; does not trigger an automatic SAR.
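A minimal sketch of the country lookup, assuming the standard layouts: an IBAN starts with the two-letter ISO country code, and a BIC carries it in characters 5 and 6. The function names and the tier labels ("fast_track", "elevated", "standard") are hypothetical; only the country codes come from the lists above.

```python
FAST_TRACK_COUNTRIES = {"IR", "KP", "MM", "SY", "YE", "AF"}
ELEVATED_RISK_COUNTRIES = {"PK", "TR", "ML", "VN", "MZ", "TZ", "JO"}


def country_from_iban(iban: str) -> str:
    """An IBAN begins with the two-letter ISO 3166 country code."""
    return iban.replace(" ", "")[:2].upper()


def country_from_bic(bic: str) -> str:
    """A BIC is a 4-letter institution code followed by the 2-letter country code."""
    return bic.strip()[4:6].upper()


def jurisdiction_tier(country_code: str) -> str:
    """Map a country code to one of three illustrative tiers."""
    if country_code in FAST_TRACK_COUNTRIES:
        return "fast_track"
    if country_code in ELEVATED_RISK_COUNTRIES:
        return "elevated"
    return "standard"
```

A real implementation would also need to handle malformed identifiers and disagreement between the BIC and IBAN country codes, which this sketch ignores.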
Temporal Anomalies
The is_night feature flags transactions between 22:00 and 06:00. Legitimate retail banking activity is strongly concentrated in business hours; early-morning timestamps correlate with automated layering scripts. SHAP weight in reference case: 0.392.
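Because the window wraps around midnight, the check is a disjunction rather than a simple range test. The sketch below assumes 22:00 is inclusive and 06:00 exclusive, and that the timestamp is already in the bank's local time zone; neither detail is stated in the text.

```python
from datetime import datetime, time

NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def is_night(timestamp: datetime) -> int:
    """Return 1 for timestamps in the 22:00 to 06:00 window, else 0."""
    t = timestamp.time()
    # The window spans midnight, so it is "late evening OR early morning".
    return int(t >= NIGHT_START or t < NIGHT_END)
```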
Network Graph Typologies
Detected via NetworkX traversal. These patterns are invisible to single-transaction screening:
- Fan-out: one account rapidly distributes funds to many receivers, the classic placement layer in a three-stage laundering scheme.
- Layering chain: funds traverse ≥3 hops in under 24 hours, deliberately obscuring the beneficial ownership trail.
- Round trip: funds return to the originating account after passing through one or more intermediaries, a classic integration indicator.
- Fan-in: many accounts funnel into one, structuring across multiple originators to avoid individual reporting thresholds.
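Three of these patterns can be illustrated on a plain adjacency structure. The sketch below deliberately uses standard-library dictionaries instead of NetworkX so that it runs without dependencies; it is not the project's traversal code. The thresholds (five counterparties, four hops) are hypothetical, and the 24-hour time window is ignored.

```python
from collections import defaultdict


def build_graph(transfers):
    """Build outgoing and incoming adjacency maps from (sender, receiver) pairs."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for sender, receiver in transfers:
        out_edges[sender].add(receiver)
        in_edges[receiver].add(sender)
    return out_edges, in_edges


def fan_out_accounts(out_edges, min_receivers=5):
    """Accounts that send to at least min_receivers distinct receivers."""
    return {acct for acct, rcv in out_edges.items() if len(rcv) >= min_receivers}


def fan_in_accounts(in_edges, min_senders=5):
    """Accounts that receive from at least min_senders distinct senders."""
    return {acct for acct, snd in in_edges.items() if len(snd) >= min_senders}


def returns_to_origin(out_edges, origin, max_hops=4):
    """True if funds can return to origin via at least one intermediary."""
    frontier = set(out_edges.get(origin, ())) - {origin}  # reached after 1 hop
    for _ in range(max_hops - 1):
        frontier = {nxt for acct in frontier for nxt in out_edges.get(acct, ())}
        if origin in frontier:
            return True
        if not frontier:
            return False
    return False
```

For example, the transfers A→B, B→C, C→A form a round trip that returns_to_origin detects for origin A, while a hub paying five distinct receivers appears in fan_out_accounts.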
XGBoost Classifier
The risk scoring model is an XGBoost binary classifier trained to distinguish legitimate transactions from suspicious ones. It is trained on the IBM AMLworld dataset (NeurIPS 2023), a six-file CSV collection covering higher-illicit (HI) and lower-illicit (LI) laundering ratios at small, medium, and large volume tiers. When the AMLworld files are unavailable, the pipeline falls back to generating 2,000 synthetic transactions.
Training Data
IBM AMLworld is preferred when available in data/. The pipeline loads up to 3 CSV files (≤5,000 rows each), maps IBM payment format labels to the internal transaction type schema, and validates that the positive rate exceeds 1% before training.
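The loading caps and the class-balance check described above might look like the following. This is a sketch under assumptions: the function names are hypothetical, files are taken in sorted filename order, and labels are assumed to be 0/1 integers already extracted from the rows.

```python
import csv
from pathlib import Path

MAX_FILES = 3                # "up to 3 CSV files"
MAX_ROWS_PER_FILE = 5_000    # "≤5,000 rows each"
MIN_POSITIVE_RATE = 0.01     # positive rate must exceed 1%


def load_capped(data_dir):
    """Read at most MAX_FILES CSV files and MAX_ROWS_PER_FILE rows from each."""
    rows = []
    for path in sorted(Path(data_dir).glob("*.csv"))[:MAX_FILES]:
        with path.open(newline="") as handle:
            for index, row in enumerate(csv.DictReader(handle)):
                if index >= MAX_ROWS_PER_FILE:
                    break
                rows.append(row)
    return rows


def positive_rate_ok(labels):
    """True if the share of positive (suspicious) labels is strictly above 1%."""
    labels = list(labels)
    if not labels:
        return False
    return sum(labels) / len(labels) > MIN_POSITIVE_RATE
```

If positive_rate_ok returns False, the caller would presumably skip training on that sample; what the pipeline actually does in that case is not stated in the text.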
14 Engineered Features
All features are derived programmatically in models/features.py from raw transaction fields. No manual labelling is required.
Model Hyperparameters
SHAP Explainability
Every prediction is accompanied by SHAP (SHapley Additive exPlanations) values, a game-theoretic approach that assigns each feature a contribution to the final score. The contributions are additive: together with the model's baseline output, they sum to the prediction. This is intended to support EU AI Act interpretability requirements and gives compliance officers an auditable explanation for every automated decision.
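To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. The toy weights are invented for illustration and are unrelated to the trained model; in practice the shap library's TreeExplainer computes these values efficiently for tree ensembles instead of enumerating orderings.

```python
from itertools import permutations


def shapley_values(features, value_fn):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for name in order:
            before = value_fn(frozenset(present))
            present.add(name)
            totals[name] += value_fn(frozenset(present)) - before
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(present):
    """Hypothetical risk score given the set of active features."""
    score = 0.10  # baseline with no features active
    if "is_high_risk_country" in present:
        score += 0.40
    if "is_near_ctr_threshold" in present:
        score += 0.25
    if {"is_high_risk_country", "is_night"} <= present:
        score += 0.10  # interaction: night-time only matters for risky corridors
    return score


FEATURES = ["is_high_risk_country", "is_near_ctr_threshold", "is_night"]
phi = shapley_values(FEATURES, toy_score)
```

The interaction term is split equally between the two features involved, and the three values sum to the difference between the full score and the baseline, which is the additivity property that makes the attribution auditable.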
Reference Case: Iran Wire Transfer
A €9,750 international wire from Germany to Iran, timestamped 02:34. The model scores this SAR_REQUIRED. The SHAP breakdown shows which features drove the decision: the positive contributions quoted in the pattern descriptions are is_high_risk_country (0.821), is_near_ctr_threshold (0.614), and is_night (0.392).
In the SHAP chart, red bars increase risk and the single green bar mitigates it. The account age signal partially offsets the other factors: an older, established account is modestly less suspicious than a recently opened one, all else equal.
Suspicious Activity Reporting
SAR reports follow the BaFin GwG format required for FIU submission via the goAML portal. Each report receives an auto-generated ID in the format SAR-YYYYMMDD-TXID and enters DRAFT status pending compliance officer sign-off.
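The ID format can be produced with a date format specifier. The helper names and the shape of the report record are hypothetical; only the SAR-YYYYMMDD-TXID pattern and the initial DRAFT status come from the text.

```python
from datetime import date


def make_sar_id(transaction_id: str, created_on: date) -> str:
    """Build an ID of the form SAR-YYYYMMDD-TXID."""
    return f"SAR-{created_on:%Y%m%d}-{transaction_id}"


def new_sar_report(transaction_id: str, created_on: date) -> dict:
    """Create a report record that starts in DRAFT pending officer sign-off."""
    return {"sar_id": make_sar_id(transaction_id, created_on), "status": "DRAFT"}
```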
1. Claude compiles transaction facts, SHAP attributions, network findings, and triggered regulatory rules into the standardised BaFin template, with inline statutory citations at every assertion.
2. The report surfaces in the case queue as DRAFT. The designated officer reviews the findings, may annotate them, and approves or rejects the SAR before submission.
3. The approved SAR is forwarded to BaFin's Financial Intelligence Unit. The submission timestamp and portal reference number are written to the audit trail.
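The review workflow amounts to a small state machine. In the sketch below only DRAFT is a status named in the text; APPROVED, REJECTED, and SUBMITTED, along with the function and field names, are assumptions made for illustration.

```python
# Allowed status transitions; only DRAFT is named in the source text.
ALLOWED_TRANSITIONS = {
    "DRAFT": {"APPROVED", "REJECTED"},
    "APPROVED": {"SUBMITTED"},
    "REJECTED": set(),
    "SUBMITTED": set(),
}


def advance(report: dict, new_status: str, audit_trail: list, **details) -> dict:
    """Move a SAR to new_status if permitted and record the step in the audit trail."""
    current = report["status"]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move SAR from {current} to {new_status}")
    audit_trail.append(
        {"sar_id": report["sar_id"], "from": current, "to": new_status, **details}
    )
    return {**report, "status": new_status}
```

Making submission reachable only from APPROVED encodes the requirement that a compliance officer signs off before anything is sent.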
GwG §47 Abs. 5 is enforced at the UI level — no customer-facing communication may reference the SAR case. The prohibition is printed on every report as a statutory notice.
The ReAct Agent
The platform's decision engine is built on the ReAct (Reasoning + Acting) pattern — a framework where a language model alternates between thinking about what to do and calling tools to do it. Each tool result feeds back into the model's context, allowing it to update its risk assessment before deciding the next step. AML Shield's loop runs up to 10 iterations and uses Claude Haiku via the Anthropic tool-use API.
Thought (pick next tool) → Action (precise inputs) → Observation (update assessment)
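The loop itself can be sketched independently of any model API. Here decide_next_step is a hypothetical callable standing in for the Claude tool-use call, tools is a plain dictionary of Python functions, and the "MANUAL_REVIEW" fallback when the iteration cap is reached is an assumption; only the 10-iteration cap and the reasoning_chain name come from the text.

```python
MAX_ITERATIONS = 10  # iteration cap stated in the text


def run_react_loop(transaction, decide_next_step, tools):
    """Alternate between model decisions and tool calls until a verdict is reached.

    decide_next_step(transaction, reasoning_chain) must return a dict with a
    'thought' and either 'tool' plus 'tool_input', or 'final_decision'.
    """
    reasoning_chain = []
    for _ in range(MAX_ITERATIONS):
        step = decide_next_step(transaction, reasoning_chain)
        if "final_decision" in step:
            reasoning_chain.append(
                {"thought": step["thought"], "decision": step["final_decision"]}
            )
            return step["final_decision"], reasoning_chain
        # Act: call the chosen tool, then feed the observation back as context.
        observation = tools[step["tool"]](**step["tool_input"])
        reasoning_chain.append(
            {
                "thought": step["thought"],
                "tool": step["tool"],
                "tool_input": step["tool_input"],
                "observation": observation,
            }
        )
    # Cap reached without a verdict: hand over to a human (assumed behaviour).
    return "MANUAL_REVIEW", reasoning_chain
```

Passing the accumulated chain back into every decision is what lets each observation update the assessment before the next tool is chosen.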
The entire chain — every thought, every tool call, every result — is stored in reasoning_chain and written to an immutable audit trail. The whole reasoning process is available for regulatory examination.
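The text does not say how immutability is achieved. One common technique, shown here purely as an assumption, is a hash chain: each entry stores the hash of its predecessor, so any later modification is detectable. This makes the log tamper-evident rather than physically immutable, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with the canonical payload."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(trail: list, payload: dict) -> None:
    """Append a reasoning step, chaining it to the previous entry's hash."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS_HASH
    trail.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, payload)}
    )


def verify_trail(trail: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in trail:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```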
The Five Tools
Claude has exactly five registered tools, called in a default sequence that the conditional branching rules can override.
risk_score 0–100, confidence interval, and SHAP feature attributions. Always called first.
Conditional Branching Rules
The system prompt hardcodes five rules (A to E) that override the default tool sequence, plus a conservative default for uncertain cases. Together they encode the proportionality judgments a senior compliance officer would apply intuitively.
Rule A: IF score < 30 AND domestic THEN skip network analysis
Domestic low-risk transactions don't warrant graph traversal. Cuts latency for the majority of legitimate payments.
Rule B: IF score > 80 THEN network depth = 3
High-risk transactions require deeper traversal to surface layering across three degrees of separation.
Rule C: IF flagged_connections not empty THEN re-run network on flagged account
One hop is insufficient: flagged accounts must be investigated to their source.
Rule D: IF 40 ≤ score ≤ 60 THEN document FOR / AGAINST before deciding
Ambiguous cases require documented balanced analysis: a formal pro/con for the audit trail before any escalation decision.
Rule E: IF sanctions match detected THEN SAR immediately, skip remaining tools
Sanctions matches are per-se SAR events under EU 6AMLD Art. 18; no further analysis is needed or permitted.
Default: IF uncertain between tiers THEN escalate higher
Under GwG §17, under-reporting carries criminal liability. The system prompt instructs Claude to default upward when in doubt.
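In AML Shield these rules live in the system prompt and are applied by the model, not by code. Writing them out as a plain function nonetheless makes the precedence explicit. Everything below is a hypothetical sketch: the step names, the default depth of 1, and the simplification that flagged connections are known up front (in the real loop they only emerge after the first network pass).

```python
def plan_steps(score, is_domestic, sanctions_match, flagged_connections):
    """Return an illustrative plan derived from the branching rules above."""
    # Sanctions fast-track overrides everything else.
    if sanctions_match:
        return {"decision": "SAR_REQUIRED", "steps": []}

    steps = []
    # Low-risk domestic payments skip graph traversal entirely.
    if not (score < 30 and is_domestic):
        depth = 3 if score > 80 else 1  # deeper traversal for high scores
        steps.append(("network_analysis", {"depth": depth}))
        # Re-run the analysis on every flagged counterparty.
        for account in flagged_connections:
            steps.append(("network_analysis", {"account": account, "depth": depth}))
    # Ambiguous scores require a documented for/against analysis.
    if 40 <= score <= 60:
        steps.append(("document_for_against", {}))
    return {"decision": None, "steps": steps}
```

Checking the sanctions match first mirrors the statement that the fast-track overrides all other processing; the remaining rules only shape which tools run.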
System Prompt Architecture
The system prompt is divided into eight sections and treated as a legal instrument — the source file header reads: DO NOT modify regulatory citations in this file.
AML Shield is in active development. Architecture and regulatory mappings are subject to change as the design is validated against production compliance requirements.