Compliance Platform In Development

AML Shield

An AI-powered AML compliance platform for BaFin-regulated German financial institutions. A conditional ReAct agent combines Claude Haiku with an XGBoost classifier to deliver explainable transaction risk decisions grounded in live regulatory citations — from GwG to FATF to EU 6AMLD — producing BaFin-format SAR reports with immutable audit trails.


Python FastAPI Claude Haiku XGBoost SHAP NetworkX Docker BaFin / GwG FATF 6AMLD

Regulatory Framework

Legal Grounding

Every decision is anchored to a specific statutory obligation. The primary framework is German law (GwG), supplemented by FATF standards and EU directives where they impose a higher obligation. Citations are embedded in the agent's reasoning chain — compliance is not a post-hoc annotation.

Instrument	Provision	How it is applied in AML Shield
GwG §10 Abs. 3	Customer due diligence for transactions ≥ €10,000	Triggers `is_above_ctr` feature in the ML model; regulatory checker raises a MEDIUM-severity flag
GwG §43 Abs. 1	Obligation to report suspicious transactions to the BaFin FIU	The primary trigger for SAR_REQUIRED decisions; report is submitted via the goAML portal
GwG §47 Abs. 5	Tipping-off prohibition — informing a customer of a SAR is a criminal offence	Explicit warning printed on every SAR output; UI prevents any customer-facing message referencing the case
GwG §17	Criminal liability for under-reporting (Strafbarkeit)	Justifies the conservative default bias in the system prompt: "when uncertain between tiers, escalate higher"
FATF Rec. 16	Wire transfer transparency — originator and beneficiary info must accompany cross-border payments	Missing/incomplete IBAN or counterparty data raises the risk score and triggers a regulatory flag
FATF Rec. 19	Enhanced due diligence for transactions involving blacklisted jurisdictions	`is_high_risk_country` feature; Rule E: sanctions fast-track overrides all other processing
FATF Rec. 20	Suspicious transactions must be reported regardless of amount	Overrides the €10,000 threshold — suspicious patterns below CTR are still escalated
EU 6AMLD Art. 18	Expanded predicate offences and enhanced corporate liability	SAR narrative includes 6AMLD transposition language when predicate offence indicators are present
EU Reg. 2015/847	Wire Transfer Regulation — information accompanying transfers of funds	Cross-border payment scrutiny layer; flags transfers lacking compliant originator records
EU MiCA 2023/1114	Crypto-asset service provider obligations	Applied by the regulatory checker when `transaction_type = crypto_exchange`

Detection Typologies

Money Laundering Patterns

The rule engine and ML features jointly encode four primary typologies. Each maps to one or more statutory obligations.

Structuring (Smurfing)

Transactions in the €8,500–€9,999 band are flagged by the is_near_ctr_threshold feature — deliberate positioning just below the €10,000 Cash Transaction Reporting threshold to avoid GwG §10 Abs. 3 scrutiny. This is the second-highest SHAP contributor in the reference case (0.614).
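A minimal sketch of how the two threshold flags could be derived from the amount, assuming the band edges quoted above. The helper name `ctr_flags` and the constant names are hypothetical and not taken from the project source.

```python
CTR_THRESHOLD_EUR = 10_000.0    # reporting threshold cited in the text
NEAR_BAND_FLOOR_EUR = 8_500.0   # lower edge of the structuring band


def ctr_flags(amount_eur: float) -> dict:
    """Return the two threshold features as 0/1 integers (hypothetical helper)."""
    return {
        # Anything from 8,500 up to, but not including, 10,000 counts as "near".
        "is_near_ctr_threshold": int(NEAR_BAND_FLOOR_EUR <= amount_eur < CTR_THRESHOLD_EUR),
        "is_above_ctr": int(amount_eur >= CTR_THRESHOLD_EUR),
    }
```

Under these assumptions the two flags are mutually exclusive: an amount of exactly €10,000 sets `is_above_ctr` only.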

High-Risk Jurisdiction Exposure

Country codes are extracted from BIC/IBAN and matched against two lists derived from EU Delegated Regulation 2016/1675:

FATF Blacklist (is_high_risk_country)

IR · KP · MM · SY · YE · AF

Iran, North Korea, Myanmar, Syria, Yemen, Afghanistan. Triggers Rule E sanctions fast-track. Highest SHAP weight: 0.821.

FATF Greylist (is_greylist_country)

PK · TR · ML · VN · MZ · TZ · JO

Pakistan, Turkey, Mali, Vietnam, Mozambique, Tanzania, Jordan. Elevated risk weight; does not trigger automatic SAR.
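The lookup can be sketched as follows. The two sets are copied from the lists above; the function names are hypothetical. The sketch relies only on the standard layout of the identifiers: an IBAN starts with the two-letter country code, and a BIC carries it in positions 5 and 6.

```python
HIGH_RISK = {"IR", "KP", "MM", "SY", "YE", "AF"}
GREYLIST = {"PK", "TR", "ML", "VN", "MZ", "TZ", "JO"}


def country_from_iban(iban: str) -> str:
    """The first two characters of an IBAN are the ISO country code."""
    return iban.replace(" ", "")[:2].upper()


def country_from_bic(bic: str) -> str:
    """Characters 5-6 of a BIC are the ISO country code."""
    return bic.strip()[4:6].upper()


def jurisdiction_flags(country_code: str) -> dict:
    """Map a country code to the two geographic features (hypothetical helper)."""
    return {
        "is_high_risk_country": int(country_code in HIGH_RISK),
        "is_greylist_country": int(country_code in GREYLIST),
    }
```

For example, `jurisdiction_flags(country_from_bic("DEUTDEFF"))` sets neither flag, because the extracted code is `DE`.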

Temporal Anomalies

The is_night feature flags transactions between 22:00 and 06:00. Legitimate retail banking activity is strongly concentrated in business hours; early-morning timestamps correlate with automated layering scripts. SHAP weight in reference case: 0.392.
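Because the window wraps past midnight, the check needs an "or" rather than a simple range test. A minimal sketch, assuming 22:00 is inclusive and 06:00 is exclusive (the text does not say how the boundaries are treated):

```python
from datetime import datetime


def is_night(ts: datetime, start_hour: int = 22, end_hour: int = 6) -> int:
    """Return 1 if the timestamp falls in the overnight window, else 0."""
    # The window spans midnight, so a match is "late evening OR early morning".
    return int(ts.hour >= start_hour or ts.hour < end_hour)
```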

Network Graph Typologies

Detected via NetworkX traversal — these patterns are invisible to single-transaction screening.

Hub-and-Spoke

One account rapidly distributes funds to many receivers — the classic placement layer in a three-stage laundering scheme.

Rapid Layering

Funds traverse ≥3 hops in under 24 hours, deliberately obscuring the beneficial ownership trail.

Round-Trip Cycling

Funds return to the originating account after passing through one or more intermediaries — a classic integration indicator.

Fan-In Aggregation

Many accounts funnel into one — structuring across multiple originators to avoid individual reporting thresholds.
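Three of the four typologies above can be illustrated with plain-Python adjacency sets standing in for the NetworkX graph. This is a sketch, not the project's traversal code: the function names, the counterparty threshold, and the hop limit are all assumptions, and rapid layering is left out because it also needs transfer timestamps.

```python
from collections import defaultdict, deque


def build_graph(transfers):
    """Build adjacency sets from (sender, receiver) pairs."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for sender, receiver in transfers:
        out_edges[sender].add(receiver)
        in_edges[receiver].add(sender)
    return out_edges, in_edges


def high_degree_accounts(edges, min_counterparties):
    """Accounts linked to at least `min_counterparties` distinct counterparties."""
    return sorted(a for a, peers in edges.items() if len(peers) >= min_counterparties)


def has_round_trip(out_edges, origin, max_hops=4):
    """True if funds leaving `origin` return to it via at least one intermediary."""
    queue, seen = deque([(origin, 0)]), {origin}
    while queue:
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # the next transfer would exceed the hop limit
        for nxt in out_edges.get(account, ()):
            if nxt == origin and hops >= 1:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return False
```

Passing `out_edges` to `high_degree_accounts` surfaces hub-and-spoke candidates, and passing `in_edges` surfaces fan-in candidates. The breadth-first search reaches each account by its shortest path first, so the hop limit is applied correctly.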

Machine Learning

XGBoost Classifier

The risk scoring model is an XGBoost binary classifier trained on the IBM AMLworld dataset (NeurIPS 2023). When the AMLworld files are unavailable, the pipeline falls back to generating 2,000 synthetic transactions.

Training Data

Synthetic Fallback — 2,000 Transactions

1,400 legitimate (70%) · 600 suspicious (30%)

Legitimate: card payments, internal transfers, wire transfers. Low-risk countries, business hours, amounts €10–€5,000.

Suspicious: mixed structuring bands (€8,500–€9,999), high-risk countries, night timestamps, crypto exchanges.
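A rough sketch of a generator with this 70/30 split, using only the standard library. The function name, the low-risk country list, and most transaction type strings are placeholders. For simplicity, every suspicious row here carries all of the listed indicators, whereas the text describes a mix.

```python
import random


def make_synthetic(n_total: int = 2000, suspicious_share: float = 0.30, seed: int = 42) -> list:
    """Generate labelled toy transactions (hypothetical helper)."""
    rng = random.Random(seed)
    n_suspicious = round(n_total * suspicious_share)
    rows = []
    for i in range(n_total):
        if i < n_suspicious:
            rows.append({
                "amount_eur": round(rng.uniform(8500, 9999), 2),  # structuring band
                "country": rng.choice(["IR", "KP", "MM", "SY", "YE", "AF"]),
                "hour_of_day": rng.choice([22, 23, 0, 1, 2, 3, 4, 5]),
                "transaction_type": rng.choice(["wire_transfer", "crypto_exchange"]),
                "label": 1,
            })
        else:
            rows.append({
                "amount_eur": round(rng.uniform(10, 5000), 2),
                "country": rng.choice(["DE", "FR", "NL"]),  # placeholder low-risk list
                "hour_of_day": rng.randint(9, 17),          # business hours
                "transaction_type": rng.choice(["card_payment", "internal_transfer", "wire_transfer"]),
                "label": 0,
            })
    rng.shuffle(rows)  # avoid a label-ordered dataset
    return rows
```

Fixing the seed keeps the fallback dataset reproducible across training runs.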

IBM AMLworld is preferred when available in data/. The pipeline loads up to 3 CSV files (≤5,000 rows each), maps IBM payment format labels to the internal transaction type schema, and validates that the positive rate exceeds 1% before training.
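The positive-rate check could look roughly like this. The function name and the choice to raise an exception are assumptions; the text only states that the rate must exceed 1% before training.

```python
def validate_positive_rate(labels: list, min_rate: float = 0.01) -> float:
    """Return the share of positive labels, or raise if it is not above min_rate."""
    if not labels:
        raise ValueError("no labelled rows loaded")
    rate = sum(labels) / len(labels)
    if rate <= min_rate:
        raise ValueError(f"positive rate {rate:.4f} does not exceed {min_rate}")
    return rate
```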

14 Engineered Features

All features are derived programmatically in models/features.py from raw transaction fields. No manual labelling is required.

Amount: amount_log · amount_eur · is_near_ctr_threshold · is_above_ctr · is_round_number

Time: hour_of_day · is_night · is_weekend

Geographic: is_cross_border · is_high_risk_country · is_greylist_country

Transaction Type: transaction_type_wire · transaction_type_crypto · transaction_type_cash
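As an illustration of the remaining amount, time, and type features, here is a hedged sketch. The exact definitions in models/features.py are not shown in the text, so the choices below are assumptions: `log1p` for `amount_log`, "multiple of 1,000" for `is_round_number`, and the type strings other than `crypto_exchange`.

```python
import math
from datetime import datetime


def extra_features(amount_eur: float, ts: datetime, transaction_type: str) -> dict:
    """Derive amount, time, and one-hot type features (hypothetical definitions)."""
    return {
        "amount_eur": amount_eur,
        "amount_log": math.log1p(amount_eur),           # compresses the heavy right tail
        "is_round_number": int(amount_eur > 0 and amount_eur % 1000 == 0),
        "hour_of_day": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),           # Saturday = 5, Sunday = 6
        "transaction_type_wire": int(transaction_type == "wire_transfer"),
        "transaction_type_crypto": int(transaction_type == "crypto_exchange"),
        "transaction_type_cash": int(transaction_type == "cash"),
    }
```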

Model Hyperparameters

Parameter	Value	Rationale
n_estimators	200	Boosting rounds; capped by early stopping
max_depth		Sufficient for feature interactions without overfitting
learning_rate	0.10	Conservative shrinkage; balances bias/variance
scale_pos_weight	auto	= negatives / positives; corrects class imbalance
eval_metric	AUC	Optimises ranking, not accuracy; better for imbalanced data
early_stopping		Stops if AUC on held-out test set doesn't improve for 20 rounds
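The "auto" value of scale_pos_weight is the ratio of negative to positive labels. The sketch below computes it for the synthetic fallback split and assembles the stated settings into a plain dictionary. The keys follow XGBoost's parameter names, but nothing here trains a model. `max_depth` is omitted because its value is not given.

```python
def scale_pos_weight(labels: list) -> float:
    """negatives / positives, as described for the 'auto' setting."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples to weight")
    return negatives / positives


labels = [0] * 1400 + [1] * 600  # the synthetic fallback split
params = {
    "n_estimators": 200,
    "learning_rate": 0.10,
    "scale_pos_weight": scale_pos_weight(labels),  # 1400 / 600
    "eval_metric": "auc",
    # max_depth is not stated in the text, so it is deliberately left out here.
}
```

For the 1,400/600 split the weight is about 2.33, so each suspicious example counts roughly as much as 2.33 legitimate ones in the loss.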

SHAP Explainability

Every prediction is accompanied by SHAP values, a game-theoretic attribution method that assigns each feature a contribution to the final score. This is intended to support EU AI Act interpretability requirements and gives compliance officers auditable explanations.

Reference Case: Iran Wire Transfer

A €9,750 international wire from Germany to Iran, timestamped 02:34. The resulting decision is SAR_REQUIRED. The SHAP breakdown shows which features drove the decision:

Feature	SHAP contribution
is_high_risk_country	+0.821
is_near_ctr_threshold	+0.614
is_night	+0.392
is_cross_border	+0.280
transaction_type_wire	+0.180
account_age_days	−0.120

Positive values increase risk; the negative value mitigates it. The account age signal partially offsets the other factors: an older, established account is modestly less suspicious than a recently opened one, all else equal.
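SHAP attributions are additive: the model output for one prediction equals a base value plus the sum of the per-feature contributions. The sketch below ranks the reference-case values by magnitude and computes their net effect. The base value and the output scale are not given in the text, so only the sum of contributions is shown.

```python
contributions = {
    "is_high_risk_country": 0.821,
    "is_near_ctr_threshold": 0.614,
    "is_night": 0.392,
    "is_cross_border": 0.280,
    "transaction_type_wire": 0.180,
    "account_age_days": -0.120,
}

# Rank features by the size of their effect, regardless of direction.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Net push away from the (unstated) base value.
net_contribution = sum(contributions.values())
```

The net contribution comes to roughly +2.167, with the single mitigating feature removing 0.120 of the 2.287 contributed by the five risk-increasing features.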

SAR Workflow

Suspicious Activity Reporting

SAR reports follow the BaFin GwG format required for FIU submission via the goAML portal. Each report receives an auto-generated ID in the format SAR-YYYYMMDD-TXID and enters DRAFT status pending compliance officer sign-off.

Agent drafts SAR narrative

Claude compiles transaction facts, SHAP attributions, network findings, and triggered regulatory rules into the standardised BaFin template — with inline statutory citations at every assertion.

Compliance officer review

Report surfaces in the case queue as DRAFT. Designated officer reviews findings, may annotate, and approves or rejects the SAR before submission.

goAML submission

Approved SAR is forwarded to BaFin's Financial Intelligence Unit. Submission timestamp and portal reference number are written to the audit trail.

Tipping-off safeguard

GwG §47 Abs. 5 is enforced at the UI level — no customer-facing communication may reference the SAR case. The prohibition is printed on every report as a statutory notice.

Statutory notice on every SAR output: "Alerting the customer about this SAR filing constitutes a criminal offence under GwG §47 Abs. 5."
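The SAR-YYYYMMDD-TXID identifier and the initial DRAFT status described in this section can be sketched as below. The function name and record shape are hypothetical, and the sketch assumes the date component is the report creation date, which the text does not specify.

```python
from datetime import date


def new_sar_record(tx_id: str, created_on: date) -> dict:
    """Create a SAR stub with a formatted ID, awaiting officer sign-off."""
    return {
        "sar_id": f"SAR-{created_on:%Y%m%d}-{tx_id}",
        "status": "DRAFT",
    }
```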

Under the Hood — AI Engine

The ReAct Agent

The decision engine uses the ReAct pattern — the model alternates between thinking about what to do and calling tools to do it. Each tool result feeds back into context, updating the risk assessment before the next step. The loop runs up to 10 iterations with Claude Haiku via the Anthropic tool-use API.

Step 1: Think. Analyze the transaction and pick the next tool.

Step 2: Act. Call the tool with precise inputs.

Step 3: Observe. Read the result and update the assessment.

Repeat up to 10×.

The entire chain — every thought, every tool call, every result — is stored in reasoning_chain and written to an immutable audit trail. The whole reasoning process is available for regulatory examination.
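The control flow of such a loop can be sketched without any API calls. Here `policy` is a stand-in for the language model and `tools` is a plain dictionary of callables. Both are hypothetical, as are the stub policy and its placeholder score. Only the 10-iteration bound and the `reasoning_chain` name come from the text.

```python
MAX_ITERATIONS = 10  # loop bound stated in the text


def run_react(policy, tools, transaction):
    """Run Think -> Act -> Observe until the policy stops or the bound is hit."""
    reasoning_chain = []
    observation = None
    for step in range(1, MAX_ITERATIONS + 1):
        # Think: the policy picks the next tool, or None to finish.
        thought, tool_name, tool_input = policy(transaction, observation, reasoning_chain)
        if tool_name is None:
            reasoning_chain.append({"step": step, "thought": thought, "tool": None, "result": None})
            break
        # Act: call the chosen tool with the inputs the policy produced.
        observation = tools[tool_name](tool_input)
        # Observe: record the result so the next Think step can use it.
        reasoning_chain.append({"step": step, "thought": thought, "tool": tool_name, "result": observation})
    return reasoning_chain


def demo_policy(transaction, observation, chain):
    """Hypothetical stub: score once, then stop."""
    if not chain:
        return "Score the transaction first.", "transaction_risk_scorer", {"amount_eur": transaction["amount_eur"]}
    return "Score observed; no further tool needed in this stub.", None, None


demo_tools = {"transaction_risk_scorer": lambda args: {"risk_score": 91}}  # placeholder output
chain = run_react(demo_policy, demo_tools, {"amount_eur": 9750.0})
```

Because every step is appended to the chain in the order it happened, the returned list can be written to the audit trail as is.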

The Five Tools

Claude has exactly five registered tools, called in a default sequence that the conditional branching rules can override.

① transaction_risk_scorer

Runs the XGBoost model. Returns risk_score 0–100, confidence interval, and SHAP feature attributions. Always called first.

② entity_network_analyzer

NetworkX graph traversal around sender/receiver accounts. Detects structuring, layering, fan-out/in, and cycle patterns. Default depth 2; depth 3 when score > 80; recursive if flagged connections returned.

Rule A: skipped if score < 30 and domestic

Rule B: depth=3 if score > 80

Rule C: recursive if flagged_connections non-empty

③ regulatory_rule_checker

Checks GwG, FATF 40 Recommendations, EU 6AMLD, Wire Transfer Reg. 2015/847, and MiCA. Returns triggered rules with severity and exact statutory citations.

④ sar_report_generator

Generates a BaFin/GwG-format SAR for goAML submission. Only invoked at score ≥ 80 or confirmed sanctions match. Auto-populates narrative and appends GwG §47 Abs. 5 tipping-off warning.

Rule E: immediate if sanctions match detected

⑤ case_escalation_decider

Final arbiter. Accepts risk score, network risk tier, triggered rule count, and Claude's reasoning summary. Returns decision, case priority, SLA hours, and compliance queue. Always called last.

Conditional Branching Rules

The system prompt hardcodes five rules that override the default tool sequence — encoding the same proportionality judgments a senior compliance officer would apply intuitively.

Rule A — Low Risk Shortcut

IF score < 30 AND domestic THEN skip network analysis

Domestic low-risk transactions don't warrant graph traversal. Cuts latency for the majority of legitimate payments.

Rule B — Deep Network Analysis

IF score > 80 THEN network depth = 3

High-risk transactions require deeper traversal to surface layering across three degrees of separation.

Rule C — Recursive Investigation

IF flagged_connections not empty THEN re-run network on flagged account

One hop is insufficient — flagged accounts must be investigated to their source.

Rule D — Grey Zone Analysis

IF 40 ≤ score ≤ 60 THEN document FOR / AGAINST before deciding

Ambiguous cases require documented balanced analysis — a formal pro/con for the audit trail before any escalation decision.

Rule E — Sanctions Fast-Track

IF sanctions match detected THEN SAR immediately, skip remaining tools

Sanctions matches are per-se SAR events under EU 6AMLD Art. 18 — no further analysis needed or permitted.

Conservative Default

IF uncertain between tiers THEN escalate higher

Under GwG §17, under-reporting carries criminal liability. The system prompt instructs Claude to default upward when in doubt.
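In the platform these rules live in the system prompt rather than in code, but their combined effect can be sketched as a small planning function. The function name and return shape are hypothetical. The sketch assumes the scorer has already run and reads Rule E literally, so a sanctions match leaves only the SAR generator. Rule C is omitted because it depends on the network tool's output.

```python
def plan_remaining_tools(score: float, is_domestic: bool, sanctions_match: bool) -> dict:
    """Decide which tools follow the risk scorer under Rules A, B, D, and E."""
    if sanctions_match:  # Rule E: fast-track straight to the SAR
        return {"tools": ["sar_report_generator"], "network_depth": None, "grey_zone": False}

    tools, depth = [], None
    if not (score < 30 and is_domestic):   # Rule A: skip the graph for low-risk domestic
        depth = 3 if score > 80 else 2     # Rule B: deeper traversal for high scores
        tools.append("entity_network_analyzer")
    tools.append("regulatory_rule_checker")
    if score >= 80:                        # SAR generator threshold from the tool list
        tools.append("sar_report_generator")
    tools.append("case_escalation_decider")

    # Rule D: scores in the grey zone need a documented FOR / AGAINST analysis.
    return {"tools": tools, "network_depth": depth, "grey_zone": 40 <= score <= 60}
```

Note the boundary at exactly 80: the SAR generator is included, but the traversal depth stays at 2 because Rule B uses a strict inequality.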

System Prompt Architecture

The system prompt is divided into eight sections and treated as a legal instrument — the source file header reads: DO NOT modify regulatory citations in this file.

§1 Role Definition

Positions the agent as a licensed compliance officer. Sets accountability framing: "Every decision you make carries legal weight."

§2 Regulatory Framework

Exhaustive citation list: GwG §10, §17, §43, §47; FATF Rec. 16/19/20/29; EU 6AMLD; Wire Transfer Reg. 2015/847; MiCA 2023/1114.

§3 ReAct Protocol

Defines the Think → Act → Observe loop. Requires Claude to explicitly state what it learned from each tool result before deciding the next action.

§4 Tool Calling Order

Default sequence: scorer → network → rules → SAR → decider. Explicitly marked as overridable by the conditional rules in §5.

§5 Conditional Branching

Rules A through E. Prefixed "CRITICAL — You MUST follow these rules" to ensure reliable adherence.

§6 Decision Thresholds

Score-to-decision mapping: 0–29 CLEAR, 30–59 WATCHLIST, 60–79 ESCALATE, 80–100 SAR_REQUIRED, with SLA hours per tier.

§7 Output Format

Mandates a structured output block (DECISION, Risk Score, Key Findings, Regulatory Basis, Reasoning) for downstream parsing and audit logging.

§8 Behavioural Rules

Hard prohibitions: never fabricate tool results, always cite specific articles, never alert the customer (GwG §47 Abs. 5), default to SAR when uncertain.
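The score-to-decision mapping in §6 is simple enough to express directly. A minimal sketch, with a hypothetical function name; the SLA hours per tier are not listed in the text and are therefore left out.

```python
def decision_for_score(score: float) -> str:
    """Map a 0-100 risk score to the decision tier given in §6."""
    if not 0 <= score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    if score >= 80:
        return "SAR_REQUIRED"
    if score >= 60:
        return "ESCALATE"
    if score >= 30:
        return "WATCHLIST"
    return "CLEAR"
```

Checking the tiers from the top down makes each boundary value fall into the higher tier, which matches the stated ranges.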

AML Shield is in active development. Architecture and regulatory mappings are subject to change as the design is validated against production compliance requirements.