Adverse Media Screening Sources: Open Web vs Licensed Data

Adverse media screening is treated as part of the customer due diligence required under FATF Recommendation 10, and it is inspected closely by supervisors such as MAS, the FCA and FinCEN and under the EU's 6AMLD. The execution problem is source quality. Open web is broad but noisy; licensed news data is structured but expensive; AML intelligence aggregators are clean but vendor-locked. This guide compares the three categories and explains how mature programmes combine them.

Published: May 2026 | Category: Adverse Media Screening | Read time: ~12 minutes
Quick Answer
Adverse media screening sources fall into three categories. Open web sources (search engines, news aggregator APIs, public records) offer breadth and near-zero marginal cost, but with heavy noise and few structured fields. Licensed news feeds (Dow Jones, Reuters, Bloomberg, LexisNexis) provide structured access to vetted journalism with reliable provenance, at material licensing cost. Structured AML intelligence aggregators (Moody's RDC, LSEG World-Check, Sayari, ComplyAdvantage) pre-process adverse media into events tagged against the FATF predicate offence taxonomy; the signal is clean, but the database is the vendor's product, not the firm's. Production programmes use a hybrid: a structured aggregator as the primary screening layer, open web for breadth and recency, and licensed feeds for high-credibility events. The right mix depends on customer-base risk profile, jurisdictional coverage requirements, and operational budget.

Adverse media screening is required by every major AML regime as part of customer due diligence and especially enhanced due diligence on higher-risk relationships. The principle is clear: regulated firms should know whether a customer has been credibly linked to financial crime, corruption, sanctions evasion or other predicate offences before opening or continuing the relationship.

The execution problem is source quality. A common customer name returns thousands of unrelated news mentions on a basic search. A celebrity who shares a name with your customer drowns the result queue. Tabloid coverage of a divorce is returned alongside an indictment for wire fraud. Compliance teams either spend hours sorting signal from noise or, more commonly, quietly stop screening adverse media altogether and hope the regulator does not inspect that control closely.

The fix is not "search harder" — it is choosing the right data source for the screening task. Each of the three source categories has materially different properties, and the production answer is almost always a combination.

The Three Source Categories

Source category is the single most important configuration decision in an adverse media programme. Coverage breadth, signal quality, structured fields, language coverage, latency and licensing cost all vary across the three categories.

1. Open Web Sources

Search engines (Google, Bing) and news aggregator APIs (Google News, Bing News, free RSS feeds) provide access to publicly indexed content. Coverage is exceptionally broad — essentially the entire publicly accessible internet — and the marginal cost of each query is zero or near-zero. The trade-off is noise: open web returns are unstructured, lack provenance metadata, mix tier-1 journalism with tabloid and aggregator content, and require substantial post-processing to be useful for screening.

2. Licensed News Data

Commercial licensing arrangements with major news providers — Dow Jones Newswires, Reuters, Bloomberg, LexisNexis, Factiva — provide structured access to vetted journalism with full provenance metadata, archival depth, and reliable rights coverage. Content quality is high; the financial press is the spine of the data. Licensing costs are material (often six or seven figures annually for major providers), which limits this approach to firms with sufficient compliance budget to justify the spend.

3. Structured AML Intelligence Aggregators

Specialist providers pre-process adverse media into structured events, tagged against the FATF predicate offence taxonomy and curated for AML relevance. Major providers include LSEG (World-Check), Moody's RDC, Dow Jones Risk & Compliance, ComplyAdvantage and Sayari. The pre-processing handles event categorisation, source weighting, de-duplication of coverage across outlets and entity disambiguation, delivering screening-ready data rather than raw news. Cost typically sits between open web (free at the data layer) and direct licensing (highest), which makes structured aggregators the most cost-effective option for most production programmes.

Open Web: Strengths and Limitations

Open web sources occupy a specific role in mature programmes: they extend coverage beyond what licensed and structured sources catch, and they surface emerging events that may not yet be in the structured aggregators.

Open web strengths:

  • Breadth. Coverage of long-tail jurisdictions, local-language press, and emerging-market sources that licensed feeds may not cover with depth.
  • Recency. Major breaking events typically appear on open web within hours; structured aggregators may lag by 24–72 hours depending on their refresh cycle.
  • Zero marginal cost. Additional queries cost essentially nothing; deep search of any specific customer is operationally trivial.
  • Public transparency. The customer cannot dispute that the underlying information was publicly available, because they can verify it themselves.

Open web limitations:

  • No structured taxonomy. Results are not tagged against FATF predicate offence categories, making categorical filtering impossible without manual review.
  • No source weighting. Tier-1 journalism is mixed with unverified aggregator content; tabloid coverage of irrelevant events appears alongside material AML findings.
  • Common-name false positives. Without biographical context, search results conflate the customer with everyone else of that name.
  • Provenance gaps. Pages may be removed, archived copies may differ from the original publication, and source credibility is not captured as metadata.
  • Operational cost in analyst time. Open web is free at the data layer but expensive at the disposition layer — analyst hours sorting noise from signal accumulate quickly.
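
The common-name problem listed above is usually tackled by scoring each raw hit against what the firm already knows about the customer. The following is a minimal Python sketch under stated assumptions: the `Customer` and `Hit` records, the `context_score` rule and the `min_score` threshold are all hypothetical illustrations, not any search engine's or vendor's API, and plain substring matching stands in for real entity resolution.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    birth_year: int
    nationality: str
    # Free-text context the firm holds: employer, city, occupation (hypothetical field).
    keywords: set[str] = field(default_factory=set)


@dataclass
class Hit:
    url: str
    text: str


def context_score(customer: Customer, hit: Hit) -> int:
    """Count how many biographical attributes the article text corroborates."""
    text = hit.text.lower()
    score = 0
    if str(customer.birth_year) in text:
        score += 1
    if customer.nationality.lower() in text:
        score += 1
    score += sum(1 for kw in customer.keywords if kw.lower() in text)
    return score


def triage(customer: Customer, hits: list[Hit], min_score: int = 2):
    """Split hits into those worth analyst review and likely namesakes."""
    review, namesakes = [], []
    for hit in hits:
        bucket = review if context_score(customer, hit) >= min_score else namesakes
        bucket.append(hit)
    return review, namesakes
```

Calling `triage(customer, hits)` returns two lists, so an analyst queue can be built from the first while the second is retained for audit. The threshold is a tuning choice: set too high, it hides genuine coverage that happens to omit biographical detail.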

Licensed Data: Quality, Cost and Coverage

Licensed news data — direct subscription to major financial press archives — offers the cleanest source of high-credibility adverse coverage. It is the data source against which content quality is measured.

Licensed data strengths:

  • Source provenance. Every article carries full attribution, publication date, archival URL and rights metadata. Disputes about credibility are resolved by reference to the underlying publication.
  • Archival depth. Major providers carry decades of coverage, supporting historical adverse media checks on customers with long career histories.
  • Multi-language structured access. Major providers offer machine-readable access to non-English coverage with structured translation summaries.
  • Rights coverage. Direct licensing eliminates the rights ambiguity that affects open web scraping.

Licensed data limitations:

  • Cost. Material licensing fees, typically scaling with query volume. Direct subscriptions to multiple providers can exceed seven figures annually for large enterprises.
  • Still requires AML-specific processing. Licensed feeds are journalism, not AML intelligence. Adverse media classification, predicate-offence tagging and entity disambiguation must be added on top of the licensed feed.
  • Coverage gaps in long-tail markets. Major providers cover major markets well but may have thinner coverage in specific emerging markets or local-language press.

Licensed data is the appropriate source for firms with high regulatory exposure, established compliance budgets, and customer bases concentrated in markets where major providers have strong coverage. It is rarely sufficient as the sole source.

Structured AML Intelligence: The Operational Workhorse

Structured AML aggregators are the primary screening data layer for most production programmes. They sit between raw news (open web, licensed) and the compliance workflow, providing pre-processed events ready for screening.

Structured aggregator strengths:

  • Pre-processed taxonomy. Every event is tagged against predicate offence categories such as fraud, corruption, money laundering, drug trafficking, sanctions evasion, tax crime and environmental crime, covering the full designated list (21 categories in the FATF glossary, 22 under the EU's 6AMLD).
  • Source credibility weighting. Tier-1 reputable sources, regulatory and judicial publications weighted higher than aggregator blogs or unverified social content.
  • Entity disambiguation. Biographical context (DOB, nationality, occupation, known affiliations) used to disambiguate common-name matches; significantly fewer false positives than open web.
  • Event-level de-duplication. Coverage of the same event across multiple outlets surfaces as one event with the source list, not as dozens of separate alerts.
  • Operational efficiency. Analysts review pre-categorised matches; the signal-to-noise ratio is far better than with raw news.
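
Event-level de-duplication and source credibility weighting, as described in the list above, can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the `Article` fields, the tier names and the `TIER_WEIGHT` values are hypothetical, and real aggregators apply their own proprietary clustering and weighting. The grouping key also assumes the articles have already been resolved to a subject, category and event date.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative tier weights only; not taken from any vendor.
TIER_WEIGHT = {"regulator": 1.0, "tier1_press": 0.8, "regional_press": 0.5, "blog": 0.2}


@dataclass(frozen=True)
class Article:
    outlet: str
    tier: str        # key into TIER_WEIGHT
    subject: str     # resolved entity identifier
    category: str    # predicate offence tag, e.g. "fraud"
    event_date: str  # ISO date of the underlying event


def group_events(articles):
    """Collapse articles about the same (subject, category, event_date) into one event."""
    grouped = defaultdict(list)
    for article in articles:
        key = (article.subject, article.category, article.event_date)
        grouped[key].append(article)

    events = []
    for (subject, category, event_date), items in grouped.items():
        events.append({
            "subject": subject,
            "category": category,
            "event_date": event_date,
            # One event carries the list of outlets that covered it.
            "sources": sorted({a.outlet for a in items}),
            # Assumption: an event is as credible as its most credible source.
            "credibility": max(TIER_WEIGHT.get(a.tier, 0.0) for a in items),
        })
    return events
```

Taking the maximum tier weight is one defensible choice; a programme could instead require corroboration from two or more higher-tier outlets before raising an alert.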

Structured aggregator limitations:

  • Vendor lock-in. The pre-processing is the vendor's product. Switching providers requires re-categorisation of historic alerts and customer records.
  • Coverage breadth depends on the vendor. Each major aggregator has stronger coverage in some markets than others; no single provider has universally strong coverage.
  • Latency from event to ingestion. Pre-processing adds 24–72 hours of latency depending on the vendor and event type — breaking events may not be in the aggregator until well after they appear on open web.
  • Cost remains material. Lower than direct licensing but still six figures annually for mid-sized firms; pricing typically scales with query volume.

Hybrid Approaches Used in Production

Most mature programmes combine sources rather than relying on any single category. The combination handles each category's weakness with another's strength.

The standard production hybrid:

  • Structured aggregator as the primary screening layer. Every customer screened at onboarding and continuously rescreened as the aggregator ingests new events. This is the workhorse layer; the majority of adverse media hits come from here.
  • Open web as the secondary EDD layer. For higher-risk customers where the structured aggregator surfaces ambiguous or limited results, analysts run open-web searches to extend coverage. Open web fills the gaps the aggregator missed.
  • Licensed news for archival research. For specific EDD cases — Source of Wealth investigations, complex UBO research, contested PEP classifications — analysts query licensed feeds for deep historical context that the aggregator may not have indexed.

The proportions vary by firm size and customer profile. A high-volume retail bank may run 95% of its adverse media through the structured aggregator with open web only for escalations. A private bank with concentrated HNWI exposure may invest in licensed feeds and use them as the primary source for client research, with the aggregator as a screening backstop. The right balance is a function of the firm's risk concentration, not a universal answer.
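
The layering above can be expressed as a simple routing rule. This is a hedged sketch, not a prescribed policy: the function name `sources_for`, the risk tier labels, and the use of "one hit or fewer" as a proxy for "ambiguous or limited results" are all assumptions made for illustration.

```python
def sources_for(risk_tier: str, aggregator_hits: int, needs_history: bool = False) -> list[str]:
    """Return the source layers to run for one customer, in order."""
    # The structured aggregator is the primary layer for every customer.
    layers = ["structured_aggregator"]

    # Higher-risk customers with thin aggregator results get an open-web pass.
    if risk_tier == "high" and aggregator_hits <= 1:
        layers.append("open_web")

    # Specific EDD cases (e.g. Source of Wealth) add licensed archival research.
    if needs_history:
        layers.append("licensed_news")

    return layers
```

For example, `sources_for("low", aggregator_hits=0)` stays on the aggregator alone, while `sources_for("high", aggregator_hits=0, needs_history=True)` runs all three layers. A private bank following the licensed-first model described above would invert the ordering.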

Choosing Sources for Your Risk Profile

The source selection should follow the risk profile of the customer base.

  • High-volume, lower-risk customer base. Structured aggregator is the cost-effective primary source. Open web for EDD escalations. Licensed feeds typically not justified at this scale.
  • Lower volume, higher-risk customer base. Combination of structured aggregator + licensed feeds. Open web for emerging events and gap coverage. Higher proportional investment per customer is justified by the elevated per-relationship risk.
  • Concentrated HNWI or institutional business. Licensed feeds as the primary research source. Structured aggregator for screening backstop. Each customer relationship warrants deep research that justifies licensed-data spend.
  • Emerging-market concentration. Coverage gaps in the major providers' emerging-market footprint require open-web supplementation or regional licensed-data providers. Sole reliance on the major aggregators leaves coverage gaps in specific markets.

The decision should be documented and reviewed periodically — typically annually — as the customer base evolves. Where coverage gaps are identified, the response is to extend the source mix, not to assume the existing sources are adequate.
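
One way to identify coverage gaps during the periodic review is to replay cases with known adverse coverage through the current source mix and measure the hit rate per market. The Python sketch below assumes a hypothetical `screen` callable that wraps whatever screening engine the firm uses, plus an illustrative 90% threshold; neither comes from a regulator or vendor.

```python
from collections import defaultdict


def coverage_by_market(known_cases, screen):
    """Compute the share of known adverse cases surfaced, per market.

    known_cases: iterable of (name, market) pairs with confirmed adverse coverage.
    screen: callable(name) -> bool, True if the current source mix returns a hit.
    """
    found, total = defaultdict(int), defaultdict(int)
    for name, market in known_cases:
        total[market] += 1
        if screen(name):
            found[market] += 1
    return {market: found[market] / total[market] for market in total}


def gaps(coverage, threshold=0.9):
    """List markets whose hit rate falls below the (illustrative) threshold."""
    return sorted(market for market, rate in coverage.items() if rate < threshold)
```

Markets returned by `gaps` are candidates for open-web supplementation or a regional licensed provider, and the coverage figures themselves are useful evidence for the documented rationale.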

Common Mistakes in Source Selection

Five failure patterns appear repeatedly:

  • Single-source reliance. Sole dependence on any one source — most commonly a single structured aggregator — creates coverage gaps the firm cannot articulate to the regulator. The supervisory expectation is that the firm has thought about source mix and made deliberate choices.
  • No coverage validation. The firm does not periodically test whether the chosen sources actually catch known adverse coverage. Validation testing — feeding known historical cases into the screening engine to confirm they would have surfaced — is standard practice.
  • Open-web only. Programmes that rely solely on Google search for adverse media screening typically have the highest false-positive volumes and the weakest documentation. They survive inspection only while the inspector has not looked closely; once the control is examined, the finding is predictable.
  • Treating structured aggregator output as a complete answer. Structured aggregators are excellent but not exhaustive. Programmes that automatically clear a customer when the aggregator returns no hits, without supplementary checks on higher-risk relationships, accept aggregator coverage gaps as their own.
  • No event-recency calibration. Old and recent adverse coverage are treated identically in the risk score. Recent material events should be weighted higher; aged or superseded coverage (acquittals, dismissals, retractions) should be tagged and reflected in scoring.
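
The recency point in the list above lends itself to a small worked example. This sketch assumes an exponential decay with a hypothetical five-year half-life and a hypothetical discount factor for superseded coverage; both parameters are illustrative and would need to be calibrated and documented by the firm.

```python
from datetime import date


def recency_weight(event_date: date, today: date, half_life_years: float = 5.0) -> float:
    """Weight of 1.0 for an event today, halving every `half_life_years`."""
    age_years = max((today - event_date).days, 0) / 365.25
    return 0.5 ** (age_years / half_life_years)


def event_score(base_severity: float, event_date: date, today: date,
                superseded: bool = False, superseded_factor: float = 0.1) -> float:
    """Severity adjusted for age, with a discount for superseded coverage.

    `superseded` marks events later overtaken by an acquittal, dismissal
    or retraction; the event is kept and tagged rather than deleted.
    """
    score = base_severity * recency_weight(event_date, today)
    return score * superseded_factor if superseded else score
```

Under these assumptions, an event from five years ago contributes roughly half the score of an identical event today. Superseded coverage is discounted rather than dropped, so the audit trail still shows that it was considered.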

Mature programmes layer sources, document the rationale, and validate coverage periodically. For the broader compliance context, see our companion guides to sanctions list screening, PEP screening best practices, and sanctions evasion red flags.

Adverse Media With the Right Source Mix

One Constellation combines structured AML intelligence with open-web and licensed feeds — 60,000+ sources across 150+ languages, FATF predicate offence tagging, and source-credibility weighting in a single screening layer.
