PII discovery · Anonymization · Synthetic data · Runs in your environment

Your data estate has exposure you haven't mapped yet.

VestraData scans databases, files, and cloud storage for regulated data, then gives your team the controls to act. Runs entirely in your environment. Nothing leaves.

Air-gap ready · Encrypted credentials · Audit logged · Tenant scoped · No phone-home
Scan activity | prod-postgres-01 Live scan
09:14:02 CONNECT prod-postgres-01 · encrypted credential accepted
09:14:03 DISCOVER schema: public · 19 tables · 312 columns mapped
09:14:07 SCAN users.email → PERSONAL_EMAIL · 14,823 rows · 99.1%
09:14:08 SCAN patients.full_name → FULL_NAME · 8,441 rows · 97.8%
09:14:09 SCAN payments.card_no → CREDIT_CARD · 22,109 rows · 94.2%
09:14:11 SCAN contacts.mobile → PHONE_NUMBER · 6,201 rows · 98.5%
09:14:13 ACTION 38 findings queued · audit record written
09:14:13 STATUS scan complete · 247 fields · 0 bytes egressed
247 Fields scanned
38 Findings
0 Bytes egressed

↑ Representative output. Runs entirely inside your environment.

847M Records protected across customer deployments
99.97% Detection accuracy (GLiNER v2 engine, production)
0 Data sent to us (runs entirely in your environment)
< 4 hr Time to first scan from credential to findings
GDPR Art. 30 · HIPAA §164.514 · PCI-DSS 4.0 · ISO 27001 · NHS DSPT · SOC 2 Type II · CCPA §1798
Platform capabilities

Four controls. One review queue.

Discovery, remediation, and proof across databases, files, and the AI tools your team uses every day.

PII Discovery & Classification

Find regulated data before an auditor does.

Connect Postgres, MySQL, Snowflake, S3, and SharePoint. VestraCore samples schemas, scores fields by name, value, and context, and returns field-level findings with confidence scores and row counts.

Fields scanned by risk level: HIGH 62% · MED 24% · LOW 14%
Adaptive sampling: the initial pass is lightweight
Zero-shot classification via GLiNER v2
Structured + unstructured in one review queue
Every action written to tamper-evident audit log
VD-CORE-001
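
As a rough illustration of scoring a field by name and value, the sketch below combines a column-name hint with the share of sampled values that match a pattern. It is a minimal example under stated assumptions, not VestraCore's method: `NAME_HINTS`, `VALUE_PATTERNS`, `score_field`, and the weights are all hypothetical, and the context signal and the GLiNER v2 model are left out entirely.

```python
import re

# Hypothetical hints and patterns for illustration only.
NAME_HINTS = {
    "PERSONAL_EMAIL": ("email", "e_mail"),
    "PHONE_NUMBER": ("phone", "mobile", "tel"),
}
VALUE_PATTERNS = {
    "PERSONAL_EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PHONE_NUMBER": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def score_field(column_name, sample_values, name_weight=0.3, value_weight=0.7):
    """Return (label, confidence) for the best-matching label, or (None, 0.0)."""
    best = (None, 0.0)
    lowered = column_name.lower()
    values = [v for v in sample_values if v]  # ignore NULLs and empty strings
    for label, pattern in VALUE_PATTERNS.items():
        # Name signal: does the column name contain a known hint?
        name_hit = 1.0 if any(h in lowered for h in NAME_HINTS[label]) else 0.0
        # Value signal: fraction of sampled values matching the pattern.
        matches = sum(1 for v in values if pattern.match(v))
        match_rate = matches / len(values) if values else 0.0
        confidence = name_weight * name_hit + value_weight * match_rate
        if confidence > best[1]:
            best = (label, confidence)
    return best
```

Calling `score_field("contact", ["a@b.com", "c@d.org"])` would label the column `PERSONAL_EMAIL` on value evidence alone, with a lower confidence than a column that is also named `email`.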
Synthetic Data Generation

Ship realistic test data without touching production.

Generate statistically faithful datasets for engineering, QA, and ML pipelines. Referential integrity is preserved. Distribution and correlation are matched. Exports go directly to staging DBs or object storage.

Synthetic fidelity metrics: Stat. dist. 97% · FK integrity 100% · NULL rate 94%
FK-preserving extraction across related tables
Differential-privacy mode for HIPAA/GDPR exports
Scheduled refreshes, no manual rebuild cycle
Exports: Parquet, S3, direct DB import
VD-SYNTH-002
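
To make "FK-preserving extraction" concrete, here is a minimal sketch for one parent table and one child table: sample the parents first, then keep only the children whose foreign key points at a sampled parent. It shows plain subsetting, not synthetic generation, and the function name, its parameters, and the list-of-dicts row format are assumptions made for illustration.

```python
import random


def extract_subset(parents, children, parent_key, foreign_key, fraction, seed=0):
    """Sample parent rows, then keep child rows that reference a sampled parent."""
    rng = random.Random(seed)  # seeded so the same subset can be rebuilt
    k = min(len(parents), max(1, round(len(parents) * fraction)))
    sampled = rng.sample(parents, k)
    kept_ids = {row[parent_key] for row in sampled}
    # Dropping children of unsampled parents means no foreign key dangles.
    kept_children = [row for row in children if row[foreign_key] in kept_ids]
    return sampled, kept_children
```

With more than two related tables, the same idea would be applied in dependency order, parents before children.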
Data Airlock

Govern what leaves your boundary before it leaves.

Watch repositories for new documents and datasets. When a match is found, a governed clean copy is produced ahead of time so partner handoffs and AI tooling never receive raw PII.

Documents processed this quarter: Passed 78% · Redacted 17% · Blocked 5%
Monitors SharePoint, Drive, S3, and SFTP drops
Clean copy created in governed staging location
Supports partner handoff without emailing raw files
Reduces last-minute manual review before sharing
VD-AIRLOCK-003
AI Endpoint DLP

Control what employees submit to external AI tools.

Intercept prompts and file uploads before they reach ChatGPT, Claude, or Copilot. Policy can warn, block, or transform without treating every employee as a privacy expert.

Prompt interceptions this month: Allowed 83% · Transformed 12% · Blocked 5%
Browser-level: no endpoint agent required
Applies to typed prompts and uploaded documents
Policy aligned with existing privacy classification
Audit trail matches VestraCore review records
VD-SHIELD-004
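
The allow, transform, or block decision described above could look roughly like the following. This is a sketch under assumptions, not the product's policy engine: the `Decision` type, `evaluate_prompt`, the two regular expressions, and the rule that card-like numbers block while email addresses are masked are all hypothetical.

```python
import re
from dataclasses import dataclass

# Deliberately simple patterns for illustration; real detection would be stricter.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")


@dataclass
class Decision:
    action: str  # "allow", "transform", or "block"
    text: str    # the text that may be forwarded to the AI tool


def evaluate_prompt(prompt: str) -> Decision:
    """Apply a toy policy: block card-like numbers, mask emails, allow the rest."""
    if CARD.search(prompt):
        return Decision("block", "")
    if EMAIL.search(prompt):
        return Decision("transform", EMAIL.sub("[EMAIL]", prompt))
    return Decision("allow", prompt)
```

For example, `evaluate_prompt("Email jane@example.com about the invoice")` would return a `transform` decision whose text has the address replaced by `[EMAIL]`.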
Full scan output

What a scan looks like in your environment.

The CLI outputs field-level findings, row counts, and confidence scores. Every finding is written to a tamper-evident audit record before the session ends.

38 PII findings, 6 field types
11.2s Scan time, 312 columns
100% Audit logged, immutable
vestra-cli | prod-postgres-01 | 80x28
$ vestra scan --source prod-postgres-01 --schema public --depth full
VestraData v2.4.1 · tenant: acme-corp · region: eu-west-1
──────────────────────────────────────────────────────────────────────
[09:14:02] CONNECT prod-postgres-01 accepted · credentials encrypted
[09:14:03] DISCOVER 19 tables · 312 columns · ~2.1M rows estimated
[09:14:07] FINDING users.email PERSONAL_EMAIL HIGH 99.1% 14,823 rows
[09:14:08] FINDING patients.full_name FULL_NAME HIGH 97.8% 8,441 rows
[09:14:09] FINDING payments.card_no CREDIT_CARD MED 94.2% 22,109 rows
[09:14:10] FINDING contacts.mobile PHONE_NUMBER HIGH 98.5% 6,201 rows
[09:14:11] FINDING orders.billing_addr POSTAL_ADDRESS MED 91.3% 18,847 rows
[09:14:12] FINDING hr.national_id GOV_ID HIGH 99.6% 3,204 rows
──────────────────────────────────────────────────────────────────────
[09:14:13] SUMMARY 38 findings · 247 fields scanned · 0 bytes egressed
[09:14:13] AUDIT record written · hash: sha256:a3f9c2e1… · immutable
[09:14:13] COMPLETE scan finished in 11.2s · review queue ready
$ _
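
If the findings need to be fed into another tool, the `FINDING` lines above can be parsed into records. The sketch below assumes the line layout exactly as shown in the representative output; the actual CLI format may differ or may offer a structured export, and `Finding`, `FINDING_RE`, and `parse_finding` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    time: str
    field: str
    label: str
    risk: str
    confidence: float  # percentage, e.g. 99.1
    rows: int


# Layout assumed from the sample: [time] FINDING field LABEL RISK conf% N rows
FINDING_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\s+FINDING\s+(\S+)\s+(\S+)\s+(HIGH|MED|LOW)\s+"
    r"([\d.]+)%\s+([\d,]+)\s+rows$"
)


def parse_finding(line: str) -> Optional[Finding]:
    """Return a Finding for a FINDING line, or None for any other line."""
    m = FINDING_RE.match(line.strip())
    if m is None:
        return None
    time, field, label, risk, conf, rows = m.groups()
    # Row counts use thousands separators in the sample output.
    return Finding(time, field, label, risk, float(conf), int(rows.replace(",", "")))
```

Lines such as `CONNECT`, `SUMMARY`, and `AUDIT` return `None`, so a caller can filter a whole session log with a single pass.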
How it works

Five steps. One consistent audit trail.

Every VestraData workflow follows the same sequence, from credential to evidence, regardless of data source or deployment model.

01 Connect Add a source. Credentials are encrypted per-tenant. Scope is defined before any scan runs. [ENCRYPTED_CREDENTIAL · TENANT_SCOPED]
02 Discover Lightweight schema pass. Maps tables, estimates volume, surfaces likely-sensitive fields. [SAMPLE_RATE: adaptive · NO_FULL_SCAN]
03 Scan Deep field-level scan with confidence scores, row counts, and context evidence for review. [ENGINE: GLiNER-v2 · ZERO_SHOT: true]
04 Act Apply the right control: mask, anonymize, generate a synthetic export, or prepare a governed copy. [AUDIT_LOG: tamper_evident · POLICY: enforced]
05 Prove The decision trail is complete. Show regulators what was found, what changed, and who approved it. [GDPR:Art.30 · HIPAA:§164 · PCI:4.0]
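
One common way to make an audit trail tamper-evident is a hash chain, where each record's hash covers the previous record's hash. The sketch below shows that general technique with Python's standard `hashlib` and `json` modules; it is an assumption about how such a log could work, not a description of VestraData's implementation, and `append_record` and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(event, prev_hash):
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(log, event):
    """Append an event whose hash also covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log):
    """Recompute the chain; any edited, removed, or reordered record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if _digest(record["event"], prev_hash) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True
```

Changing any earlier event after the fact makes `verify` return `False`, which is the property a reviewer relies on when the trail is presented as evidence.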
Deployment models

Runs where your data lives.

Three models. No vendor lock-in. No requirement to move data to assess it.

Model 01 Cloud Appliance

Deploy into your own AWS, Azure, or GCP account. Your networking, your IAM, your storage. No production data routed to vendor infrastructure.

AWS Marketplace · Azure Marketplace · GCP Marketplace · Terraform · CloudFormation
Model 02 On-Premises / Air-Gap

Run inside a private data centre or restricted network segment. No internet dependency at runtime. For teams where operational data egress is ruled out by policy.

Docker Compose · Helm / Kubernetes · LDAP / SAML · Offline license · No phone-home
Model 03 SDK / Embedded

Embed the detection and policy layer directly into an existing pipeline or product when a standalone deployment is not the right fit.

Python · Node.js · Java · .NET · OpenAPI spec
Use cases

Regulated industries. Specific requirements.

VestraData deploys into environments where data residency, auditability, and access controls are non-negotiable.

Healthcare & NHS

Patient data stores, air-gapped deployments, DSPT + HIPAA coverage for NHS and private health systems.

Air-gap, no internet at runtime · NHS DSPT + HIPAA compliance · On-prem, zero cloud egress
Financial Services

PCI-DSS scope reduction, synthetic data for ML pipelines, LDAP/SAML auth for banks and insurers.

PCI-DSS scope reduction · Synthetic data for analytics · LDAP / SAML enterprise auth
Legal & Professional

Controlled document sharing, AI DLP for client privilege materials, audit trail for law and accounting.

Document airlock: SharePoint/Drive · VestraShield AI endpoint DLP · Client privilege audit trail
Data Marketplaces

Scan inbound datasets before publication. Multi-tenant isolation with SDK-first integration.

SaaS SDK, event-driven scanning · Multi-tenant isolation · Python / Node / Java clients
Dev & Test Teams

Eliminate brittle hand-built masking scripts. Statistically faithful synthetic subsets for staging and QA.

FK-preserving extraction · Direct staging DB import · Scheduled refresh cycles
ML Engineering

Training and evaluation datasets that preserve distribution and correlation without accessing live data.

DP-compliant synthetic output · Parquet, S3, data lake export · Distribution + correlation preserved
Design partner programme

Working with a small number of organisations before the public launch.

Our design partners are organisations in regulated industries with a real privacy, compliance, or data-governance problem and a team willing to work closely with us to solve it well.

Design partners get hands-on support from the team, early access to new capabilities, and a shorter loop from feedback to shipped product. Terms are structured for an early partnership and a defined pilot scope. If you want to be a public reference later, we welcome it, but we will never ask.

Apply as a design partner →
Open slots by sector
Healthcare / NHS
Air-gapped hospital network or NHS trust with DSPT and HIPAA requirements.
Open
Financial Services
Bank or investment firm with PCI-DSS scope or synthetic data needs.
Open
Legal / Professional
Law firm or accountancy sharing documents with external AI tools.
Next cohort
Data Marketplace
Platform ingesting third-party datasets that require PII scanning at ingest.
Limited
Technical review

Here is exactly what happens when you book a session.

Not a slide deck. Not a sandboxed environment with fabricated data. We connect to something real in your organisation and you see actual findings.

Minutes 0-5 We start with one real source
Usually a read-only database credential, file store, or bucket that is representative enough to answer whether the product fits your environment.
Minutes 5-20 We run discovery and scan
You watch it happen live. The schema map builds in real time. Findings appear as the scan progresses. No prepared screenshots.
Minutes 20-35 We walk through the findings
What was found, where, the risk level, and the confidence score. We explain any finding you want to understand in more depth.
Minutes 35-45 You pressure-test the fit
Deployment, controls, source coverage, air-gap requirements, and what a narrow pilot in your environment would actually involve.

After the session, you should know whether the deployment model works, whether the first workflow is meaningful, and whether a pilot is justified.

Median time to first scan: under 4 hours from credentials.