Breachline is an autonomous AI penetration testing platform backed by a UK offensive security team. Its AI engine, Nebula, uses multiple reasoning models and a swarm of specialist agents to discover, exploit, and verify vulnerabilities, and every finding is supported by real tool output and a working proof-of-concept. It delivers compliance-ready reports mapped to PCI DSS 4.0, SOC 2, HIPAA, ISO 27001, OWASP Top 10, and NIST CSF.

How does autonomous pentesting work?

Nebula follows a multi-phase autonomous attack lifecycle: (1) Planning - classifies the target and assembles an agent team suited to it, (2) Recon - runs tools in parallel to map the attack surface, (3) Analysis - specialist agents test the applicable vulnerability classes with MITRE ATT&CK mapping, (4) Exploit - builds working PoC exploits in sandboxed containers, (5) Report - delivers executive summaries and technical deep-dives to Slack and email as findings are confirmed.

What compliance frameworks does Breachline support?

Breachline automatically maps findings to specific control IDs in six compliance frameworks: PCI DSS 4.0, HIPAA, SOC 2 Type II, ISO 27001, OWASP Top 10, and NIST CSF. Reports are audit-ready and delivered instantly via Slack and email.

Is Breachline safe to use on production systems?

Yes. Nebula runs all exploits in isolated Docker containers with fresh network namespaces. Containers self-destruct after each test, leaving zero artifacts on your infrastructure.

Nebula: The Autonomous AI Penetration Testing Platform

A modern engineering team ships dozens of times a day. Its attack surface, thousands of endpoints across web, APIs, cloud, and identity, changes by the hour. The standard answer to that risk is a penetration test once a quarter, scoped to a slice of the surface, delivered as a PDF two weeks later. By the time the report lands, the application it described is gone.

Nebula is Breachline's answer to that gap. It is an autonomous AI security platform that reasons about a target the way an experienced attacker does, runs continuously, and proves what it finds with working exploits. This paper explains what Nebula is, how it is built, and where it fits next to scanners, traditional pentests, and your existing workflows. It also draws a clear line between what the platform does today, in Early Access, and what is on the roadmap.

"The question is no longer whether AI can find vulnerabilities. It is whether your security program can keep pace with software that ships continuously and attackers who never stop looking." (Breachline Labs)

Executive Summary

Point-in-time testing was designed for software that changed a few times a year. Software no longer behaves that way, and neither do attackers. The result is a structural coverage gap: the surface a quarterly assessment never sees is exactly where breaches happen.

Nebula closes that gap with three ideas working together:

Reasoning, not signatures. Nebula forms hypotheses about how an application works, tests them, learns from the response, and adapts, instead of replaying a fixed payload library.
A swarm, not a single model. A team of specialist agents works the attack surface in parallel, spawning new specialists for each lead and chaining individual weaknesses into full attack paths.
Proof, not noise. Every finding ships with the exact request, the response that proves it, and step-by-step reproduction. If Nebula reports it, it is exploitable.

This whitepaper covers the structural failure of point-in-time testing, the company behind Nebula, how its multi-tier intelligence and agent swarm reason about a target, the platform architecture end to end, the integration and compliance layers, the security model, and an honest view of outcomes and cost.

1. The Problem: Testing That Cannot Keep Up

Continuous delivery, microservices, and ephemeral cloud infrastructure created an attack surface that expands faster than any human team can assess it. Traditional penetration testing was built for a slower world, and the mismatch is structural, not a matter of effort or skill.

The shape of the gap

Reality of modern delivery	Point-in-time testing
Many deploys per day	A handful of assessments per year
Thousands of endpoints per app	Sampled coverage per engagement
Production minutes after merge	Report delivered days or weeks later
Near-zero cost to ship new code	High fixed cost per engagement
Attackers automate, around the clock	Human testers, in hours, in scope

Two consequences follow. First, most of the surface is never tested in any given window, so risk accumulates between engagements. Second, the vulnerability classes that matter most are the ones automated scanners miss: broken object-level authorization, JWT algorithm confusion, race conditions, GraphQL-specific abuse, and business-logic flaws that no signature library can describe.

Legacy DAST tools do not fill the gap. They generate large volumes of low-confidence findings, most of them false positives, while systematically missing logic and authorization bugs. The industry did not need a faster scanner. It needed something that reasons.

2. About Breachline

Breachline Labs is a UK security company, headquartered in London, built on one conviction: security testing should be as continuous, comprehensive, and automated as the software it is meant to protect.

The team comes from offensive security research, large-scale AI systems, and enterprise security engineering. That combination matters, because the hard part is not running tools. It is reasoning about a target the way an attacker does, and building AI that does so reliably rather than confidently making things up.

Breachline pairs the Nebula platform with a UK red team. Nebula runs continuously on its own; the human team steps in for scoped, expert-led engagements and signs off the work that auditors, boards, and customers accept.

Our mission: find every vulnerability and prove every risk before attackers do, continuously and at scale.

That mission shapes every design decision in Nebula: why it produces working proof-of-concept exploits rather than theoretical findings, why it chains weaknesses into attack paths rather than listing them in isolation, and why it plugs into the tools you already use instead of demanding a new silo.

3. The Intelligence at the Core

Multi-tier intelligence, not one model on a checklist

Most "AI security" products wrap a single general-purpose model around a prompt. Nebula does not. It routes every task to the right tier of intelligence and cascades to a fallback if a tier fails:

Tier	Role
Nebula Fast	Speed tier: classification, triage, routing decisions
Nebula Core	Primary agentic engine: long-context analysis and bulk reasoning
Nebula Max	Premium reasoning: zero-day hunting, exploit design, code review

A lightweight task does not pay for deep reasoning, and a hard problem is never starved of it. The router scores reliability continuously and reorders the chain so that engines which hallucinate are demoted and dependable ones are promoted. There is no single-vendor lock-in and no single point of failure.
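As a rough illustration of the reliability-ordered fallback chain described above, the sketch below tries engines in order of observed reliability and cascades to the next one when an engine fails. It is a minimal sketch under stated assumptions, not Nebula's implementation: the Engine and Router classes, the success and failure counters, and the convention that an engine returns None on failure are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Engine:
    """One reasoning engine in a fallback chain (hypothetical structure)."""
    name: str
    run: Callable[[str], Optional[str]]  # assumed to return None on failure
    successes: int = 1  # both counters start at 1, so a new engine scores 0.5
    failures: int = 1

    @property
    def reliability(self) -> float:
        return self.successes / (self.successes + self.failures)


class Router:
    def __init__(self, engines: list[Engine]):
        self.engines = list(engines)

    def dispatch(self, task: str) -> tuple[str, str]:
        """Run the task on the most reliable engine, cascading on failure."""
        # sorted() is stable, so engines with equal scores keep their order.
        chain = sorted(self.engines, key=lambda e: e.reliability, reverse=True)
        for engine in chain:
            result = engine.run(task)
            if result is None:
                engine.failures += 1   # demoted on the next dispatch
                continue
            engine.successes += 1      # promoted on the next dispatch
            return engine.name, result
        raise RuntimeError("every engine in the chain failed")
```

In this sketch an engine that fails repeatedly drifts to the back of the chain without being removed, so it can recover its position if it starts succeeding again. How a real router detects a hallucinated answer, as opposed to an outright failure, is outside what this example shows.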

Two brains, one platform

Nebula runs as a dual-brain system:

A conversational brain that talks to your team in plain language over Slack, Teams, or the web. You describe a target in any language, it asks clarifying questions, and it builds a complete profile.
An autonomous swarm brain that executes the engagement: it hunts, exploits, proves, and reports without a human in the loop.

A swarm that reasons like an attacker

The defining trait of an elite tester is not tool proficiency. It is reasoning. An elite tester forms a hypothesis, remembers that a low-severity finding from an hour ago might combine with something new, adapts when the application pushes back, and understands what the application is meant to do for the business well enough to notice when the code does something different.

Nebula encodes that loop and runs it across the whole surface at once. A lead agent orchestrates a swarm of specialist agents (recon, exploitation, authentication, cloud, and more), and spawns new specialists on demand for each lead it surfaces.


Every cycle runs in parallel across the full attack surface. No fatigue, no context loss, no scope drift.

A memory that compounds

Human experts get sharper on a target over time because they remember it. Nebula has a nine-layer memory that does the same: it retains which payloads bypassed your WAF, which endpoints were patched, and which attack chains still work. Each engagement feeds the next, so the second scan starts where the first left off rather than from zero.

Evidence over confidence

A reasoning system is only useful if you can trust its output. Nebula is built to refuse to guess: it extracts and quotes tool output before drawing conclusions, cites the evidence behind every claim, and runs layered hallucination checks with a best-of-N pass on the highest-severity findings. "I do not know" is an allowed answer. A confident fabrication is not.

4. How Nebula Works: Platform Architecture

A Nebula engagement moves through a fixed pipeline: input, analysis, planning, execution, validation, and reporting. The phases below are the parts you see.

Phase 1: Reconnaissance and surface mapping

Before a single test runs, Nebula builds a model of the target that goes well beyond passive enumeration.


In this phase Nebula discovers every reachable endpoint (including undocumented and shadow APIs), fingerprints the stack, maps authentication patterns such as OAuth flows and JWT implementations, recovers API schemas through GraphQL introspection and OpenAPI parsing, and identifies the trust boundaries between services.

Phase 2: Autonomous vulnerability testing

Nebula tests the applicable vulnerability classes against each endpoint in parallel. Rather than firing every payload at every input (the approach that makes scanners noisy), it uses the technology fingerprint and observed behavior to choose the attack vectors most likely to land.


Findings are evidence-gated. Every reported issue includes the exact HTTP request that triggers it, the response that proves exploitation, and the steps to reproduce. The point is to leave you with effectively nothing to triage: confirmation happens in the sandbox, not in your inbox.
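One way to picture evidence gating is as a check that holds back any finding missing one of the three pieces of proof named above. The sketch below is illustrative only: the Finding fields and the gate function are assumed names, and checking that the fields are non-empty stands in for the real confirmation step, which happens by running the exploit in the sandbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A candidate finding with its supporting evidence (hypothetical shape)."""
    title: str
    severity: str
    request: str = ""                   # exact HTTP request that triggers it
    response: str = ""                  # response that proves exploitation
    repro_steps: tuple[str, ...] = ()   # step-by-step reproduction


def missing_evidence(finding: Finding) -> list[str]:
    """Return the names of the evidence fields that are empty."""
    missing = []
    if not finding.request.strip():
        missing.append("request")
    if not finding.response.strip():
        missing.append("response")
    if not finding.repro_steps:
        missing.append("repro_steps")
    return missing


def gate(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (reportable, held_back) by evidence completeness."""
    reportable, held_back = [], []
    for finding in findings:
        if missing_evidence(finding):
            held_back.append(finding)
        else:
            reportable.append(finding)
    return reportable, held_back
```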

Phase 3: Exploit chaining and the attack graph

Individual vulnerabilities are rarely the real risk. Attack paths are.

A low-severity open redirect becomes a critical account takeover when combined with an OAuth flaw. A medium SSRF becomes cloud compromise when the instance exposes its metadata service. A "low" information leak becomes the first hop of a lateral-movement chain. Nebula models each confirmed weakness as a state transition in a directed attack graph, a map of how an attacker's capability grows as they move through the environment.


[Diagram: illustrative attack chain]

The diagram above is an illustrative chain, not a measured result. The pattern it shows is the point: the steps that compose into a critical path are very often rated Low or Medium on their own, which is exactly why signature tools and time-boxed engagements deprioritize them. Nebula evaluates the impact of the whole path, not the individual scores.
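The state-transition view can be sketched as a small graph search. In the example below each confirmed weakness moves the attacker from one capability to another, and a breadth-first search returns the shortest chain from a starting capability to a goal. This is a simplified sketch under stated assumptions: the Weakness fields, the capability labels, and the single-prerequisite model are hypothetical, and a real attack graph would likely need sets of prerequisites and a scoring model for the path as a whole.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Weakness:
    """A confirmed weakness modeled as an edge in a directed attack graph."""
    name: str
    severity: str   # individual rating, e.g. "Low" or "Medium"
    requires: str   # capability the attacker must already hold
    grants: str     # capability gained by exploiting the weakness


def find_attack_path(weaknesses, start, goal):
    """Breadth-first search for the shortest chain from start to goal.

    Returns a list of Weakness objects, or None if no chain exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for weakness in weaknesses:
            if weakness.requires == capability and weakness.grants not in seen:
                seen.add(weakness.grants)
                queue.append((weakness.grants, path + [weakness]))
    return None


# Illustrative data only, echoing the SSRF example in the text.
confirmed = [
    Weakness("SSRF in URL fetcher", "Medium", "anonymous", "internal-network"),
    Weakness("Metadata service exposed", "Low", "internal-network", "cloud-credentials"),
    Weakness("Verbose error page", "Low", "anonymous", "stack-details"),
]
path = find_attack_path(confirmed, "anonymous", "cloud-credentials")
```

Here path contains the SSRF followed by the metadata exposure: two steps rated Medium and Low on their own that together reach cloud credentials.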

Phase 4: Real-time reporting

Findings arrive as they are confirmed, not in a document weeks later, and each lands in the right format for its audience.

Report tier	Audience	Content
Executive Summary	CISO, Board	Risk posture, business impact, trend over time
Technical Report	Security, AppSec	Full exploit details, reproduction, remediation guidance
Compliance Report	Auditors, GRC	Findings mapped to PCI DSS, SOC 2, ISO 27001, NIST CSF, GDPR
Developer Report	Engineering	Findings per repository or service, with inline fix suggestions

5. Core Capabilities

Continuous, not point-in-time

Nebula runs continuously against staging and production. An endpoint deployed overnight is in scope by the next cycle, and a config change that opens an SSRF vector is caught without waiting for the next quarterly window.


Systematic OWASP Top 10 coverage

Nebula is designed to work through all ten OWASP categories systematically, including the design and logic categories that time-constrained engagements usually reach last, if at all.

OWASP category	Typical scanner	Nebula approach
A01 Broken Access Control	Partial, sampled	Cross-user and cross-tenant validation
A02 Cryptographic Failures	Limited	Crypto analysis and JWT attacks
A03 Injection	Strong	All injection classes, all input vectors
A04 Insecure Design	Rarely tested	Business-logic and workflow analysis
A05 Security Misconfiguration	Moderate	Stack-aware misconfiguration testing
A06 Vulnerable Components	Tool-dependent	SBOM and dependency review
A07 Auth Failures	Moderate	Auth bypass and session testing
A08 Data Integrity Failures	Rarely tested	Deserialization and CI/CD supply chain
A09 Logging Failures	Almost never	Attack-traffic detection checks
A10 SSRF	Limited	Internal pivot, cloud metadata, full chains

Coverage across the whole surface


[Diagram: coverage across the whole surface]

CI/CD integration

Nebula can run inside the deployment pipeline. A pull request or a deploy triggers a targeted scan of the changed surface, and the result comes back before the change reaches production.
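A pipeline gate of this kind usually reduces to an exit code. The sketch below assumes the scan result is available to the pipeline as a JSON list of findings, each with a severity field. That format, the severity names, and the gate_exit_code function are assumptions made for illustration, not a documented Nebula interface.

```python
import json

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def gate_exit_code(findings_json: str, fail_at: str = "critical") -> int:
    """Return 1 if any finding meets or exceeds the threshold, else 0.

    A CI step can pass this value to sys.exit() so the job fails and the
    change is blocked before it reaches production.
    """
    threshold = SEVERITY_ORDER.index(fail_at)
    findings = json.loads(findings_json)
    for finding in findings:
        rank = SEVERITY_ORDER.index(finding["severity"].lower())
        if rank >= threshold:
            return 1
    return 0
```

Defaulting the threshold to critical keeps the gate from blocking routine merges; a stricter team could lower it to high.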


6. The Integration Ecosystem

Nebula is built to work inside the tools your teams already use, not to add another place to check.


What that looks like in practice, as illustrative examples:

On-call engineer. A PagerDuty alert fires for a critical chain in production: SSRF into cloud credential theft. The alert carries the exact request, the chain visualization, and a link to the full finding, so the picture is clear before the laptop is even open.
Developer on a pull request. A GitHub status check from Nebula reports no new findings introduced by the change, and the branch is clear to merge. No review queue, no surprises later.
Security team on Monday. A Slack summary recaps the week's testing and links to the full report in the SIEM.
CISO before a board review. A dashboard pulls live data from Nebula: coverage over time, findings by severity and category, and posture against the frameworks that matter.

7. Compliance Automation

Every finding is mapped to the frameworks your organization answers to, translating a technical issue into the control language an auditor expects.


When an auditor asks for penetration-testing evidence, you export a compliance report: the findings, the tests run, and the clean results, timestamped and formatted for review.
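At its simplest, the mapping step is a lookup from a finding class to a framework category. The sketch below uses only the OWASP categories listed in the coverage table in this paper. The finding-class keys and the map_findings function are hypothetical, and control IDs for the other frameworks are deliberately left out rather than guessed.

```python
# Finding class -> OWASP category, following the coverage table in this paper.
# The keys are hypothetical labels chosen for this example.
OWASP_CATEGORY = {
    "bola": "A01 Broken Access Control",
    "jwt-alg-confusion": "A02 Cryptographic Failures",
    "sql-injection": "A03 Injection",
    "ssrf": "A10 SSRF",
}


def map_findings(finding_classes):
    """Group finding classes by category and collect any without a mapping."""
    mapped, unmapped = {}, []
    for finding_class in finding_classes:
        category = OWASP_CATEGORY.get(finding_class)
        if category is None:
            unmapped.append(finding_class)  # surfaced for review, not dropped
        else:
            mapped.setdefault(category, []).append(finding_class)
    return mapped, unmapped
```

Returning the unmapped classes separately means a gap in the mapping table shows up in the export instead of silently shrinking the report.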

8. Security and Trust

This is a tool that attacks systems on your behalf, so its own security model has to be airtight.

Sandboxed execution. Every exploit attempt runs in an isolated container, provisioned fresh per scan and destroyed after use, with network access limited to the authorized target. Nebula leaves no backdoors and no persistent access.
Graduated containment. Testing runs at an explicit containment level, from passive observation through active exploitation, so the intensity of an engagement always matches what you authorized.
Scope enforcement. Nebula operates only within the defined scope. Domain and IP boundaries, allowlisting, and rate limits keep its activity controlled and predictable, and every request it makes is logged for audit.
Data handling. Application data used to validate an exploit is processed in memory and discarded. Findings metadata is retained for reporting and trend analysis, with customer-controlled data-residency options.
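The scope-enforcement idea above can be sketched as an allowlist check applied to every outbound request. The example assumes scope is expressed as a set of domains (subdomains included) plus a set of IP networks. The Scope class and its method are hypothetical, rate limiting and audit logging are not shown, and a real enforcement layer would also need to check the addresses a hostname resolves to, which this sketch does not do.

```python
import ipaddress
from urllib.parse import urlsplit


class Scope:
    """Allowlist of in-scope domains and IP networks (hypothetical helper)."""

    def __init__(self, domains, networks):
        self.domains = {d.lower().rstrip(".") for d in domains}
        self.networks = [ipaddress.ip_network(n) for n in networks]

    def allows(self, url: str) -> bool:
        """Return True only if the URL's host is inside the defined scope."""
        host = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
        if host is None:
            return False
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal: accept the exact domain or a true subdomain.
            # DNS resolution is deliberately not checked in this sketch.
            host = host.rstrip(".")
            return any(host == d or host.endswith("." + d) for d in self.domains)
        return any(address in network for network in self.networks)
```

Matching on a leading dot before the domain is what keeps a lookalike host such as evil-example.com from passing as a subdomain of example.com.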

9. What Continuous, Autonomous Testing Changes

Nebula is in Early Access, so this section is about the structural difference the model makes, not a claim of measured customer results.

The core change is coverage over time. A point-in-time test inspects a sample of the surface on one day; the rest of the surface, and every change after that day, goes untested until the next engagement. Continuous autonomous testing inverts that. The whole surface is in scope, new code is tested as it ships, and findings arrive while they still describe the live system.

The second change is the unit of risk. Scanners and time-boxed tests report findings in isolation, which pushes Low and Medium items into a backlog. Because Nebula chains weaknesses into attack paths and scores the path, the items that actually compose into a breach surface as a priority instead of disappearing into a list.

The third change is triage cost. When every finding ships with a working proof of concept and reproduction steps, the security team spends its time on remediation and judgment calls, not on sorting real issues from scanner noise.

An illustrative cost model

The figures below are illustrative industry ranges to frame the economics, not Breachline-measured results. Use your own contract and salary numbers in place of these.

Cost category	Point-in-time model	Continuous model
Recurring assessment spend	High fixed cost per engagement	Subscription
Scanner licensing	Separate line item	Included
Triage labor	Ongoing security-team time	Minimal, findings ship proven
False-positive resolution	Ongoing developer time	Minimal
Attack-surface coverage	Sampled	Full surface, continuous

For context on the cost being managed against, IBM's Cost of a Data Breach report puts the global average cost of a breach in the millions of dollars per incident. The case for continuous testing is that the surface left untested between engagements is where many of those incidents begin.

10. Nebula Next to the Alternatives


Capability	DAST Scanner	Traditional Pentest	Nebula
OWASP Top 10 coverage	Partial	Most	Systematic
Business-logic testing	No	Partial	Yes
Exploit chaining	No	Limited	Yes
Continuous operation	Yes	No	Yes
Proof-of-concept exploits	No	Yes	Yes
GraphQL-native testing	No	Limited	Yes
Race-condition detection	No	Rare	Yes
CI/CD integration	Limited	No	Yes
Cloud infrastructure testing	Limited	Scope-dependent	Yes
Real-time findings	Yes	No	Yes
Compliance mapping	Limited	Manual	Yes
Triage burden	High	Low	Near-zero, proof-gated

11. Getting Started

Nebula is delivered as a SaaS platform that requires no on-premises infrastructure. Private-cloud and on-premises options are available for teams that need them.


Day one: the first comprehensive scan completes, including any exploit chains already present before Nebula arrived.
Week one: integrations are live, the pipeline gates on critical findings, and alerts flow to your team and SIEM.
Month one: trend data accumulates, showing how the surface is changing and where new risk is being introduced.

Enterprise plans include dedicated security-engineering support, custom scope and scan configuration, white-label reporting, SSO and role-based access control, custom compliance mapping, and scheduled engagements with the Breachline red team. A self-service tier for smaller teams is on the roadmap.

12. Conclusion

The organizations that handle security well in 2026 are not the ones with the largest team or the most expensive annual test. They are the ones that made testing continuous, automated, and part of how they ship.

Nebula is built for that: a swarm of reasoning agents, with multi-tier intelligence and a memory that compounds, that reasons about your attack surface the way a strong human tester would, but never sleeps, never runs out of time before it reaches the last endpoint, and never hands you a finding it cannot prove. It is in Early Access today, and it is honest about the line between what it does now and what is on the roadmap.

The open question is no longer whether AI can do this work. It is how long an organization keeps relying on an approach that samples its surface a few times a year while attackers work the rest of it continuously.

Sources

OWASP Top 10, owasp.org/www-project-top-ten
MITRE ATT&CK, attack.mitre.org
NIST Cybersecurity Framework, nist.gov/cyberframework
PCI Security Standards Council, PCI DSS, pcisecuritystandards.org
IBM, Cost of a Data Breach Report, ibm.com/reports/data-breach
Verizon, Data Breach Investigations Report (DBIR), verizon.com/business/resources/reports/dbir

About Breachline Labs

Breachline Labs Limited builds autonomous AI security platforms that find and prove vulnerabilities before attackers do. Headquartered in London, United Kingdom, Breachline pairs the Nebula platform with a UK red team for expert-led engagements.

Get started:

Website: breachline.io
Request a demo: breachline.io/contact
Enterprise inquiries: sales@breachline.io
Security research: research@breachline.io