
fimil pentest

An autonomous pentester that proves what it finds.

Fimil's agent attacks across 15 vectors, replays each exploit to confirm it, and exports the proof — PoC, curl repro, and an advisory fix PR. The pentest agent is in beta and is sharpest on APIs and server-rendered apps.

fimil pentest

Terminal transcript:

  fimil pentest --target https://staging.acme.dev
  scope: staging.acme.dev, *.api.acme.dev · kill switch armed
  discovery: 142 routes · OpenAPI spec ingested
  mfa login: TOTP accepted · session established
  testing: IDOR on /api/v1/orders/{id} with session pair
  candidate: foreign record readable from session B
  validator: replay 2/2 OK → CONFIRMED
  IDOR /api/v1/orders: PoC + curl repro exported
  advisory fix PR opened → acme/api#214

how it works

Discover. Attack. Validate. Report.

Discovery maps the target with a BFS crawler, OpenAPI and GraphQL schema ingest, and a real headless browser for JS-rendered apps (SPA coverage is in beta). The agent loop then picks attack vectors and payloads adapted to what it found. Before anything is reported, a validator replays the candidate exploit — unconfirmed candidates never surface. What's left arrives with severity, compliance mappings, proof, and optionally a fix PR.
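As a rough illustration of the breadth-first part of discovery, here is a minimal sketch. It is not Fimil's crawler: `bfs_discover`, the `get_links` callback, and the `max_pages` cap are hypothetical names chosen for this example, and a real crawl would also fetch pages, render JavaScript, and ingest schemas.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def bfs_discover(start_url, get_links, allowed_hosts, max_pages=500):
    """Breadth-first walk over in-scope URLs.

    get_links(url) is a caller-supplied (hypothetical) function that returns
    the hrefs found on a page. Off-scope hosts are never queued.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop fragments so /c and /c#x dedupe.
            nxt = urljoin(url, href).split("#", 1)[0]
            if urlsplit(nxt).hostname in allowed_hosts and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Because the queue is first-in, first-out, shallow routes are visited before deep ones, which tends to surface the main API surface early even when the page cap cuts the crawl short.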

pentest-agent · run #db41 · IDOR

  1. discover: IDOR candidate (integer object id)
     BFS crawl + OpenAPI ingest, 142 routes mapped
     GET /api/v1/orders/{id} → 200

  2. attempt
     Cross-account read with a second session
     order_id=1337 → 1338 · session B

  3. validate: CONFIRMED, foreign record returned
     Exploit replayed before reporting, 2/2 reproductions

  4. proof: advisory fix PR opened
     PoC + copy-paste reproduction exported
     curl -s 'https://staging.acme.dev/api/v1/orders/1338' -H 'Cookie: session=B···'

scope guard: 2 off-scope requests blocked · audit logged

coverage

Fifteen attack vectors.

Each with curated payload libraries and a vector-specific confirmation strategy — markers, out-of-band callbacks, or session pairs. Never report-by-vibes.

SQL injection

Curated payloads with marker-based confirmation, not error-message guessing.

Cross-site scripting

Reflected XSS, confirmed by proving the payload lands in an executable HTML context — not substring matching.

SSRF

Out-of-band callbacks prove the server actually fetched the URL.

IDOR

Cross-account session pairs read each other’s records to prove exposure.

Broken authorization

Role and privilege boundaries probed with real authenticated sessions.

Mass assignment

Over-posting hidden fields to mutate state the API never intended.

Prompt injection

Chat and structured-output injection for apps with LLM features.

SSTI

Server-side template injection with engine-specific payloads.

LDAP injection

Filter manipulation against directory-backed authentication.

XPath injection

Query manipulation against XML-backed data layers.

XXE

External entity resolution in XML parsers, proven out-of-band.

Command injection

OS command execution confirmed via marker output.

Path traversal

File-system escape attempts with platform-aware encodings.

JWT attacks

Algorithm confusion plus offline weak-HMAC cracking.

Insecure deserialization

Unsafe object deserialization across common frameworks.
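To make one of these vectors concrete, offline weak-HMAC cracking for HS256 tokens can be sketched as below. This is an illustrative sketch, not Fimil's payload library: `crack_hs256` and the candidate list are hypothetical. The idea is that an HS256 signature is an HMAC-SHA256 over `header.payload`, so a guessed secret can be checked locally without sending any traffic to the target.

```python
import base64
import hashlib
import hmac


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(signing_input: str, secret: bytes) -> str:
    """Compute the HS256 signature segment for 'header.payload'."""
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url(digest)


def crack_hs256(token: str, candidates):
    """Return the first candidate secret that reproduces the signature, else None."""
    signing_input, _, signature = token.rpartition(".")
    for secret in candidates:
        expected = sign_hs256(signing_input, secret.encode("utf-8"))
        if hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
            return secret
    return None
```

A recovered secret is itself the proof: anyone holding it can mint a token the server will accept.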

authenticated testing

It logs in like a user — MFA included.

Credentialed scans drive a real browser through your login flow, including TOTP multi-factor auth. The captured session rides along on every request, so authorization and IDOR testing happens behind the login wall — where those bugs actually live.

mfa login

Terminal transcript:

  navigating to /login
  credentials accepted
  TOTP challenge detected
  one-time code accepted · session captured
  Cookie: session=eyJh··· merged into agent requests
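For readers wondering how an agent can answer a TOTP challenge from a seed, the standard algorithm (RFC 6238, HMAC-SHA1 variant) fits in a few lines. This is a generic sketch of that standard, assuming a base32-encoded seed and a 30-second step; how Fimil stores or handles the seed internally is not described here.

```python
import base64
import hashlib
import hmac
import struct


def totp(seed_b32: str, at_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 one-time code for a base32 seed at a given Unix time."""
    # Base32 seeds are often shared without padding; restore it before decoding.
    padded = seed_b32.upper() + "=" * (-len(seed_b32) % 8)
    key = base64.b32decode(padded)
    # The moving factor is the number of whole time steps since the epoch.
    counter = struct.pack(">Q", int(at_time) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Calling `totp(seed, time.time())` at login time yields the same code an authenticator app would display for that seed.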

containment

Autonomous, not unsupervised.

Every request the agent makes, including CSS, JS, and XHR fetched by its browser, passes through the scope guard. The guard enforces a hostname allowlist the agent cannot leave, destructive-verb gating, per-host rate limits, DNS pinning with RFC1918 and metadata-IP rejection, and a kill switch that is checked continuously. Blocked requests are aborted and audit-logged, not silently dropped.

Scope guard

allowlist: staging.acme.dev, *.api.acme.dev

rate: ≤ 5 req/s per host

verbs: DELETE/PATCH gated · kill switch armed

dns: pinned · RFC1918 + metadata IPs rejected

GET staging.acme.dev/api/v1/orders · in scope
POST staging.acme.dev/api/v1/search · in scope
GET evil-cdn.example.com/payload.js · off-scope, blocked
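The policy above can be read as a per-request check. The sketch below is a simplified illustration under stated assumptions, not Fimil's scope guard: `check_request` and its reason strings are hypothetical, the metadata address is assumed to be the commonly used `169.254.169.254`, and rate limiting, DNS pinning, and the kill switch are left out.

```python
import ipaddress
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Values taken from the example policy; the structure is an assumption.
ALLOWLIST = ["staging.acme.dev", "*.api.acme.dev"]
GATED_VERBS = {"DELETE", "PATCH"}
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
METADATA_IPS = {ipaddress.ip_address("169.254.169.254")}


def check_request(method, url, resolved_ip, allow_destructive=False):
    """Return (allowed, reason) for one outbound request."""
    host = (urlsplit(url).hostname or "").lower()
    if not any(fnmatchcase(host, pattern) for pattern in ALLOWLIST):
        return False, "host not in allowlist"
    if method.upper() in GATED_VERBS and not allow_destructive:
        return False, "destructive verb gated"
    # Check the address the hostname resolved to, not just the name.
    addr = ipaddress.ip_address(resolved_ip)
    if any(addr in net for net in RFC1918):
        return False, "resolved to RFC1918 address"
    if addr in METADATA_IPS:
        return False, "resolved to metadata address"
    return True, "allowed"
```

Checking the resolved address as well as the hostname is what stops an allowlisted name from being pointed at an internal service.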

evidence

Every finding is a reproduction, not an opinion.

Proof is also the billing unit — you're only ever billed for confirmed findings. The agent is in beta; it is sharpest on APIs and server-rendered apps.

Proof of concept

The exact request, payload, and response evidence — sensitive values redacted.

curl reproduction

A copy-paste command your team can run to see the vulnerability themselves.

Replay log

When it was validated and how many reproductions succeeded before reporting.

Findings also carry per-vector control mappings for SOC 2 and PCI-DSS. These map each finding to framework controls for your audit evidence; they are not certifications of Fimil itself.

close the loop

From confirmed finding to fix PR.

Confirmed findings can automatically open an advisory fix PR in your repo — the PoC, the curl reproduction, and remediation guidance, delivered where your team already works. Mark a finding false-positive and the open PR closes itself (and the charge reverses).

Open · fix(deps): bump lodash 4.17.20 → 4.17.21
fimil/fix-CVE-2021-23337 → main
package.json
  "dependencies": {
    "express": "^4.21.2",
-   "lodash": "4.17.20",
+   "lodash": "4.17.21",
    "pino": "^9.6.0"
  }
fimil-bot · All checks passed · Closes #482

faq

Questions you should be asking.

Is it safe to point at my environment?

The agent runs inside a scope guard: a hostname allowlist it cannot leave, destructive verbs (DELETE/PATCH) gated off by default, per-host rate limits, DNS pinning with RFC1918 and metadata-IP rejection, and a kill switch that halts the run immediately. We still recommend starting with staging — it is a real attacker, just a leashed one.

What does it need to start?

A target URL. Optionally: credentials (with a TOTP seed if the login has MFA) for authenticated testing, and an OpenAPI or GraphQL schema to accelerate discovery. Policy setup takes a few minutes.

How is it billed?

Usage-based on paid plans (Team and Business; not available on Free): you pay per confirmed finding. If a confirmed finding is later overturned as a false positive, the charge is reversed and credited automatically. The agent is in beta and is sharpest on APIs and server-rendered apps — discovery quality on SPA and auth-gated targets is still improving.

What counts as a “confirmed” finding?

A validator replays the candidate exploit against the target before anything is reported. Only findings that reproduce are confirmed — everything else is discarded, not shown with a “maybe” label.
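The replay rule can be pictured with a small sketch. It assumes, based on the "2/2" shown in the run transcript, that every replay must succeed for a candidate to be confirmed; the actual threshold is not stated, and `validate` and `Verdict` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    confirmed: bool
    successes: int
    attempts: int


def validate(replay: Callable[[], bool], attempts: int = 2) -> Verdict:
    """Re-run a candidate exploit and confirm only if every replay reproduces.

    replay() is a caller-supplied (hypothetical) function that re-sends the
    exploit and returns True when the vulnerable behaviour is observed again.
    """
    successes = 0
    for _ in range(attempts):
        try:
            if replay():
                successes += 1
        except Exception:
            # A replay that errors out counts as a failed reproduction.
            pass
    return Verdict(successes == attempts, successes, attempts)
```

Under this rule a candidate that reproduces once and then fails is treated as unconfirmed and dropped rather than reported.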

Can I stop a run mid-flight?

Yes. The kill switch is checked continuously during the run and halts all agent traffic immediately. It is available on every plan, always free.

Point it at your staging environment.

Set a policy, define the scope, and let the agent show you what it can prove. Usage-based on Team and Business plans — you pay per confirmed finding. Agent is in beta; sharpest on APIs and server-rendered apps.