How Fimil Orchestrates 12+ Open-Source Security Scanners

When I tell people that Fimil runs 12+ security scanners on every repository, the first question is usually: “Doesn’t that produce an overwhelming number of results?” The short answer is no — and the reason comes down to how we orchestrate, normalize, and deduplicate across tools.

This post walks through the technical architecture behind Fimil’s scan pipeline, from the moment you trigger a scan to the moment prioritized findings appear in your dashboard.

The Scan Lifecycle

Every scan in Fimil follows a six-stage pipeline:

Clone → Detect → Scan → Normalize → Deduplicate → Prioritize

Each stage runs automatically with minimal configuration. Let’s walk through them.

Stage 1: Clone

A scan can be triggered manually, by a webhook on push, or on a schedule. In each case, Fimil clones the target repository into a temporary workspace. This clone is ephemeral: it exists only for the duration of the scan and is deleted immediately after.

For pull request scans, Fimil clones both the head branch and the base branch. This enables differential analysis later in the pipeline — separating new findings introduced in the PR from pre-existing issues in the codebase.

The clone happens over authenticated HTTPS using the OAuth tokens from your Git provider integration (GitHub, GitLab, or Bitbucket). Fimil supports repositories up to the limits of your Git provider.

Stage 2: Detect

Before running any scanners, Fimil’s Project Detector analyzes the cloned repository to understand what it contains. This stage answers three questions:

What languages and frameworks are present? — We look for manifest files (package.json, requirements.txt, go.mod, Cargo.toml, pom.xml, etc.), file extensions, and framework-specific config files.
Is there infrastructure-as-code? — We check for Terraform files (.tf), CloudFormation templates, Kubernetes manifests, Helm charts, Ansible playbooks, and Docker Compose files.
Are there container images to scan? — We look for Dockerfiles and extract image references from Kubernetes manifests and Docker Compose files.

The detection results determine which scanners will run. A Python web application with Terraform infrastructure will trigger Semgrep (SAST), Bandit (Python SAST), Trivy (SCA), Gitleaks (secrets), Checkov (IaC), and potentially Hadolint if Dockerfiles are present. A pure Go service without IaC might only trigger Semgrep, Gosec, Trivy, and Gitleaks.
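
To make the detection-to-scanner mapping concrete, here is a minimal sketch of the idea. It is illustrative rather than Fimil’s actual Project Detector: the function names, the `info` dictionary, and the manifest table are assumptions, and only a handful of the signals listed above are checked.

```python
from pathlib import Path

# Hypothetical manifest-to-language table; the real detector's lookup is not shown in this post.
MANIFESTS = {
    "package.json": "javascript",
    "requirements.txt": "python",
    "go.mod": "go",
    "Cargo.toml": "rust",
    "pom.xml": "java",
}


def detect_project(root: Path) -> dict:
    """Walk the checkout once and record languages, Terraform, and Dockerfiles."""
    info = {"languages": set(), "has_terraform": False, "has_dockerfile": False}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if path.name in MANIFESTS:
            info["languages"].add(MANIFESTS[path.name])
        if path.suffix == ".tf":
            info["has_terraform"] = True
        if path.name == "Dockerfile":
            info["has_dockerfile"] = True
    return info


def select_scanners(info: dict) -> list[str]:
    """Map detection results to scanner names, mirroring the two examples above."""
    scanners = ["semgrep", "trivy", "gitleaks"]  # present in both examples
    if "python" in info["languages"]:
        scanners.append("bandit")
    if "go" in info["languages"]:
        scanners.append("gosec")
    if info["has_terraform"]:
        scanners.append("checkov")
    if info["has_dockerfile"]:
        scanners.append("hadolint")
    return scanners
```

A Python checkout with a `.tf` file would come back as Semgrep, Trivy, Gitleaks, Bandit, and Checkov; a Go-only checkout would get Semgrep, Trivy, Gitleaks, and Gosec.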

This adaptive approach means you don’t need to configure which scanners to run — Fimil figures it out from your codebase. If you want more control, Scanner Profiles let you define reusable configurations that override the defaults.

Stage 3: Scan

This is where the actual security analysis happens. Each selected scanner runs in its own ephemeral Docker container with strict isolation:

┌─────────────────────────────────────────────┐
│  Host (Fimil Worker)                        │
│                                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│  │ Semgrep  │ │  Trivy   │ │ Gitleaks │    │
│  │          │ │          │ │          │    │
│  │ /scan    │ │ /scan    │ │ /scan    │    │
│  │ (ro)     │ │ (ro)     │ │ (ro)     │    │
│  │          │ │          │ │          │    │
│  │ /output  │ │ /output  │ │ /output  │    │
│  │ (rw)     │ │ (rw)     │ │ (rw)     │    │
│  └──────────┘ └──────────┘ └──────────┘    │
│   --net=none   --net=none   --net=none      │
└─────────────────────────────────────────────┘

Each container gets:

Read-only source mount at /scan — the scanner can read your code but can’t modify it
Writable output mount at /output — where the scanner writes its JSON results
No network access (--network=none) — scanners can’t phone home or exfiltrate data
Resource limits — CPU and memory caps prevent runaway processes from affecting other scans
Timeout enforcement — if a scanner hangs, the container is killed after a configurable timeout

This isolation model is a core security property of Fimil. Your source code is never exposed to the internet during scanning, and each scanner operates in its own sandbox. A compromised or misbehaving scanner image can’t affect other scanners, access the host system, or leak data.
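
The container properties listed above map onto standard `docker run` flags. The sketch below shows one way to assemble such a command; the helper names, the container naming scheme, and the default limits are assumptions for illustration, not Fimil’s worker code.

```python
import subprocess


def build_docker_argv(image, source_dir, output_dir, scanner_cmd,
                      name, memory="2g", cpus="2"):
    """Assemble a `docker run` invocation with the isolation settings described above."""
    return [
        "docker", "run", "--rm",
        "--name", name,                      # lets us kill the container on timeout
        "--network", "none",                 # no network access
        "--memory", memory,                  # memory cap
        "--cpus", cpus,                      # CPU cap
        "-v", f"{source_dir}:/scan:ro",      # read-only source mount
        "-v", f"{output_dir}:/output:rw",    # writable output mount
        image, *scanner_cmd,
    ]


def run_with_timeout(argv, name, timeout_s):
    """Run the container; if it exceeds the timeout, kill it by name."""
    try:
        return subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker CLI process, so stop the container too.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return None
```

Passing the command as a list rather than a shell string avoids quoting problems with repository paths.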

Scanner Execution

Each scanner is implemented as a subclass of BaseScanner with three key methods:

should_run(project_info) — determines if this scanner is relevant for the detected project
get_docker_command(source_path, output_path) — generates the Docker command to execute
parse_output(output_path) — transforms the scanner’s native JSON output into Fimil’s internal RawFinding format

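For example, here’s what the Semgrep scanner execution looks like conceptually. One caveat: --config=auto normally downloads rules from the Semgrep Registry, so with networking disabled the rules have to be available inside the container already (for example, baked into the image and referenced by a local path):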

Command: semgrep --config=auto --json -o /output/results.json /scan
Timeout: 600 seconds
Memory:  2GB max
CPU:     2 cores max
Network: none

The scanner writes its results to /output/results.json, and Fimil’s parser extracts findings in a normalized format.
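
As a rough illustration of the three-method interface, here is a sketch of a `BaseScanner` and a Semgrep subclass. The method names come from the description above; everything else (the `RawFinding` fields, the image name, treating `source_path` and `output_path` as host directories) is an assumption. The parser reads the `results` array that Semgrep’s `--json` output contains.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RawFinding:
    # Assumed fields; the real RawFinding format is not shown in this post.
    tool: str
    rule_id: str
    file_path: str
    line: int
    severity: str
    message: str


class BaseScanner(ABC):
    name: str

    @abstractmethod
    def should_run(self, project_info: dict) -> bool: ...

    @abstractmethod
    def get_docker_command(self, source_path: str, output_path: str) -> list[str]: ...

    @abstractmethod
    def parse_output(self, output_path: str) -> list[RawFinding]: ...


class SemgrepScanner(BaseScanner):
    name = "semgrep"

    def should_run(self, project_info):
        # Assumption: run whenever any source language was detected.
        return bool(project_info.get("languages"))

    def get_docker_command(self, source_path, output_path):
        return [
            "docker", "run", "--rm", "--network", "none",
            "-v", f"{source_path}:/scan:ro",
            "-v", f"{output_path}:/output:rw",
            "semgrep/semgrep",  # assumed image name
            "semgrep", "--config=auto", "--json", "-o", "/output/results.json", "/scan",
        ]

    def parse_output(self, output_path):
        with open(f"{output_path}/results.json") as fh:
            data = json.load(fh)
        return [
            RawFinding(
                tool=self.name,
                rule_id=r["check_id"],
                file_path=r["path"],
                line=r["start"]["line"],
                severity=r["extra"]["severity"],
                message=r["extra"]["message"],
            )
            for r in data.get("results", [])
        ]
```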

Parallel vs Sequential Execution

Scanners run concurrently where possible. SAST scanners (Semgrep, Bandit, Gosec) analyze source code and can run in parallel with SCA scanners (Trivy, Grype) that analyze dependency manifests. Secrets scanners (Gitleaks, TruffleHog) also run independently.

The scan orchestrator manages this parallelism, coordinating container lifecycle and collecting results as each scanner completes.
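
The fan-out pattern can be sketched with a thread pool. This is a simplified stand-in for the orchestrator, assuming each scanner is wrapped in a zero-argument callable (a hypothetical convention) that runs its container and returns parsed findings.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(scanners: dict, max_workers: int = 4):
    """Run scanner callables concurrently and collect results as each one finishes.

    `scanners` maps a scanner name to a zero-argument callable. A failure in
    one scanner is recorded and does not stop the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in scanners.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going if one scanner crashes
                failures[name] = str(exc)
    return results, failures
```

Threads are enough here because the heavy work happens inside the containers; the worker process mostly waits on them.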

Stage 4: Normalize

Different scanners produce wildly different output formats. Semgrep emits SARIF-like JSON with rule IDs and metavariable bindings. Trivy produces JSON with a nested vulnerability array per target. Gitleaks outputs a flat array of secret findings with commit-level metadata. Checkov emits passed/failed check results grouped by runner type.

Fimil’s normalization layer transforms all of these into a common internal representation with consistent fields:

Title — human-readable finding name
Description — detailed explanation of the issue
Severity — normalized to a five-level scale: Critical, High, Medium, Low, Info
Category — SAST, SCA, Secrets, IaC, or Container
File path and line number — where the issue was found
Tool — which scanner produced the finding
Rule ID — the scanner’s internal rule identifier
CVE/CWE IDs — when available
Package name and version — for SCA findings
Fingerprint — a content-based hash for deduplication (more on this below)
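
The field list above could be modelled as a small dataclass. This is a sketch of the shape only; the class and attribute names are assumptions, not Fimil’s schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    INFO = "Info"


class Category(Enum):
    SAST = "SAST"
    SCA = "SCA"
    SECRETS = "Secrets"
    IAC = "IaC"
    CONTAINER = "Container"


@dataclass
class NormalizedFinding:
    title: str
    description: str
    severity: Severity
    category: Category
    tool: str
    rule_id: str
    file_path: Optional[str] = None          # not every finding has a location
    line: Optional[int] = None
    cve_ids: list[str] = field(default_factory=list)
    cwe_ids: list[str] = field(default_factory=list)
    package_name: Optional[str] = None       # SCA findings only
    package_version: Optional[str] = None    # SCA findings only
    fingerprint: str = ""                    # filled in before deduplication
```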

Severity normalization is particularly important. Semgrep uses ERROR/WARNING/INFO. Trivy uses CRITICAL/HIGH/MEDIUM/LOW/UNKNOWN. Checkov uses HIGH/MEDIUM/LOW. Gitleaks doesn’t assign severity at all, so Fimil treats every secret as High. Fimil maps all of these to a consistent five-level scale using tool-specific logic, so you can meaningfully compare and filter findings across scanners.
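
A table-driven mapping is one straightforward way to do this. In the sketch below, the native labels are the ones named above, but the specific targets (for example Semgrep ERROR to High, Trivy UNKNOWN to Info) are assumptions; the post does not spell out the exact mapping.

```python
# Assumed per-tool mappings onto the five-level scale.
SEVERITY_MAP = {
    "semgrep": {"ERROR": "High", "WARNING": "Medium", "INFO": "Info"},
    "trivy": {"CRITICAL": "Critical", "HIGH": "High", "MEDIUM": "Medium",
              "LOW": "Low", "UNKNOWN": "Info"},
    "checkov": {"HIGH": "High", "MEDIUM": "Medium", "LOW": "Low"},
}


def normalize_severity(tool, native):
    """Translate a scanner's native severity label to the common scale."""
    if tool == "gitleaks":
        return "High"  # Gitleaks reports no severity; secrets are treated as High
    table = SEVERITY_MAP.get(tool, {})
    # Unknown tools or labels fall back to Info rather than raising.
    return table.get((native or "").upper(), "Info")
```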

Stage 5: Deduplicate

This stage collapses overlapping results from different tools into single findings, and it’s the one that delivers the 90% noise reduction we advertise.

The Duplication Problem

When you run 12 scanners on the same codebase, you get a lot of overlap:

Trivy and Grype both scan dependency manifests for known CVEs. They often find the exact same vulnerability in the exact same package version.
Trivy and OSV-Scanner pull from different vulnerability databases but frequently overlap on well-known CVEs.
Semgrep and Bandit both perform static analysis on Python code. Some findings (like use of eval() or weak cryptographic functions) will be flagged by both tools.
Gitleaks and TruffleHog both scan for hardcoded secrets. A leaked API key in your codebase will appear in both outputs.

Without deduplication, a team scanning a typical Node.js application might see 200 findings from Trivy, 180 from Grype, and 150 from OSV-Scanner — with 60-70% overlap between them. That’s over 500 findings when the real count of unique issues is closer to 150.

Content-Based Fingerprinting

Fimil generates a canonical fingerprint for each finding based on its essential properties — not its scanner-specific metadata. The fingerprint algorithm varies by finding category:

SCA findings: Fingerprint is based on CVE ID + package name + package version. This means the same CVE found by Trivy, Grype, and OSV-Scanner produces the same fingerprint.
SAST findings: Fingerprint is based on rule pattern + file path + code snippet hash. A SQL injection finding at the same location produces the same fingerprint regardless of whether Semgrep or Bandit found it.
Secrets findings: Fingerprint is based on secret type + partial secret hash + file path. The same leaked key found by Gitleaks and TruffleHog maps to one fingerprint.
IaC findings: Fingerprint is based on check ID + resource identifier + file path.
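
The per-category rules above can be sketched as a single function. The dictionary keys (`rule_pattern`, `snippet`, `secret`, `resource`) are hypothetical, SHA-256 is an assumed hash choice, and how a tool-specific rule ID is translated into a shared `rule_pattern` is left out entirely.

```python
import hashlib


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fingerprint(finding: dict) -> str:
    """Hash only tool-independent properties so the same issue collapses across scanners."""
    category = finding["category"]
    if category == "SCA":
        parts = [finding["cve_id"].upper(), finding["package_name"],
                 finding["package_version"]]
    elif category == "SAST":
        # Collapse whitespace so formatting differences don't change the hash.
        snippet = " ".join(finding["snippet"].split())
        parts = [finding["rule_pattern"], finding["file_path"], _sha256(snippet)]
    elif category == "Secrets":
        # Keep only a truncated hash of the secret, never the secret itself.
        parts = [finding["secret_type"], _sha256(finding["secret"])[:16],
                 finding["file_path"]]
    elif category == "IaC":
        parts = [finding["check_id"], finding["resource"], finding["file_path"]]
    else:
        raise ValueError(f"no fingerprint rule for category {category!r}")
    # Unit-separator join prevents ambiguous concatenations between fields.
    return _sha256("\x1f".join([category, *parts]))
```

Note that the `tool` field never enters the hash, which is what lets a Trivy result and a Grype result for the same CVE land on the same fingerprint.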

When multiple scanners produce findings with the same fingerprint, they’re merged into a single Finding in the database. The individual scanner results are preserved as FindingOccurrences — so you can always see which tools detected the issue — but the dashboard, API, and notifications treat it as one finding.
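
The merge step amounts to grouping by fingerprint. Here is a minimal in-memory sketch, with plain dictionaries standing in for the Finding and FindingOccurrence database records (an assumption; the real models are not shown here).

```python
def merge_findings(normalized: list[dict]) -> list[dict]:
    """Group normalized results by fingerprint, keeping one occurrence per scanner result."""
    merged: dict[str, dict] = {}
    for item in normalized:
        fp = item["fingerprint"]
        # The first result seen for a fingerprint supplies the shared fields.
        finding = merged.setdefault(
            fp, {"fingerprint": fp, "title": item["title"], "occurrences": []}
        )
        finding["occurrences"].append(
            {"tool": item["tool"], "rule_id": item["rule_id"]}
        )
    return list(merged.values())
```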

Cross-Tool Correlation

Beyond simple fingerprint matching, Fimil also correlates findings across categories. A FindingCorrelation links related findings — for example, a vulnerable dependency (SCA finding) that’s actually imported and used in a code path flagged for injection (SAST finding). These correlations help security teams understand the full risk picture rather than looking at isolated findings.

Stage 6: Prioritize

After deduplication, the remaining unique findings are scored using Fimil’s priority scoring algorithm. This is a weighted score from 0 to 100 that combines four signals:

Severity (60% weight)

The normalized severity of the finding is the strongest signal. Critical findings start with a high base score; Info findings start low. This is table stakes: every security tool does this.

EPSS Score (5% weight)

For findings with associated CVEs, Fimil fetches the Exploit Prediction Scoring System score from FIRST.org. EPSS predicts the probability that a vulnerability will be exploited in the wild within the next 30 days. A critical-severity CVE with a 0.1% EPSS score is very different from one with a 95% EPSS score — and your prioritization should reflect that.

EPSS scores are cached in Redis with a 24-hour TTL to avoid hammering the FIRST.org API.
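
The caching pattern looks roughly like this. To keep the sketch self-contained, an in-memory dictionary stands in for Redis, and `TTLCache` and `fetch_epss` are hypothetical names. The request shape reflects my understanding of the public FIRST.org EPSS endpoint, which returns the score as a string inside a `data` array.

```python
import json
import time
import urllib.request


class TTLCache:
    """Tiny in-memory stand-in for a Redis key with an expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh: skip the API call
        value = fetch(key)
        self._store[key] = (now + self._ttl, value)
        return value


def fetch_epss(cve_id):
    """Fetch one EPSS probability (0.0 to 1.0), or None if the CVE has no score."""
    url = f"https://api.first.org/data/v1/epss?cve={cve_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        rows = json.load(resp).get("data", [])
    return float(rows[0]["epss"]) if rows else None


epss_cache = TTLCache(ttl_seconds=24 * 60 * 60)  # 24-hour TTL, as described above
```

A lookup would then be `epss_cache.get_or_fetch("CVE-2021-44228", fetch_epss)`; repeated calls within 24 hours reuse the cached value.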

Reachability (15% weight)

For SCA findings, Fimil performs reachability analysis by parsing your project’s lockfiles (package-lock.json, yarn.lock, poetry.lock, go.sum, Cargo.lock, and others). A vulnerability in a direct dependency that your code imports and calls is more urgent than one buried three levels deep in a transitive dependency that’s only used at build time.

Fimil classifies each SCA finding as either a direct or transitive dependency and adjusts the priority score accordingly.
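
For npm projects, the direct/transitive split can be read straight from `package-lock.json`. The sketch below assumes lockfile version 2 or 3, where the `packages` map has a root entry under the empty key that lists the project’s own dependencies. It is an illustration of the classification idea, not Fimil’s parser, and the function name is hypothetical.

```python
import json


def classify_npm_dependencies(lock_text: str) -> dict[str, str]:
    """Label each installed package as 'direct' or 'transitive'.

    Classification is by package name only, so a nested copy of a package
    that is also a direct dependency is reported as direct.
    """
    packages = json.loads(lock_text).get("packages", {})
    root = packages.get("", {})
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))

    result = {}
    for path in packages:
        if not path:
            continue  # skip the root project entry
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        result[name] = "direct" if name in direct else "transitive"
    return result
```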

Finding Age (20% weight)

Newly introduced vulnerabilities get a priority boost. A critical CVE that appeared in yesterday’s commit is more actionable than one that’s been sitting in your codebase for six months. The age signal encourages teams to address new issues before they become entrenched technical debt.

The Combined Score

The final priority score determines the order in which findings appear in your dashboard. The weighting is designed so that:

A critical-severity, high-EPSS, directly reachable, recently introduced vulnerability scores near 100
A low-severity, zero-EPSS, transitively reachable, year-old finding scores near 0
Everything else falls on a spectrum between them

Both the score and the individual contributing factors are stored on each finding, so you can always understand why a finding is ranked where it is.
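
Here is a worked sketch of the weighted sum using the four weights above (60/5/15/20). How each signal is scaled to the 0 to 1 range is an assumption made for illustration: the severity table, the binary direct/transitive value, and the linear one-year age decay are not taken from Fimil’s implementation.

```python
WEIGHTS = {"severity": 60, "epss": 5, "reachability": 15, "age": 20}  # sums to 100

# Assumed severity-to-signal table.
SEVERITY_SIGNAL = {"Critical": 1.0, "High": 0.7, "Medium": 0.4, "Low": 0.1, "Info": 0.0}


def priority_score(severity, epss=0.0, is_direct=False, age_days=0,
                   age_window_days=365):
    """Return (score, factors): a 0-100 score plus the per-signal inputs behind it."""
    factors = {
        "severity": SEVERITY_SIGNAL[severity],
        "epss": min(max(epss, 0.0), 1.0),                     # EPSS is already a probability
        "reachability": 1.0 if is_direct else 0.0,            # direct vs transitive
        "age": max(0.0, 1.0 - age_days / age_window_days),    # newer findings score higher
    }
    score = sum(WEIGHTS[name] * value for name, value in factors.items())
    return round(score, 1), factors
```

Under these assumptions, a Critical finding with EPSS 1.0 in a direct dependency introduced today scores 100, while a year-old Low finding with zero EPSS in a transitive dependency scores 6. Returning the factors alongside the score is what makes the ranking explainable.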

PR-Aware Scanning

When Fimil scans a pull request, the pipeline includes an additional comparison step. The scan runs on both the head branch (the PR) and the base branch (what you’re merging into), and findings are categorized as:

New — findings present in the head branch but not in the base branch (introduced by this PR)
Fixed — findings present in the base branch but not in the head branch (resolved by this PR)
Unchanged — findings present in both branches (pre-existing)

The math is simple set arithmetic over finding fingerprints: New = head − base, Fixed = base − head, Unchanged = head ∩ base.
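
In Python, the three categories fall out of set operations on fingerprints. A minimal sketch (the function name is hypothetical):

```python
def diff_findings(head: set[str], base: set[str]) -> dict[str, set[str]]:
    """Categorize finding fingerprints for a pull request scan."""
    return {
        "new": head - base,        # introduced by the PR
        "fixed": base - head,      # resolved by the PR
        "unchanged": head & base,  # pre-existing
    }
```

This only works because fingerprints are stable across scans of different branches, which is one more reason they are built from content rather than scanner metadata.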

This differential view is critical for PR review workflows. A policy that says “no new critical vulnerabilities in PRs” can be enforced via Fimil’s PolicyEvaluator — it only evaluates the new findings, not the pre-existing ones. The result is reported as a GitHub Check Run, GitLab commit status, or Bitbucket build status, so developers get immediate feedback in their PR without leaving their Git provider.
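
A “no new critical vulnerabilities” policy could be evaluated along these lines. This is a hypothetical sketch, not the PolicyEvaluator itself: the function signature and the result dictionary are assumptions, with the `success`/`failure` values chosen to line up with what a Git provider status would report.

```python
def evaluate_pr_policy(head_findings: dict, base_fingerprints: set,
                       blocked_severities=frozenset({"Critical"})):
    """Fail the check only when the PR introduces a finding with a blocked severity.

    `head_findings` maps fingerprint -> finding dict (with a "severity" key);
    `base_fingerprints` is the set of fingerprints found on the base branch.
    """
    new = [f for fp, f in head_findings.items() if fp not in base_fingerprints]
    violations = [f for f in new if f["severity"] in blocked_severities]
    return {
        "conclusion": "failure" if violations else "success",
        "new_count": len(new),
        "violations": violations,
    }
```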

What Happens to Your Code

I want to be explicit about the data lifecycle, because it matters:

Your source code is cloned into a temporary directory on the scan worker
Source code is mounted read-only into scanner containers
Scanners produce JSON output files (findings, not source code)
Source code is deleted immediately after all scanners complete
Only the normalized findings (metadata, not code) are stored in the database

Fimil never persists your source code. The clone exists only during the scan and is deleted as part of the orchestrator’s cleanup step. Scanner containers have no network access, so your code can’t be exfiltrated during the scan.
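
The cleanup guarantee can be illustrated with a context-managed temporary directory, which is removed even if a scanner raises. This is a sketch of the pattern only; the `clone` and `scan` callables and the directory prefix are assumptions.

```python
import tempfile
from pathlib import Path


def with_ephemeral_checkout(clone, scan):
    """Clone into a temporary workspace, scan it, and always delete it afterwards.

    `clone(workspace)` populates the directory; `scan(workspace)` returns findings.
    The directory is removed when the `with` block exits, including on errors.
    """
    with tempfile.TemporaryDirectory(prefix="fimil-scan-") as tmp:
        workspace = Path(tmp)
        clone(workspace)
        return scan(workspace)
```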

For teams that require even stronger guarantees, Fimil Enterprise runs entirely on your own infrastructure — source code never leaves your network.

Extending the Pipeline

The orchestration architecture is designed to be extensible. Adding a new scanner requires implementing three methods (should_run, get_docker_command, parse_output) and registering the scanner. The container isolation model means any tool that can run in Docker and produce JSON output can be integrated.
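
To give a feel for what adding a scanner involves, here is a sketch with a decorator-based registry and a made-up tool called `examplescan`. The registry, the image name, the CLI flags, and the output format are all invented for illustration; only the three method names come from the description above.

```python
import json

SCANNER_REGISTRY = {}  # hypothetical registry: scanner name -> class


def register_scanner(cls):
    """Class decorator that makes a scanner discoverable by name."""
    SCANNER_REGISTRY[cls.name] = cls
    return cls


@register_scanner
class ExampleScanner:
    name = "examplescan"  # fictional tool, not a real scanner

    def should_run(self, project_info):
        return "python" in project_info.get("languages", set())

    def get_docker_command(self, source_path, output_path):
        return [
            "docker", "run", "--rm", "--network", "none",
            "-v", f"{source_path}:/scan:ro",
            "-v", f"{output_path}:/output:rw",
            "example/examplescan",                       # invented image
            "examplescan", "--out", "/output/results.json", "/scan",
        ]

    def parse_output(self, output_path):
        # Invented format: a JSON array of {"rule", "file", "line"} objects.
        with open(f"{output_path}/results.json") as fh:
            return [
                {"tool": self.name, "rule_id": r["rule"],
                 "file_path": r["file"], "line": r["line"]}
                for r in json.load(fh)
            ]
```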

We’re continuously evaluating new open-source scanners to add to the pipeline. The community builds incredible security tools — our job is to make them work together.

Want to see this pipeline in action on your own repositories? Join the waitlist for early access, or check out the full feature list to see what’s included at each tier.