NetDoctor - Deterministic Network Diagnostic Platform

⊘

No arbitrary CLI

Devices are queried only through a fixed catalog of safe, read-only intents. No user-typed command strings ever reach the wire.

∎

Evidence-first

Every finding carries provenance: which artifact, which line, which parsed field, which baseline value. No finding without evidence.

⌬

Offline core

The full diagnostic engine runs from uploaded files - no internet, no AI required. AI is an explanation layer, never truth.

≡

Deterministic

Same inputs, same outputs. Rules operate on normalized snapshots and derived facts, not on raw text grep.

What it does - today

A complete diagnostic loop, end to end.

Every feature listed below is implemented, tested and shipping.

Ingest

Upload bundles

Upload configs, show-command outputs and zipped bundles via drag-and-drop or paste. Filename detection, content sniffing and gzip storage keep artifacts traceable.

Ingest

Read-only SSH

Collect live device output through Scrapli async SSH, in parallel, using the baseline, topology, full or troubleshoot profile. Collection is bounded by per-device locks, per-site concurrency caps and per-command retries.

Engine

Cisco parsers

42 structured parsers normalize Cisco IOS / IOS-XE state: config, inventory, VLANs, trunks, interfaces, CDP, LLDP, STP, routes, MAC, ARP, PoE, environment and SNMP.

Engine

Snapshot model

Parsed data is merged into a canonical JSON snapshot, separating configured and observed state with consistency flags when they disagree.
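A minimal sketch of how configured and observed values could sit side by side with a consistency flag. The `MergedField` class, the `merge` helper and the flat key names are hypothetical and only illustrate the idea; the real snapshot schema is not shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class MergedField:
    """One snapshot field holding both views of the same setting (hypothetical shape)."""
    configured: Optional[Any]
    observed: Optional[Any]

    @property
    def consistent(self) -> Optional[bool]:
        # None means "cannot tell": one side is missing, which is not the same as agreeing.
        if self.configured is None or self.observed is None:
            return None
        return self.configured == self.observed


def merge(configured: dict, observed: dict) -> dict:
    """Merge two flat dicts field by field, keeping both values for every key."""
    keys = sorted(set(configured) | set(observed))
    return {k: MergedField(configured.get(k), observed.get(k)) for k in keys}


snapshot = merge(
    {"Gi1/0/1.access_vlan": 10, "Gi1/0/1.speed": "auto"},
    {"Gi1/0/1.access_vlan": 20},
)
```

Here the access VLAN is flagged as inconsistent, while the speed field stays undecided because no observed value was collected.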

Engine

Derived facts

Role, uplink, stack, gateway and redundancy facts are computed once so every rule reads the same normalized model.

Rules

Baseline checks

Built-in and organization rules cover STP/VTP hygiene, VLANs, trunks, AAA, SNMP, NTP, DHCP snooping and DAI. Rules are evaluated without dynamic eval.

Rules

Policy layers

Baselines merge built-in, global, environment, site, role and device layers, with per-rule overrides and a clear winning source.
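As an illustration, a layered merge that records the winning source might look like the sketch below. It assumes that more specific layers override more general ones, in the order built_in, global, environment, site, role, device. The setting names and the `resolve` function are hypothetical.

```python
LAYER_ORDER = ["built_in", "global", "environment", "site", "role", "device"]


def resolve(layers: dict[str, dict]) -> dict[str, tuple]:
    """Return {setting: (value, winning_layer)}; later layers override earlier ones."""
    resolved = {}
    for name in LAYER_ORDER:
        for key, value in layers.get(name, {}).items():
            resolved[key] = (value, name)
    return resolved


effective = resolve({
    "built_in": {"ntp.min_servers": 1, "stp.mode": "rapid-pvst"},
    "site": {"ntp.min_servers": 2},
    "device": {"stp.mode": "mst"},
})
```

Keeping the layer name next to each value is what lets a report state which layer a threshold came from.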

Evidence

Provenance

Every finding carries artifact ID, parsed field, baseline source and predicate inputs, so reports can show exactly why it fired.

UI

Device detail

Per-device dashboard shows summary, findings, interfaces, VLANs, neighbors and raw artifacts with searchable operational context.

UI

Topology graph

Site topology renders role-based hierarchy, port-channels, clusters and focus paths, with neighbor links kept tied to evidence.

UI

Risk views

Cross-device findings are deduplicated by rule, device, interface and VLAN, then grouped into severity and critical-path views.
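A rough sketch of that deduplication step, assuming findings are plain dictionaries with `rule`, `device`, `interface`, `vlan` and `severity` keys (hypothetical field names):

```python
from collections import defaultdict


def dedupe(findings: list[dict]) -> list[dict]:
    """Keep the first finding seen for each (rule, device, interface, vlan) key."""
    seen = {}
    for f in findings:
        key = (f["rule"], f["device"], f.get("interface"), f.get("vlan"))
        seen.setdefault(key, f)
    return list(seen.values())


def group_by_severity(findings: list[dict]) -> dict[str, list[dict]]:
    """Bucket findings by their severity label."""
    groups = defaultdict(list)
    for f in findings:
        groups[f["severity"]].append(f)
    return dict(groups)


raw = [
    {"rule": "R1", "device": "sw1", "interface": "Gi1/0/1", "vlan": 10, "severity": "high"},
    {"rule": "R1", "device": "sw1", "interface": "Gi1/0/1", "vlan": 10, "severity": "high"},
    {"rule": "R1", "device": "sw2", "interface": "Gi1/0/1", "vlan": 10, "severity": "low"},
]
unique = dedupe(raw)
by_severity = group_by_severity(unique)
```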

UI

Baseline UI

Browse and edit baseline policy in the UI, previewing which layer wins before a check is applied to a device.

AI

Optional AI

A local LLM offers plain-language commentary on findings - no data ever leaves the machine. Disabled by default, never a source of truth.

Ops

Job audit

Collection and analysis jobs track who ran what, profile, target, status and errors in a deterministic state machine.

Ops

Exports

CSV and JSON exports include findings and snapshots with Cisco-aware secret redaction before external sharing.

Engine

MAC Intelligence

Offline IEEE OUI database, vendor classification, MAC observation tracking, flap detection and rogue device analysis. Baseline-driven - vendor name alone never triggers high-severity findings.

Engine

Playbook engine

Step-by-step diagnostic playbooks mapped to findings. 10 playbooks with 118 individual checks covering port issues, VLAN, STP, EtherChannel, PoE, AP, TACACS and compliance.

Ops

Credential vault

SSH credentials encrypted at rest with AES-256-GCM, PBKDF2-SHA256 key derivation (100k iterations). Passwords never returned in API responses. Per-profile isolation.
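The key-derivation half of this can be sketched with the Python standard library alone. The snippet below assumes a 16-byte random salt and a 32-byte output (the key size AES-256 needs); the passphrase and the `derive_key` name are placeholders, and the AES-GCM encryption step itself is left out because it needs a third-party library.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # stored next to the ciphertext; it is not secret
key = derive_key("example-master-passphrase", salt)
```

The same passphrase and salt always give the same key, while a fresh salt per profile gives unrelated keys.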

Ops

Scheduled collection

Cron-based automated SSH collection with hierarchical targeting: global, country, site, sub-site or specific devices. 7 presets plus custom cron, with per-device locking and concurrency caps.

UI

Smart site mapping

Sites auto-positioned on a world map from hostname conventions and an offline city coordinate database. Golden-angle spiral offsets prevent overlap. Admins can drag markers to override.
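For reference, a golden-angle spiral places the n-th point at an angle of n times roughly 137.5 degrees and a radius that grows with the square root of n, so markers that share one city coordinate fan out without stacking. The step size and the `spiral_offset` name below are assumptions for illustration.

```python
import math

GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))  # about 2.39996 rad (137.5 degrees)


def spiral_offset(index: int, step: float = 0.05) -> tuple[float, float]:
    """Offset (dlat, dlon) in degrees for the index-th site sharing one coordinate."""
    radius = step * math.sqrt(index)
    angle = index * GOLDEN_ANGLE
    return (radius * math.sin(angle), radius * math.cos(angle))


offsets = [spiral_offset(i) for i in range(5)]
```

The first site keeps the exact city coordinate; each later one moves a little further out along the spiral.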

Ops

Admin panel

7-tab admin dashboard: system health, RBAC with 4 roles and per-user permission overrides, credential vault, backup scheduler, security audit with forensic fingerprinting, and brute-force lockout.

UI

Live terminal

WebSocket-streamed SSH output during collection. Watch every command execute in real time, per device, with status indicators and per-command progress tracking.

Ops

Automated backups

Scheduled PostgreSQL backups via pg_dump with gzip compression. Configurable retention policy (default 30 days), backup history with size and age, and one-click manual backup.

Quality

Test suite

Golden fixtures and regression tests pin parser output, derived facts and rule behavior across 1 781 automated checks.

How it works

Five stages, deterministic, evidenced.

01

Ingest

Upload artifacts or run a read-only SSH collection profile against live devices. Files are deduplicated, gzipped, and detected by filename + content.

→

02

Parse

Each artifact runs through its dedicated parser. Outputs are dataclasses with explicit fields - never raw strings. Parser status: ok / partial / failed / empty / raw_only.
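To make the idea concrete, here is a simplified sketch of a parser that returns typed entries plus a status. The status values come from the text above; `VlanEntry`, `ParseResult` and `parse_vlan_brief` are hypothetical, and the line handling is far cruder than a production parser would need.

```python
from dataclasses import dataclass, field
from enum import Enum


class ParserStatus(str, Enum):
    OK = "ok"
    PARTIAL = "partial"
    FAILED = "failed"
    EMPTY = "empty"
    RAW_ONLY = "raw_only"  # listed for completeness; not produced by this sketch


@dataclass
class VlanEntry:
    vlan_id: int
    name: str
    status: str


@dataclass
class ParseResult:
    status: ParserStatus
    entries: list = field(default_factory=list)


def parse_vlan_brief(text: str) -> ParseResult:
    """Parse 'show vlan brief'-style rows into VlanEntry objects."""
    if not text.strip():
        return ParseResult(ParserStatus.EMPTY)
    entries, malformed = [], 0
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # header, separator or port continuation line
        if len(parts) >= 3:
            entries.append(VlanEntry(int(parts[0]), parts[1], parts[2]))
        else:
            malformed += 1
    if not entries:
        return ParseResult(ParserStatus.FAILED)
    return ParseResult(ParserStatus.PARTIAL if malformed else ParserStatus.OK, entries)


sample = """VLAN Name                             Status    Ports
---- -------------------------------- --------- ----------------
1    default                          active    Gi1/0/1, Gi1/0/2
10   USERS                            active
"""
result = parse_vlan_brief(sample)
```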

→

03

Normalise

Parsers feed the snapshot builder. Configured vs observed values merge into a canonical JSON model with consistency flags. Derived facts are computed once.

→

04

Evaluate

Deterministic rule predicates run against the snapshot, derived facts and 6-layer baseline. Each finding is built with its evidence payload.
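A sketch of what one such predicate could look like, including the explicit missing-data outcome. The rule ID, the `Verdict` shape, the snapshot key and the baseline format (value plus winning layer) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    rule_id: str
    state: str  # "pass", "fail" or "blocked_missing_data"
    evidence: Optional[dict] = None


def check_ntp_servers(snapshot: dict, baseline: dict) -> Verdict:
    """Compare the number of configured NTP servers with the baseline minimum."""
    rule_id = "EXAMPLE-NTP-001"  # hypothetical rule ID
    servers = snapshot.get("ntp_servers")
    if servers is None:
        # No artifact was parsed for this field: report it, never assume healthy.
        return Verdict(rule_id, "blocked_missing_data")
    minimum, source = baseline["ntp.min_servers"]
    evidence = {
        "field": "ntp_servers",
        "observed": len(servers),
        "expected_min": minimum,
        "baseline_source": source,
    }
    state = "pass" if len(servers) >= minimum else "fail"
    return Verdict(rule_id, state, evidence)


baseline = {"ntp.min_servers": (2, "site")}
blocked = check_ntp_servers({}, baseline)
failed = check_ntp_servers({"ntp_servers": ["10.0.0.1"]}, baseline)
```

Because the predicate is a pure function of the snapshot and the baseline, the same inputs always give the same verdict.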

→

05

Present

Dashboard, snapshot detail, topology graph, exports. AI explanations on demand for context - never replacing the deterministic verdict.

Topology

Your network, drawn from the wires up.

NetDoctor builds a live topology graph from CDP, LLDP and port-channel data without a single SNMP poll. Click any device, get evidence-anchored findings inline.

Graph legend: router/fw · core · access · endpoints · Po aggregation · CDP / LLDP / clusters

01

Hierarchy you can trust

Devices placed by inferred role (core / distribution / access / endpoint) using the same engine that powers the rules.

02

Port-channels, one logical link

Aggregated members deduplicated and rendered as parallel lines with the operational port-channel badge.

03

Endpoints classified offline

Phones, APs, cameras, HMIs, printers, servers - classified from CDP capabilities and the offline IEEE OUI database.

04

Findings overlaid on the graph

Severity counts inline. Click a device for full evidence, recommendation and impact for every finding.

05

Export as PNG or SVG

One-click high-resolution PNG export (3x scale) or clean SVG for documentation, change requests and management reports. Every export captures the current layout, collapsed state and finding badges exactly as displayed.

06

Interactive canvas

Pan, zoom (0.05x to 8x), drag individual nodes with optional grid snap. Multi-select with rubber-band selection. Collapse subtrees with the ± button. Save and load named layout profiles. Positions persist across sessions.

07

World map with site markers

Sites auto-positioned on a Leaflet/OSM map from hostname conventions and an offline city coordinate database. Pulsing markers, cluster grouping at low zoom, click to drill into the site graph. Admins can drag markers to override positions.

08

Ghost neighbors and clusters

CDP/LLDP neighbors without a local snapshot appear as ghost nodes (dashed outline). Endpoints with the same role are grouped into expandable clusters with a +N badge. Click a cluster to fan out individual members.

MAC Intelligence

Every MAC address tells a story. We read all of them.

39,201 IEEE OUI entries. 590 curated vendor overrides. 130+ classification patterns. Completely offline — no internet required, ever.

🔍

Vendor identification

Every MAC address is resolved against the full IEEE OUI registry — 39,201 entries compressed to 318 KB, loaded in ~80 ms. Two-tier architecture: 590 hand-curated vendor type overrides checked first, then auto-classification with confidence scoring from 130+ regex patterns. Cisco, Juniper, Arista, Fortinet, Hikvision, Yealink, Xerox and hundreds more — each with a device type and confidence score.

🏷️

Device type classification

12 canonical device types: network (switches, routers, firewalls, APs), phone (VoIP), printer, camera (IP/NVR), endpoint (laptops, desktops, mobiles), server, virtualization (VMware, Hyper-V, KVM), firewall, wireless_ap, iot (industrial, sensors, UPS) and more. Ambiguous vendors (Cisco = switch or phone? HP = printer or server?) return possible_types for downstream rules to refine using interface role, baseline and history.

🛡️

Rogue device detection

15 deterministic rules (ROGUE-001 through ROGUE-015) analyse every access port. Unauthorized mini-switches on user ports, network vendor MACs where only endpoints belong, unknown OUI on secured ports, MAC flapping from syslog, port-security violations and 802.1X/MAB failures. Vendor name alone never triggers a high-severity finding — every rule requires corroborating evidence from interface role, baseline and observed state.

🧠

Smart inference & downgrade

Not every multi-MAC access port is a rogue switch. The engine recognizes: phone + PC pairs (downgrades to info, recommends voice VLAN), wireless AP in bridge mode (AP + wireless clients on one port, ≤20 MACs), camera clusters (PoE camera switch / NVR uplink, ≥50% camera OUI), and multi-NIC servers (sequential same-OUI MACs within 8 addresses). Each suppression is explained and logged.
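One of these heuristics, the multi-NIC server check, could be approximated as below. The sketch reads "within 8 addresses" as "the numeric span between the lowest and highest MAC is under 8", which is an interpretation; the function names are hypothetical.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC in colon, dash or Cisco dotted notation to an integer."""
    return int(mac.replace(":", "").replace("-", "").replace(".", ""), 16)


def looks_like_multi_nic_server(macs: list[str], window: int = 8) -> bool:
    """True if all MACs share one OUI and span fewer than `window` addresses."""
    if len(macs) < 2:
        return False
    values = sorted(mac_to_int(m) for m in macs)
    same_oui = len({v >> 24 for v in values}) == 1  # top 24 bits are the OUI
    return same_oui and (values[-1] - values[0]) < window


server_like = looks_like_multi_nic_server(
    ["00:25:90:aa:bb:10", "00:25:90:aa:bb:11", "0025.90aa.bb13"]
)
```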

📍

Port movement & history

Every MAC is tracked per device: first seen, last seen, observation count, current interface, previous interfaces. When a MAC moves between ports on the same switch, the engine generates an alert with from/to interfaces and timestamp. Full history is persisted in PostgreSQL with configurable retention (90 days default).

⚡

Flap detection

Dual detection: MAC table analysis finds the same MAC learned on multiple interfaces simultaneously (confidence 0.90), and syslog parsing catches Cisco SW_MATM / MACFLAP messages with interface pairs (confidence 0.85). Virtual MACs are excluded: HSRP, VRRP, GLBP, STP, LLDP, LACP, multicast and broadcast.
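The MAC-table half of this check can be sketched as follows. The prefixes are the well-known HSRP, VRRP, GLBP and IEEE link-local ranges; the input format (colon-separated MACs paired with interface names) and the function names are assumptions.

```python
from collections import defaultdict

VIRTUAL_PREFIXES = (
    "00:00:0c:07:ac",  # HSRP v1
    "00:00:0c:9f:f",   # HSRP v2
    "00:00:5e:00:01",  # VRRP (IPv4)
    "00:07:b4:00",     # GLBP
    "01:80:c2:00:00",  # STP, LACP, LLDP and other IEEE link-local groups
)


def is_excluded(mac: str) -> bool:
    """True for virtual-gateway, multicast and broadcast addresses."""
    mac = mac.lower()
    if mac.startswith(VIRTUAL_PREFIXES):
        return True
    # Multicast and broadcast: least-significant bit of the first octet is 1.
    return int(mac[:2], 16) & 1 == 1


def find_flaps(rows) -> dict[str, list[str]]:
    """rows: iterable of (mac, interface). Returns MACs learned on 2+ interfaces."""
    ports = defaultdict(set)
    for mac, interface in rows:
        if not is_excluded(mac):
            ports[mac.lower()].add(interface)
    return {mac: sorted(ifs) for mac, ifs in ports.items() if len(ifs) > 1}


flaps = find_flaps([
    ("aa:bb:cc:00:00:01", "Gi1/0/1"),
    ("AA:BB:CC:00:00:01", "Gi1/0/2"),
    ("00:00:0c:07:ac:01", "Gi1/0/1"),
    ("00:00:0c:07:ac:01", "Gi1/0/24"),
    ("aa:bb:cc:00:00:02", "Gi1/0/3"),
])
```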

🌐

Cross-site dual presence

Detects the same MAC address active at two or more sites simultaneously. Severity scales with the time gap: under 1 minute = critical (impossible travel / MAC spoofing), under 1 hour = high, under 6 hours = medium. Virtual MAC protocols and multicast are excluded. Site extraction uses hostname parsing - no hardcoded site names.
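The severity scale can be written as a small lookup. This sketch treats each threshold as an exclusive upper bound and returns no finding for gaps of 6 hours or more; both choices are assumptions.

```python
from datetime import timedelta
from typing import Optional


def dual_presence_severity(gap: timedelta) -> Optional[str]:
    """Map the time gap between two sightings of one MAC to a severity."""
    if gap < timedelta(minutes=1):
        return "critical"
    if gap < timedelta(hours=1):
        return "high"
    if gap < timedelta(hours=6):
        return "medium"
    return None  # too far apart to count as dual presence in this sketch
```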

📊

Baseline-driven analysis

Every interface has role-based expectations: access ports allow 1 MAC, voice ports allow 2, trunks 256, uplinks unlimited. Organization baselines can override expected vendor types, maximum MAC counts and known MAC allowlists per interface. Violations are measured against the baseline, not against arbitrary thresholds. Uplinks, etherchannel members, AP trunks and server trunks are automatically excluded from rogue checks.
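Expressed as code, the role defaults and the per-interface override might look like this. The limits are the ones stated above; the dictionary and function names are hypothetical.

```python
import math
from typing import Optional

ROLE_MAC_LIMITS = {"access": 1, "voice": 2, "trunk": 256, "uplink": math.inf}


def mac_limit(role: str, override: Optional[int] = None) -> float:
    """An organization baseline override wins over the role default."""
    return override if override is not None else ROLE_MAC_LIMITS[role]


def exceeds_baseline(role: str, observed_macs: int, override: Optional[int] = None) -> bool:
    """True when the observed MAC count is above the allowed maximum."""
    return observed_macs > mac_limit(role, override)
```

For example, three MACs on an access port exceed the default, but not if the baseline for that interface allows four.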

39,201 IEEE OUI entries

15 Rogue rules

130+ Vendor patterns

0 Internet required

Network Telemetry · In Development

Beyond SSH — continuous visibility without logging in.

SSH collects a point-in-time snapshot. Telemetry adds the dimension of time: real-time counters, async events, and change detection that triggers re-analysis automatically.

In Development

SNMPv3 Polling Collector

Authenticated, encrypted polling (SHA + AES-128) with configurable intervals. Interface counters (64-bit HC), CPU, memory, temperature, fan status, MAC table, ARP table, STP topology, VLANs, CDP/LLDP neighbors, port-security violations — all from standard and Cisco enterprise MIBs.

What it collects

IF-MIB counters · IP-MIB / ARP · BRIDGE-MIB / MAC table · STP topology · CISCO-CDP-MIB · LLDP-MIB · CISCO-PROCESS-MIB (CPU) · CISCO-MEMORY-POOL-MIB · CISCO-ENVMON-MIB · ENTITY-MIB (inventory) · CISCO-PORT-SECURITY-MIB · CISCO-VTP-MIB

In Development

SNMP Trap & Inform Receiver

Async event receiver on UDP 162 / 1162. The device pushes events the moment they happen — no polling delay. Link up/down, cold start, config changes, STP topology changes, port-security violations, err-disable events. SNMPv3 informs with acknowledgement guarantee.

Events captured

linkDown / linkUp · coldStart / warmStart · authenticationFailure · Config change · STP topology change · Port-security violation · Err-disable · VLAN membership change

In Development

Syslog Collector

UDP 514, TCP 514 and TLS 6514 receiver. Every switch and firewall already speaks syslog — no agent, no license. 8 severity levels from emergency to debug. Config changes, link events, STP reconvergence, port-security violations, DHCP snooping, ACL hits — all captured, parsed and correlated with device snapshots.

Correlation triggers

%SYS-5-CONFIG_I → re-collect config
%LINK-3-UPDOWN → re-collect interfaces
%PORT_SECURITY-2 → immediate finding
%SPANTREE-5 → re-evaluate STP rules
%SW_MATM → MAC flap detection
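The correlation triggers above amount to a prefix lookup on the syslog mnemonic. Since the collector is still in development, the following is only a sketch of the idea: the `action_for` function and the action strings are placeholders, and the regex assumes the usual Cisco `%FACILITY-SEVERITY-MNEMONIC` tag format.

```python
import re
from typing import Optional

TRIGGERS = [
    ("%SYS-5-CONFIG_I", "re-collect config"),
    ("%LINK-3-UPDOWN", "re-collect interfaces"),
    ("%PORT_SECURITY-2", "immediate finding"),
    ("%SPANTREE-5", "re-evaluate STP rules"),
    ("%SW_MATM", "MAC flap detection"),
]

# Matches tags such as %SYS-5-CONFIG_I or %SW_MATM-4-MACFLAP_NOTIF.
TAG = re.compile(r"%[A-Z0-9_]+-\d-[A-Z0-9_]+")


def action_for(line: str) -> Optional[str]:
    """Return the action for the first matching trigger prefix, else None."""
    match = TAG.search(line)
    if not match:
        return None
    tag = match.group(0)
    for prefix, action in TRIGGERS:
        if tag.startswith(prefix):
            return action
    return None
```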

Planned

Event-Driven Re-Collection

The missing piece between polling and continuous assurance. When a trap or syslog event signals a meaningful change (config saved, link flap, STP reconvergence), NetDoctor automatically triggers a targeted SSH re-collection of the affected artifacts — and re-runs the rule engine. The finding appears in the dashboard within seconds of the event, not at the next scheduled poll.

Future

gRPC Model-Driven Telemetry

For IOS-XE 16.10+, NX-OS and IOS-XR: sub-second push-model streaming over HTTP/2 with Protocol Buffers. Interface counters every 1 second, CPU every 5 seconds, routing table changes in real time. No polling overhead, no SNMP limitations. The highest-fidelity data source for modern Cisco platforms.

Future

NETCONF / RESTCONF

Structured XML/JSON data over SSH (port 830) and HTTPS (port 443) using YANG models. Atomic reads with candidate configs, operational data stores, and event notifications. The programmatic alternative to CLI scraping — available on IOS-XE, FortiGate, Junos, Arista EOS and most modern platforms.

Design principle: Every telemetry source feeds the same normalized snapshot that powers the rule engine. SSH, SNMP, syslog and future gRPC data all converge into a single deterministic pipeline. No separate dashboards, no data silos — one engine, one truth.

Security model

Designed against the things that actually break networks.

A single typo in configuration mode can take an enterprise offline. That's why the tool has no configuration mode.

Things this tool will never do

Execute arbitrary CLI typed by a user
Enter configure terminal
Run write, reload, clear, erase, delete, format
Execute uncontrolled debug commands
Treat AI output as source of truth
Send unsanitised secrets to any external API
Generate findings without evidence
Treat missing data as healthy
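The first items in that list follow from a simple design: callers name an intent, and only the catalog supplies command text. A hypothetical sketch (intent names, timeouts and risk labels are invented for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    command: str    # the only text that can ever be sent to a device
    timeout_s: int
    risk: str


CATALOG = {
    "vlan_brief": Intent("show vlan brief", 15, "low"),
    "running_config": Intent("show running-config", 60, "medium"),
}


def command_for(intent_name: str) -> str:
    """Look up the fixed command; unknown names raise instead of passing through."""
    try:
        return CATALOG[intent_name].command
    except KeyError:
        raise ValueError(f"unknown intent: {intent_name!r}") from None
```

A string such as `configure terminal` is not a key in the catalog, so it is rejected before any SSH session is involved.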

What it actually does

Read-only intents from a fixed, audited catalog
Per-intent timeouts and risk classification
Cisco-aware redactor: enable secret, SNMP communities, TACACS keys, PSKs
Full audit trail (who/when/what/where) in PostgreSQL
JWT auth with scoped roles (superadmin, admin, operator, viewer)
Per-user granular permission overrides (7 permissions)
Credential vault with AES-256-GCM encryption at rest
Brute-force lockout (5 attempts / 5 min → 15-min lock)
Security audit with forensic fingerprinting
Findings carry provenance back to the originating line
Explicit "missing data" state - never silently healthy
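As one example from the list above, the brute-force lockout (5 attempts / 5 min → 15-min lock) can be modelled with a sliding window. The `LoginGuard` class is hypothetical and keeps state in memory only; timestamps are passed in explicitly so the behaviour is reproducible.

```python
from collections import defaultdict, deque

MAX_ATTEMPTS = 5
WINDOW_S = 5 * 60
LOCK_S = 15 * 60


class LoginGuard:
    def __init__(self) -> None:
        self._failures = defaultdict(deque)  # user -> timestamps of recent failures
        self._locked_until = {}              # user -> time the lock expires

    def is_locked(self, user: str, now: float) -> bool:
        return now < self._locked_until.get(user, 0.0)

    def record_failure(self, user: str, now: float) -> None:
        recent = self._failures[user]
        recent.append(now)
        # Drop failures that fell out of the 5-minute window.
        while recent and now - recent[0] > WINDOW_S:
            recent.popleft()
        if len(recent) >= MAX_ATTEMPTS:
            self._locked_until[user] = now + LOCK_S
            recent.clear()


guard = LoginGuard()
for t in (0, 10, 20, 30, 40):
    guard.record_failure("alice", t)
```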

Redaction surface

Before any artifact, finding or snapshot can leave the local perimeter (export, AI prompt, share link), the redactor strips:

enable secret / password
username … secret
snmp-server community / user
tacacs-server key
radius-server key
key-string
pre-shared-key
crypto isakmp key
vty / console password
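A regex-based pass over a small subset of that surface might look like the sketch below. The patterns are illustrative guesses at common IOS line shapes, not the product's actual redactor, and they deliberately cover only a few of the items listed.

```python
import re

# Each pattern keeps the command prefix (group 1) and replaces the secret after it.
PATTERNS = [
    re.compile(r"^(enable (?:secret|password)(?: \d+)? )\S+", re.M),
    re.compile(r"^(username \S+ .*?(?:secret|password)(?: \d+)? )\S+", re.M),
    re.compile(r"^(snmp-server community )\S+", re.M),
    re.compile(r"^(\s*(?:tacacs-server|radius-server) key(?: \d+)? )\S+", re.M),
    re.compile(r"^(\s*key-string(?: \d+)? )\S+", re.M),
    re.compile(r"^(crypto isakmp key )\S+", re.M),
]


def redact(config: str) -> str:
    """Replace secret values with a placeholder, leaving the rest of each line intact."""
    for pattern in PATTERNS:
        config = pattern.sub(r"\1<redacted>", config)
    return config


sample_config = """hostname sw1
enable secret 9 $9$abcdef
username admin privilege 15 secret 5 $1$xyz
snmp-server community public RO
tacacs-server key 7 0822455D0A16
"""
redacted = redact(sample_config)
```

Keeping the command keyword and encryption type in place means the redacted artifact still shows that a secret was configured, just not its value.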

Roadmap

From offline switch audit to full multi-vendor topology intelligence.

Built in order: engine → MAC intelligence → routing → cross-device path → FortiGate → AI explanations. Each phase ships with tests before the next begins.

01

Shipped

Cisco Switch Core

Offline upload & parse pipeline
Normalised Device Snapshot v2
23 Cisco show parsers
55 built-in switch rules
Derived facts engine
Evidence engine
6-layer baseline merge
Snapshot Detail dashboard

02

Shipped

Active Collection

Scrapli SSH collection
4 collection profiles
Per-intent timeouts & risk
Live WebSocket terminal
Credential vault
Job state machine + audit

03

Shipped

Org & UX

Organisation baselines
Custom baseline-driven rules
Topology graph (SVG)
Cross-device CDP / LLDP rules
Findings dedup & search
Stack detection (4-tier)
Re-analysis & incremental upload

04

Shipped

Playbooks (118 checks)

Diagnostic playbook engine
Finding-to-playbook mapping
Step-by-step remediation guides
AP port verification playbook
10 playbooks across 8 categories
Playbook audit trail

05

Shipped

MAC Intelligence

OUI database (offline IEEE registry)
Vendor lookup & classifier
MAC observation history
MAC flap detection
Rogue device analyzer (5-phase pipeline)
Baseline-driven, not vendor-name-only

06

Shipped

Cisco Routing

Route table parser (RIB)
Route candidate model + preference
BGP analyzer + RIB-failure rule
ARP analyzer
Next-hop / CEF validation
Causal chain builder

07

Shipped

Cross-Device Path

Path graph builder
End-to-end path tracer
Multi-hop forwarding validation
Cross-device evidence chains

08

Shipped

FortiGate

FortiGate knowledge model
Configuration parser
Policy & route table parsers
Built-in FortiGate rules
Cross-vendor topology

09

In Development

Network Telemetry

SNMPv3 polling collector (GET / WALK)
SNMP trap & inform receiver
Syslog collector (UDP / TCP / TLS)
Event-driven re-collection triggers
Real-time interface counters & health
Config change detection via traps

10

AI Explanations

100 % local LLM - no data leaves the machine
Per-finding plain-language summaries
Cited evidence in every reply
AI as commentary, not verdict
No internet required

11

Future

Palo Alto Networks

PAN-OS XML config parser
Security & NAT policy analyzer
Zone & virtual-router model
Panorama device-group awareness
Built-in PAN-OS rules

12

Future

Juniper Junos

Set / hierarchical config parser
Operational-mode show parsers
OSPF & BGP analyzers
SRX policy & security-zone rules
Virtual-chassis stack detection

13

Future

Aruba & HPE

ArubaOS-CX & AOS-S parsers
HPE Comware (5900/5940) support
VSF / IRF stack detection
Aruba Central / mobility rules
Cross-vendor LLDP normalisation

14

Future

MikroTik & Ubiquiti

RouterOS export parser
UniFi controller integration
EdgeOS / EdgeRouter support
Wireless / mesh-aware rules
SMB & ISP-friendly profiles

15

Future

Vendor SDK & Extensibility

Public parser plugin API (Python)
Declarative rule DSL (YAML)
Custom snapshot adapters
Community vendor packs
Extreme, Dell, Brocade scaffolds
Per-org vendor enable/disable


Tech stack

Boring on purpose. Easy to operate.

Backend

Python 3.11+ · FastAPI · SQLAlchemy async · Alembic · Scrapli (async SSH) · pyATS-friendly parsers

Frontend

React 19 · Vite · TanStack Query · Tailwind CSS · TypeScript

Storage

PostgreSQL 16 · Redis 7 · Filesystem (gzip artifacts)

Deployment

Docker Compose · Single-binary friendly · Air-gapped friendly

Quality

1 781 unit / integration tests · Golden fixture tests · pytest

AI (optional)

Local LLM only · zero data egress · explanations only · never source of truth

FAQ

The questions network engineers actually ask.

Will it ever push a config change to a device?

No. There is no configuration mode and there are no write commands in the catalog. The platform is read-only by architecture, not by policy.

Does it require internet access?

No. The entire platform - including the optional AI explanation layer - runs locally. No data ever leaves the machine: no external API calls, no phone-home telemetry.

Can it run in an air-gapped environment?

Yes. Docker Compose deployment + offline OUI database + offline rule packs. No phone-home telemetry.

How do you handle false positives?

Rules read normalised facts, not raw text. Derived facts (interface role, management SVI, stack topology) cap most false-positive sources. Baselines override severities and thresholds at any of 6 layers.

What happens when an artifact is missing?

It is explicit: rules that need it are listed under "blocked by missing data", with the exact command to collect it. Missing data is never treated as healthy.

Why not just use Gemini / GPT / an LLM for the whole thing?

Security risk. Sending device configs to a cloud LLM leaks topology, credentials and policy to a third party. NetDoctor uses a local LLM only - nothing leaves the machine. And AI is restricted to plain-language commentary; verdicts always come from deterministic rules with cited evidence.

What vendors are supported?

Today: Cisco IOS / IOS-XE switches (L2 and L3) with full MAC intelligence and rogue device detection, Cisco routing analysis (RIB / BGP / CEF) and FortiGate firewalls. Planned: Palo Alto Networks, Juniper Junos, Aruba & HPE, MikroTik & Ubiquiti.

Where do organisation values come from?

Baseline files: built_in → global → environment → site → role → device. Never hardcoded in source. Every value emitted in evidence cites its baseline layer.
