maint: patch sources, engine hardening, proxy update for v1.0.1

nox-project
2026-04-13 21:42:08 +02:00
parent a166ce411d
commit 1612a7ef48
12 changed files with 121 additions and 185 deletions
+5 -5
@@ -17,7 +17,7 @@
[![Kali Linux](https://img.shields.io/badge/Kali%20Linux-Ready-557C94?logo=kalilinux&logoColor=white)](https://www.kali.org/) [![Kali Linux](https://img.shields.io/badge/Kali%20Linux-Ready-557C94?logo=kalilinux&logoColor=white)](https://www.kali.org/)
[![BlackArch](https://img.shields.io/badge/BlackArch-Available-1E1E2E?logo=archlinux&logoColor=white)](https://blackarch.org/) [![BlackArch](https://img.shields.io/badge/BlackArch-Available-1E1E2E?logo=archlinux&logoColor=white)](https://blackarch.org/)
[![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)](https://github.com/nox-project/nox-framework) [![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)](https://github.com/nox-project/nox-framework)
[![Sources](https://img.shields.io/badge/Sources-126-red)](https://github.com/nox-project/nox-framework) [![Sources](https://img.shields.io/badge/Sources-123-red)](https://github.com/nox-project/nox-framework)
*OSINT framework for red teaming, digital forensics, and corporate exposure analysis.* *OSINT framework for red teaming, digital forensics, and corporate exposure analysis.*
@@ -31,7 +31,7 @@ NOX is a purpose-built cyber threat intelligence engine designed for operators w
| Capability | Detail | | Capability | Detail |
|-|-| |-|-|
| ⚡ **Async Execution Engine** | Massively parallel scanning across 126 intelligence feeds with no sequential bottlenecks and no blocking I/O. | | ⚡ **Async Execution Engine** | Massively parallel scanning across 123 intelligence feeds with no sequential bottlenecks and no blocking I/O. |
| 🛡️ **Guardian Engine** | Integrated OPSEC layer with automatic proxy rotation and SOCKS5 support. Fail-safe kill-switch halts all traffic if the transport circuit is unavailable. | | 🛡️ **Guardian Engine** | Integrated OPSEC layer with automatic proxy rotation and SOCKS5 support. Fail-safe kill-switch halts all traffic if the transport circuit is unavailable. |
| 🧠 **Risk Scoring** | Dynamic 0–100 scoring with time-decay, source confidence weighting, password complexity analysis, persistence multipliers, and HVT detection. | | 🧠 **Risk Scoring** | Dynamic 0–100 scoring with time-decay, source confidence weighting, password complexity analysis, persistence multipliers, and HVT detection. |
| 🔗 **Recursive Avalanche Engine** | Every discovered asset — username, email, cracked password, phone — is automatically re-injected as a new scan seed. Per-asset pipeline runs sequentially (breach → crack → dork → scrape); child assets run concurrently. Identifiers from all four phases feed the pivot queue. Global deduplication and configurable depth cap prevent runaway recursion. | | 🔗 **Recursive Avalanche Engine** | Every discovered asset — username, email, cracked password, phone — is automatically re-injected as a new scan seed. Per-asset pipeline runs sequentially (breach → crack → dork → scrape); child assets run concurrently. Identifiers from all four phases feed the pivot queue. Global deduplication and configurable depth cap prevent runaway recursion. |
@@ -43,7 +43,7 @@ NOX is a purpose-built cyber threat intelligence engine designed for operators w
| Feature | Description | | Feature | Description |
|-|-| |-|-|
| **126 JSON Plugin Sources** | Every intelligence source is a JSON plugin. The execution engine contains zero hardcoded source logic. | | **123 JSON Plugin Sources** | Every intelligence source is a JSON plugin. The execution engine contains zero hardcoded source logic. |
| **Async Core** | Full `asyncio` event loop with JA3 fingerprinting, SSL session management, per-request jitter, and configurable concurrency. | | **Async Core** | Full `asyncio` event loop with JA3 fingerprinting, SSL session management, per-request jitter, and configurable concurrency. |
| **Autoscan Pipeline** | `--autoscan` triggers: breach scan → recursive pivot → Google/Bing/SearXNG dorking → paste/Telegram scraping — all in one command. | | **Autoscan Pipeline** | `--autoscan` triggers: breach scan → recursive pivot → Google/Bing/SearXNG dorking → paste/Telegram scraping — all in one command. |
| **Recursive Avalanche Engine** | Every identifier discovered — from breach records, dork hits, or scraped paste/Telegram content — is re-injected as a new seed. Per-asset pipeline is sequential (breach → crack → dork → scrape); child assets run concurrently via `asyncio.gather`. A global `seen_assets` set prevents infinite loops. Concurrency and depth are fully configurable at runtime via `--threads` and `--depth`. | | **Recursive Avalanche Engine** | Every identifier discovered — from breach records, dork hits, or scraped paste/Telegram content — is re-injected as a new seed. Per-asset pipeline is sequential (breach → crack → dork → scrape); child assets run concurrently via `asyncio.gather`. A global `seen_assets` set prevents infinite loops. Concurrency and depth are fully configurable at runtime via `--threads` and `--depth`. |
@@ -108,7 +108,7 @@ Supported fields: `name`, `endpoint`, `method`, `headers`, `regex_pattern` (or `
``` ```
For each asset (seed + every discovered identifier): For each asset (seed + every discovered identifier):
├─ Phase 1 — Breach Scan ├─ Phase 1 — Breach Scan
│ 126 sources queried in parallel (async) │ 123 sources queried in parallel (async)
├─ Phase 2 — Hash Crack (non-blocking, concurrent) ├─ Phase 2 — Hash Crack (non-blocking, concurrent)
│ Hashes found in breach data → rainbow-table APIs → cracked plaintext │ Hashes found in breach data → rainbow-table APIs → cracked plaintext
@@ -258,7 +258,7 @@ nox-cli --help
The post-install script automatically: The post-install script automatically:
1. Creates an isolated virtual environment at `/opt/nox-cli/.venv` 1. Creates an isolated virtual environment at `/opt/nox-cli/.venv`
2. Installs all Python dependencies inside the venv (PEP 668 compliant — zero system pollution) 2. Installs all Python dependencies inside the venv (PEP 668 compliant — zero system pollution)
3. Builds the 126 source plugins 3. Builds the 123 source plugins
4. Links `/usr/bin/nox-cli` → `/opt/nox-cli/nox-wrapper.sh` 4. Links `/usr/bin/nox-cli` → `/opt/nox-cli/nox-wrapper.sh`
### Option 2: From Source ### Option 2: From Source
+27 -32
@@ -71,6 +71,11 @@ class SourceConfig(BaseModel):
backup_endpoints: List[str] = Field(default_factory=list) backup_endpoints: List[str] = Field(default_factory=list)
# H2: optional confidence override — when set, takes precedence over formula # H2: optional confidence override — when set, takes precedence over formula
confidence: Optional[float] = None confidence: Optional[float] = None
# Two-phase poll support (e.g. IntelX: POST → job_id → GET results)
poll_endpoint: Optional[str] = None
poll_id_field: Optional[str] = None
poll_id_param: Optional[str] = None
poll_json_root: Optional[str] = None
@field_validator("reliability_score") @field_validator("reliability_score")
@classmethod @classmethod
@@ -131,6 +136,10 @@ def _mk(
bypass_required: Optional[List[str]] = None, bypass_required: Optional[List[str]] = None,
user_agent_type: Optional[str] = None, user_agent_type: Optional[str] = None,
backup_endpoints: Optional[List[str]] = None, backup_endpoints: Optional[List[str]] = None,
poll_endpoint: Optional[str] = None,
poll_id_field: Optional[str] = None,
poll_id_param: Optional[str] = None,
poll_json_root: Optional[str] = None,
) -> SourceConfig: ) -> SourceConfig:
return SourceConfig( return SourceConfig(
name=name, category=category, endpoint=endpoint, method=method, name=name, category=category, endpoint=endpoint, method=method,
@@ -150,6 +159,10 @@ def _mk(
bypass_required=bypass_required or None, bypass_required=bypass_required or None,
user_agent_type=user_agent_type, user_agent_type=user_agent_type,
backup_endpoints=backup_endpoints or [], backup_endpoints=backup_endpoints or [],
poll_endpoint=poll_endpoint,
poll_id_field=poll_id_field,
poll_id_param=poll_id_param,
poll_json_root=poll_json_root,
) )
@@ -251,10 +264,12 @@ FREE_PUBLIC_SOURCES: List[SourceConfig] = [
_base("hudsonrock_osint", "breach_data", _base("hudsonrock_osint", "breach_data",
"https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-email?email={target}", "GET", "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-email?email={target}", "GET",
{"stealers": "$.stealers"}, {"stealers": "$.stealers"},
rate_limit=5.0,
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36"},
input_type="email", output_type=["email", "domain", "username"], input_type="email", output_type=["email", "domain", "username"],
normalization_map={"stealers": "breach_record"}, normalization_map={"stealers": "breach_record"},
tags=["passive", "stealth"], tags=["passive", "stealth"],
health_check_url="https://cavalier.hudsonrock.com", reliability_score=4), health_check_url="https://cavalier.hudsonrock.com", reliability_score=3),
_base("ipinfo_io", "geolocation", _base("ipinfo_io", "geolocation",
"https://ipinfo.io/{target}/json", "GET", "https://ipinfo.io/{target}/json", "GET",
@@ -459,32 +474,15 @@ FREE_PUBLIC_SOURCES: List[SourceConfig] = [
tags=["passive"], tags=["passive"],
health_check_url="https://packetstormsecurity.com", reliability_score=4), health_check_url="https://packetstormsecurity.com", reliability_score=4),
_base("scylla_sh_search", "breaches",
"https://scylla.so/search?q={target}", "GET",
{"results": "$.*"},
input_type="email", output_type=["email", "domain"],
tags=["passive", "stealth"],
health_check_url="https://scylla.so", reliability_score=2, is_volatile=True,
bypass_required=["cloudflare"], user_agent_type="browser",
backup_endpoints=["https://scylla.so/api/search?q={target}"]),
_base("vigilante_pw", "breaches",
"https://vigilante.pw/api/search?q={target}", "GET",
{"results": "$.results"},
input_type="email", output_type=["email"],
tags=["passive", "stealth"],
health_check_url="https://vigilante.pw", reliability_score=2, is_volatile=True),
# ── New free sources (v1.0.1) ─────────────────────────────────────────────
_base("proxynova_comb", "breaches", _base("proxynova_comb", "breaches",
"https://api.proxynova.com/comb?query={target}", "GET", "https://api.proxynova.com/comb?query={target}", "GET",
{"lines": "$.lines"}, {"lines": "$.lines"},
input_type="email", output_type=["email"], input_type="email", output_type=["email"],
normalization_map={"lines": "credential_line"}, normalization_map={"lines": "credential_line"},
tags=["passive", "stealth"], tags=["passive", "stealth"],
health_check_url="https://api.proxynova.com", health_check_url="https://api.proxynova.com", reliability_score=3, is_volatile=True),
reliability_score=3, is_volatile=True),
# ── New free sources (v1.0.1) ─────────────────────────────────────────────
_base("shodan_internetdb", "scanners", _base("shodan_internetdb", "scanners",
"https://internetdb.shodan.io/{target}", "GET", "https://internetdb.shodan.io/{target}", "GET",
@@ -854,6 +852,10 @@ AUTHENTICATED_PREMIUM_SOURCES += [
payload_template={"term": "{target}", "buckets": [], "lookuplevel": 0, payload_template={"term": "{target}", "buckets": [], "lookuplevel": 0,
"maxresults": 100, "timeout": 0, "datefrom": "", "dateto": "", "maxresults": 100, "timeout": 0, "datefrom": "", "dateto": "",
"sort": 4, "media": 0, "terminate": []}, "sort": 4, "media": 0, "terminate": []},
poll_endpoint="https://2.intelx.io/intelligent/search/result",
poll_id_field="id",
poll_id_param="id",
poll_json_root="records",
tags=["passive", "stealth"], tags=["passive", "stealth"],
health_check_url="https://2.intelx.io", reliability_score=5), health_check_url="https://2.intelx.io", reliability_score=5),
@@ -925,23 +927,16 @@ AUTHENTICATED_PREMIUM_SOURCES += [
tags=["passive", "stealth"], tags=["passive", "stealth"],
health_check_url="https://api.flare.io", reliability_score=4), health_check_url="https://api.flare.io", reliability_score=4),
_base("leak_lookup", "breaches", _auth("leak_lookup", "breaches",
"https://leak-lookup.com/api/search", "POST", "https://leak-lookup.com/api/search", "POST",
{"results": "$.message"}, {"results": "$.message"},
headers={"X-API-Key": "{LEAK_LOOKUP_API_KEY}"},
api_key_slots=["{LEAK_LOOKUP_API_KEY}"],
input_type="email", output_type=["email"], input_type="email", output_type=["email"],
payload_template={"query": "{target}", "type": "email_address"}, payload_template={"query": "{target}", "type": "email_address"},
tags=["passive", "stealth"], tags=["passive", "stealth"],
health_check_url="https://leak-lookup.com", reliability_score=3, is_volatile=True), health_check_url="https://leak-lookup.com", reliability_score=3, is_volatile=True),
_auth("cit0day", "breaches",
"https://cit0day.in/api/v1/search?query={target}", "GET",
{"results": "$.results"},
headers={"Authorization": "Bearer {CIT0DAY_API_KEY}"},
api_key_slots=["{CIT0DAY_API_KEY}"],
input_type="email", output_type=["email"],
tags=["passive", "stealth"],
health_check_url="https://cit0day.in", reliability_score=2, is_volatile=True),
# ── DNS Recon ───────────────────────────────────────────────────────────── # ── DNS Recon ─────────────────────────────────────────────────────────────
_auth("securitytrails_sub", "dns_recon", _auth("securitytrails_sub", "dns_recon",
@@ -1154,7 +1149,7 @@ AUTHENTICATED_PREMIUM_SOURCES += [
api_key_slots=["{TWITTER_BEARER_TOKEN}"], api_key_slots=["{TWITTER_BEARER_TOKEN}"],
input_type="username", output_type=["username"], input_type="username", output_type=["username"],
tags=["passive"], tags=["passive"],
health_check_url="https://api.twitter.com", reliability_score=4), health_check_url="https://api.twitter.com", reliability_score=1),
_auth("github_code_search", "code", _auth("github_code_search", "code",
"https://api.github.com/search/code?q={target}", "GET", "https://api.github.com/search/code?q={target}", "GET",
+67 -42
@@ -2105,12 +2105,8 @@ class ProxyManager:
sys.exit(1) sys.exit(1)
_PROXY_SOURCES = [ _PROXY_SOURCES = [
( "https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies&protocol=http&timeout=5000&proxy_format=protocolipport&format=text",
"https://api.proxyscrape.com/v2/" "https://raw.githubusercontent.com/proxifly/free-proxy-list/main/proxies/protocols/http/data.txt",
"?request=displayproxies&protocol=http&timeout=5000"
"&country=all&ssl=all&anonymity=all"
),
"https://www.proxy-list.download/api/v1/get?type=http&anon=elite",
"https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt", "https://raw.githubusercontent.com/TheSpeedX/PROXY-List/master/http.txt",
] ]
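The updated proxy list mixes feeds that may return different line formats: the `proxy_format=protocolipport` parameter suggests `scheme://ip:port` lines, while the raw GitHub lists typically contain bare `ip:port`. A minimal sketch of how such lines could be normalized before use follows. The helper name `normalize_proxy_lines` and the default scheme are assumptions for illustration and are not part of this commit.

```python
from typing import Iterable, List


def normalize_proxy_lines(lines: Iterable[str], default_scheme: str = "http") -> List[str]:
    """Hypothetical helper: turn mixed proxy-list lines into unique scheme://host:port URLs."""
    seen = set()
    proxies: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if "://" not in line:
            # Bare "ip:port" entries get the default scheme prepended.
            line = f"{default_scheme}://{line}"
        host_port = line.split("://", 1)[1]
        host, sep, port = host_port.rpartition(":")
        if not sep or not host or not port.isdigit():
            continue  # drop entries without a numeric port
        if line not in seen:
            seen.add(line)
            proxies.append(line)
    return proxies
```

Deduplicating at this stage keeps the same proxy from being counted twice when two feeds list it in different formats.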
@@ -2225,6 +2221,7 @@ class DorkingEngine(Src):
self._dead_proxies: set = set() self._dead_proxies: set = set()
self._proxy_index: int = 0 self._proxy_index: int = 0
self.proxies = ProxyManager.get_proxies() self.proxies = ProxyManager.get_proxies()
self._dead_instances: set = set()
def _get_next_proxy(self) -> Optional[str]: def _get_next_proxy(self) -> Optional[str]:
live = [p for p in self.proxies if p not in self._dead_proxies] live = [p for p in self.proxies if p not in self._dead_proxies]
@@ -2294,7 +2291,11 @@ class DorkingEngine(Src):
from aiohttp_socks import ProxyConnector as _ProxyConnector from aiohttp_socks import ProxyConnector as _ProxyConnector
except ImportError: except ImportError:
_ProxyConnector = None _ProxyConnector = None
instance = random.choice(_SEARX_INSTANCES) live_instances = [i for i in _SEARX_INSTANCES if i not in self._dead_instances]
if not live_instances:
self._dead_instances.clear()
live_instances = list(_SEARX_INSTANCES)
instance = random.choice(live_instances)
url = f"{instance}/search?q={urllib.parse.quote(query)}&format=json&categories=general" url = f"{instance}/search?q={urllib.parse.quote(query)}&format=json&categories=general"
proxy = self._get_next_proxy() proxy = self._get_next_proxy()
try: try:
@@ -2306,6 +2307,7 @@ class DorkingEngine(Src):
async with sess.get(url, headers=_random_headers(), async with sess.get(url, headers=_random_headers(),
timeout=aiohttp_mod.ClientTimeout(total=12)) as resp: timeout=aiohttp_mod.ClientTimeout(total=12)) as resp:
if resp.status != 200: if resp.status != 200:
self._dead_instances.add(instance)
if proxy: if proxy:
self._dead_proxies.add(proxy) self._dead_proxies.add(proxy)
return [] return []
@@ -2316,6 +2318,7 @@ class DorkingEngine(Src):
if r.get("url") if r.get("url")
] ]
except Exception: except Exception:
self._dead_instances.add(instance)
if proxy: if proxy:
self._dead_proxies.add(proxy) self._dead_proxies.add(proxy)
return [] return []
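The hunks above add a `_dead_instances` set so that a failing SearXNG instance is not picked again until every instance has failed. The same pattern can be sketched in isolation as below. The `InstancePool` class and the example URLs are hypothetical and only illustrate the select, mark dead, and reset cycle.

```python
import random
from typing import List, Optional, Set


class InstancePool:
    """Hypothetical helper: pick a live instance, reset once all are marked dead."""

    def __init__(self, instances: List[str], rng: Optional[random.Random] = None) -> None:
        self._instances = list(instances)
        self._dead: Set[str] = set()
        self._rng = rng or random.Random()

    def pick(self) -> str:
        live = [i for i in self._instances if i not in self._dead]
        if not live:
            # Every instance has failed at least once: forget the failures
            # and retry the full list instead of giving up.
            self._dead.clear()
            live = list(self._instances)
        return self._rng.choice(live)

    def mark_dead(self, instance: str) -> None:
        self._dead.add(instance)


# Placeholder URLs, not real SearXNG instances.
pool = InstancePool(["https://searx.example.org", "https://searx.example.net"])
first = pool.pick()
pool.mark_dead(first)
second = pool.pick()  # `first` is excluded, so this is the other instance
```

Clearing the set when no live instance remains trades accuracy for availability: a temporarily rate-limited instance gets another chance on a later query.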
@@ -2452,43 +2455,19 @@ class DorkEngine:
def _search(self, query: str, engine: str) -> List[dict]: def _search(self, query: str, engine: str) -> List[dict]:
hits = [] hits = []
try: try:
urls = { # Direct Google/Bing HTML scraping is blocked by CAPTCHA/consent walls
"google": f"https://www.google.com/search?q={urllib.parse.quote(query)}&num=10", # since 2024. Route all engines through SearXNG JSON API.
"bing": f"https://www.bing.com/search?q={urllib.parse.quote(query)}&count=10", url = f"{random.choice(_SEARX_INSTANCES)}/search?q={urllib.parse.quote(query)}&format=json&categories=general"
"ddg": f"{random.choice(_SEARX_INSTANCES)}/search?q={urllib.parse.quote(query)}&format=json&categories=general", resp = self.s.get(url, timeout=15, use_cloudscraper=False)
}
use_cs = engine != "ddg" # SearXNG is a plain JSON API — no cloudscraper needed
resp = self.s.get(urls.get(engine, urls["google"]), timeout=15, use_cloudscraper=use_cs)
if not resp.ok: if not resp.ok:
return hits return hits
# DDG/SearXNG returns JSON data = resp.json()
if engine == "ddg": for r in data.get("results", [])[:10]:
try: if r.get("url"):
data = resp.json()
for r in data.get("results", [])[:10]:
if r.get("url"):
hits.append({"title": r.get("title", ""), "url": r["url"], "snippet": r.get("content", "")})
except Exception:
pass
return hits
if not BeautifulSoup:
return hits
soup = BeautifulSoup(resp.text, "html.parser")
selectors = {
"google": ("div.g", "h3", "a[href]", ".VwiC3b"),
"bing": ("li.b_algo", "h2", "a", ".b_caption p"),
}
container, title_sel, link_sel, snippet_sel = selectors.get(engine, selectors["google"])
for item in soup.select(container)[:10]:
title_el = item.select_one(title_sel)
link_el = item.select_one(link_sel)
snip_el = item.select_one(snippet_sel)
if title_el:
url = link_el.get("href","") if link_el else ""
hits.append({ hits.append({
"title": title_el.get_text().strip(), "title": r.get("title", ""),
"url": url if url.startswith("http") else "", "url": r["url"],
"snippet": snip_el.get_text().strip() if snip_el else "", "snippet": r.get("content", ""),
}) })
except Exception: except Exception:
pass pass
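With all engines routed through the SearXNG JSON API, result extraction reduces to reading the `results` array. A defensive sketch of that step, separated from the HTTP call, is shown below. The function name `parse_searx_hits` is an assumption; the `title`, `url`, and `content` keys are taken from the diff above.

```python
import json
from typing import Dict, List


def parse_searx_hits(body: str, limit: int = 10) -> List[Dict[str, str]]:
    """Hypothetical helper: map a SearXNG JSON response body to title/url/snippet dicts."""
    try:
        data = json.loads(body)
    except ValueError:
        return []  # not JSON (for example an HTML error page)
    if not isinstance(data, dict):
        return []
    results = data.get("results")
    if not isinstance(results, list):
        return []
    hits: List[Dict[str, str]] = []
    for r in results[:limit]:
        # Entries without a URL are skipped, matching the diff above.
        if isinstance(r, dict) and r.get("url"):
            hits.append({
                "title": r.get("title", ""),
                "url": r["url"],
                "snippet": r.get("content", ""),
            })
    return hits
```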
@@ -6409,6 +6388,39 @@ class NoxSourceProvider(FileSystemProvider):
if status not in range(200, 300) or not text: if status not in range(200, 300) or not text:
return [] return []
# Two-phase poll: if poll_endpoint is defined, treat the first response
# as a job submission, extract the job ID via poll_id_field, then poll
# poll_endpoint?<poll_id_param>=<id> until results arrive.
poll_endpoint = d.get("poll_endpoint", "")
if poll_endpoint:
try:
job_id = json.loads(text).get(d.get("poll_id_field", "id"))
except Exception:
job_id = None
if not job_id:
return []
poll_param = d.get("poll_id_param", "id")
poll_root = d.get("poll_json_root", d.get("json_root", ""))
poll_url = f"{poll_endpoint}?{poll_param}={job_id}"
delay = 2
for _ in range(4):
await asyncio.sleep(delay)
p_status, p_text, _ = await self._get(session, poll_url, headers=hdrs)
if p_status not in range(200, 300) or not p_text:
delay = min(delay * 2, 16)
continue
try:
items = json.loads(p_text)
for key in (poll_root.split(".") if poll_root else []):
if isinstance(items, dict):
items = items.get(key, [])
if isinstance(items, list) and items:
return self._by_json(p_text, poll_root, d.get("field_map", {}))
except Exception:
pass
delay = min(delay * 2, 16)
return []
regex = d.get("regex_pattern", "") regex = d.get("regex_pattern", "")
if regex: if regex:
return self._by_regex(text, regex) return self._by_regex(text, regex)
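The two-phase poll added above (submit a job, read its ID, then poll with a doubling delay capped at 16 seconds) can be sketched as a standalone coroutine. This is a simplified illustration under stated assumptions: `fetch` is a hypothetical injected callable returning `(status, body)`, and unlike the diff, the `dig` helper returns an empty list when it meets a non-dict value mid-path.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, List, Tuple

# Hypothetical transport signature: takes a URL, returns (status, body).
Fetch = Callable[[str], Awaitable[Tuple[int, str]]]


def dig(data: Any, dotted_root: str) -> Any:
    """Walk a dotted path such as "records" through nested dicts."""
    for key in dotted_root.split(".") if dotted_root else []:
        if not isinstance(data, dict):
            return []
        data = data.get(key, [])
    return data


async def poll_results(
    fetch: Fetch,
    submit_body: str,
    poll_endpoint: str,
    id_field: str = "id",
    id_param: str = "id",
    json_root: str = "",
    attempts: int = 4,
    first_delay: float = 2.0,
    max_delay: float = 16.0,
) -> List[Any]:
    """Extract the job ID from the submission response, then poll until items arrive."""
    try:
        job_id = json.loads(submit_body).get(id_field)
    except (ValueError, TypeError, AttributeError):
        job_id = None
    if not job_id:
        return []
    url = f"{poll_endpoint}?{id_param}={job_id}"
    delay = first_delay
    for _ in range(attempts):
        await asyncio.sleep(delay)
        status, body = await fetch(url)
        if 200 <= status < 300 and body:
            try:
                items = dig(json.loads(body), json_root)
            except ValueError:
                items = []
            if isinstance(items, list) and items:
                return items
        # Back off exponentially, capped at max_delay.
        delay = min(delay * 2, max_delay)
    return []
```

With the defaults, the waits are 2, 4, 8, and 16 seconds, so a job that never completes costs roughly 30 seconds before the source returns an empty result.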
@@ -6528,8 +6540,15 @@ class SourceOrchestrator:
"payload": raw.get("payload_template") or raw.get("payload") or {}, "payload": raw.get("payload_template") or raw.get("payload") or {},
# Pass resolved slot keys so FileSystemProvider can use them # Pass resolved slot keys so FileSystemProvider can use them
"_slot_keys": slot_keys, "_slot_keys": slot_keys,
# Two-phase poll support
"poll_endpoint": raw.get("poll_endpoint", ""),
"poll_id_field": raw.get("poll_id_field", "id"),
"poll_id_param": raw.get("poll_id_param", "id"),
"poll_json_root": raw.get("poll_json_root", ""),
} }
sources.append(NoxSourceProvider(self._sem, self._db, self._config, defn)) inst = NoxSourceProvider(self._sem, self._db, self._config, defn)
inst._bypass_required = raw.get("bypass_required") or []
sources.append(inst)
logger.debug("SourceOrchestrator: loaded %s", jf.name) logger.debug("SourceOrchestrator: loaded %s", jf.name)
except Exception as exc: except Exception as exc:
logger.warning("SourceOrchestrator: failed %s: %s", jf.name, exc) logger.warning("SourceOrchestrator: failed %s: %s", jf.name, exc)
@@ -6558,8 +6577,14 @@ class SourceOrchestrator:
def get_sources(self, session: "Session", qtype: str) -> List[AsyncSource]: def get_sources(self, session: "Session", qtype: str) -> List[AsyncSource]:
"""Return plugin sources applicable to qtype, pre-filtered to avoid creating unnecessary tasks.""" """Return plugin sources applicable to qtype, pre-filtered to avoid creating unnecessary tasks."""
self._ensure_loaded() self._ensure_loaded()
# curl_cffi presence cached in OPTIONAL after first _try_import call
_has_cffi = "curl_cffi" in OPTIONAL or _try_import("curl_cffi") is not None
sources: List[AsyncSource] = [] sources: List[AsyncSource] = []
for src in self._nox_sources: for src in self._nox_sources:
bypass = getattr(src, "_bypass_required", []) or []
if "cloudflare" in bypass and not _has_cffi:
logger.debug("Skipping %s — cloudflare bypass required, curl_cffi absent", src.name)
continue
input_type = getattr(src, "_input_type", "") input_type = getattr(src, "_input_type", "")
if not input_type or input_type == "any" or not qtype or input_type == qtype: if not input_type or input_type == "any" or not qtype or input_type == qtype:
sources.append(src) sources.append(src)
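The `get_sources` change skips plugins that declare a Cloudflare bypass requirement when `curl_cffi` is not installed, before filtering by input type. A self-contained sketch of that pre-filter is given below. `PluginSource`, `module_available`, and `applicable_sources` are hypothetical names, and the standard-library `importlib.util.find_spec` check stands in for the project's own `_try_import` helper.

```python
import importlib.util
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class PluginSource:
    """Hypothetical stand-in for a loaded plugin source."""
    name: str
    input_type: str = ""
    bypass_required: List[str] = field(default_factory=list)


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def applicable_sources(sources: Iterable[PluginSource], qtype: str, has_cffi: bool) -> List[PluginSource]:
    selected: List[PluginSource] = []
    for src in sources:
        # A source that needs a Cloudflare bypass is unusable without curl_cffi.
        if "cloudflare" in src.bypass_required and not has_cffi:
            continue
        # An empty or "any" input type matches every query type.
        if not src.input_type or src.input_type == "any" or not qtype or src.input_type == qtype:
            selected.append(src)
    return selected
```

A caller would compute `has_cffi = module_available("curl_cffi")` once and reuse it for every source, which mirrors the single check performed at the top of `get_sources`.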
+1 -1
@@ -10,7 +10,7 @@ brotli>=1.1.0 # brotli decompression for aiohttp br responses
zstandard>=0.23.0 # zstd decompression for aiohttp zstd responses (Cloudflare/Fastly CDNs) zstandard>=0.23.0 # zstd decompression for aiohttp zstd responses (Cloudflare/Fastly CDNs)
# ── Intelligence & Scraping ──────────────────────────────────────────── # ── Intelligence & Scraping ────────────────────────────────────────────
requests>=2.31.0 requests>=2.32.3
certifi>=2024.2.2 # up-to-date CA bundle for SSL verification certifi>=2024.2.2 # up-to-date CA bundle for SSL verification
cloudscraper>=1.2.71 # Cloudflare-protected endpoint bypass cloudscraper>=1.2.71 # Cloudflare-protected endpoint bypass
beautifulsoup4>=4.12.3 beautifulsoup4>=4.12.3
-32
@@ -1,32 +0,0 @@
{
"name": "cit0day",
"category": "breaches",
"endpoint": "https://cit0day.in/api/v1/search?query={target}",
"method": "GET",
"requires_auth": true,
"selectors": {
"results": "$.results"
},
"rate_limit": 1.0,
"headers": {
"Authorization": "Bearer {CIT0DAY_API_KEY}"
},
"api_key_slots": [
"{CIT0DAY_API_KEY}"
],
"input_type": "email",
"output_type": [
"email"
],
"normalization_map": {},
"tags": [
"passive",
"stealth"
],
"health_check_url": "https://cit0day.in",
"expected_status": 200,
"reliability_score": 2,
"is_volatile": true,
"backup_endpoints": [],
"confidence": 0.55
}
+2 -1
@@ -94,7 +94,6 @@ SERVICE_REGISTRY: Dict[str, Dict] = {
"GOOGLE_CX_KEY": {"display": "Google Custom Search (API key)", "public": False}, "GOOGLE_CX_KEY": {"display": "Google Custom Search (API key)", "public": False},
"GOOGLE_CX_ID": {"display": "Google Custom Search (CX ID)", "public": False}, "GOOGLE_CX_ID": {"display": "Google Custom Search (CX ID)", "public": False},
"GREYNOISE_API_KEY": {"display": "GreyNoise", "public": False}, "GREYNOISE_API_KEY": {"display": "GreyNoise", "public": False},
"HASHES_API_KEY": {"display": "Hashes.org", "public": False},
"HIBP_API_KEY": {"display": "HaveIBeenPwned", "public": False}, "HIBP_API_KEY": {"display": "HaveIBeenPwned", "public": False},
"HIPPO_API_KEY": {"display": "EmailHippo", "public": False}, "HIPPO_API_KEY": {"display": "EmailHippo", "public": False},
"HUNTER_API_KEY": {"display": "Hunter.io", "public": False}, "HUNTER_API_KEY": {"display": "Hunter.io", "public": False},
@@ -147,6 +146,8 @@ SERVICE_REGISTRY: Dict[str, Dict] = {
"MALWAREBAZAAR_API_KEY": {"display": "MalwareBazaar (abuse.ch)", "public": False}, "MALWAREBAZAAR_API_KEY": {"display": "MalwareBazaar (abuse.ch)", "public": False},
"FULLHUNT_API_KEY": {"display": "FullHunt (attack surface)", "public": False}, "FULLHUNT_API_KEY": {"display": "FullHunt (attack surface)", "public": False},
"NETLAS_API_KEY": {"display": "Netlas.io (internet scanner)", "public": False}, "NETLAS_API_KEY": {"display": "Netlas.io (internet scanner)", "public": False},
# ── Added in v1.0.2 ───────────────────────────────────────────────
"LEAK_LOOKUP_API_KEY": {"display": "Leak-Lookup", "public": False},
} }
_PRIVATE_KEYS = {k: v for k, v in SERVICE_REGISTRY.items() if not v["public"]} _PRIVATE_KEYS = {k: v for k, v in SERVICE_REGISTRY.items() if not v["public"]}
+6 -4
View File
@@ -7,8 +7,10 @@
"selectors": { "selectors": {
"stealers": "$.stealers" "stealers": "$.stealers"
}, },
"rate_limit": 1.0, "rate_limit": 5.0,
"headers": {}, "headers": {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36"
},
"api_key_slots": [], "api_key_slots": [],
"input_type": "email", "input_type": "email",
"output_type": [ "output_type": [
@@ -25,7 +27,7 @@
], ],
"health_check_url": "https://cavalier.hudsonrock.com", "health_check_url": "https://cavalier.hudsonrock.com",
"expected_status": 200, "expected_status": 200,
"reliability_score": 4, "reliability_score": 3,
"backup_endpoints": [], "backup_endpoints": [],
"confidence": 0.85 "confidence": 0.7
} }
+4
@@ -40,5 +40,9 @@
"expected_status": 200, "expected_status": 200,
"reliability_score": 5, "reliability_score": 5,
"backup_endpoints": [], "backup_endpoints": [],
"poll_endpoint": "https://2.intelx.io/intelligent/search/result",
"poll_id_field": "id",
"poll_id_param": "id",
"poll_json_root": "records",
"confidence": 1.0 "confidence": 1.0
} }
+7 -3
@@ -3,17 +3,21 @@
"category": "breaches", "category": "breaches",
"endpoint": "https://leak-lookup.com/api/search", "endpoint": "https://leak-lookup.com/api/search",
"method": "POST", "method": "POST",
"requires_auth": false, "requires_auth": true,
"selectors": { "selectors": {
"results": "$.message" "results": "$.message"
}, },
"rate_limit": 1.0, "rate_limit": 1.0,
"headers": {}, "headers": {
"X-API-Key": "{LEAK_LOOKUP_API_KEY}"
},
"payload_template": { "payload_template": {
"query": "{target}", "query": "{target}",
"type": "email_address" "type": "email_address"
}, },
"api_key_slots": [], "api_key_slots": [
"{LEAK_LOOKUP_API_KEY}"
],
"input_type": "email", "input_type": "email",
"output_type": [ "output_type": [
"email" "email"
-35
@@ -1,35 +0,0 @@
{
"name": "scylla_sh_search",
"category": "breaches",
"endpoint": "https://scylla.so/search?q={target}",
"method": "GET",
"requires_auth": false,
"selectors": {
"results": "$.*"
},
"rate_limit": 1.0,
"headers": {},
"api_key_slots": [],
"input_type": "email",
"output_type": [
"email",
"domain"
],
"normalization_map": {},
"tags": [
"passive",
"stealth"
],
"health_check_url": "https://scylla.so",
"expected_status": 200,
"reliability_score": 2,
"is_volatile": true,
"bypass_required": [
"cloudflare"
],
"user_agent_type": "browser",
"backup_endpoints": [
"https://scylla.so/api/search?q={target}"
],
"confidence": 0.55
}
+2 -2
@@ -24,7 +24,7 @@
], ],
"health_check_url": "https://api.twitter.com", "health_check_url": "https://api.twitter.com",
"expected_status": 200, "expected_status": 200,
"reliability_score": 4, "reliability_score": 1,
"backup_endpoints": [], "backup_endpoints": [],
"confidence": 0.85 "confidence": 0.4
} }
-28
@@ -1,28 +0,0 @@
{
"name": "vigilante_pw",
"category": "breaches",
"endpoint": "https://vigilante.pw/api/search?q={target}",
"method": "GET",
"requires_auth": false,
"selectors": {
"results": "$.results"
},
"rate_limit": 1.0,
"headers": {},
"api_key_slots": [],
"input_type": "email",
"output_type": [
"email"
],
"normalization_map": {},
"tags": [
"passive",
"stealth"
],
"health_check_url": "https://vigilante.pw",
"expected_status": 200,
"reliability_score": 2,
"is_volatile": true,
"backup_endpoints": [],
"confidence": 0.55
}