PenQWEN
Domain-adapted LLM reducing security assessment setup from 4 hours to 12 minutes
●The Challenge
Penetration testing teams spend the first 4+ hours of every engagement on boilerplate: port scanning, service enumeration, vulnerability identification, and report scaffolding. Senior pentesters billing $200/hour were spending that time on tasks that should be automated. General-purpose LLMs (GPT-4, Claude) produce plausible-looking but technically dangerous output: recommending tools that don't exist, generating commands with the wrong flags, or suggesting techniques that violate scope agreements. The security domain demands extreme precision: a hallucinated Nmap flag can scan out-of-scope networks, and a fabricated CVE reference wastes hours of investigation. No existing LLM solution understood OPSEC constraints, tool-specific syntax, or the structured methodology (PTES, the Penetration Testing Execution Standard) that professional assessments follow.
●The Approach
Built a two-stage fine-tuning pipeline on Qwen2.5-7B. Stage one: cybersecurity corpus adaptation using 12GB of curated data, including MITRE ATT&CK techniques, CVE databases, tool documentation (Nmap, Burp Suite, Metasploit, BloodHound), and penetration testing methodology guides. This gives the model domain vocabulary and factual grounding. Stage two: agentic fine-tuning for structured tool calling with OPSEC awareness, trained on real engagement workflows to output properly formatted commands, respect scope constraints, and flag when a requested action might violate the rules of engagement. Used LoRA (Low-Rank Adaptation) to keep the adapter at 3.6GB, practical for deployment on consumer GPUs.
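As a back-of-envelope check on why LoRA keeps the adapter small: each adapted weight W (d_out × d_in) gains two low-rank matrices A (r × d_in) and B (d_out × r), so trainable parameters grow linearly with rank r instead of with d_in × d_out. A minimal sketch, where the layer dimensions follow Qwen2.5-7B's published config but the rank and set of targeted modules are illustrative assumptions, not the project's actual settings:

```python
# Back-of-envelope LoRA adapter size: each adapted weight W (d_out x d_in)
# gains A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.
def lora_params(rank: int, shapes: list[tuple[int, int]], layers: int) -> int:
    """Total trainable LoRA parameters across all transformer layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * layers

# (d_in, d_out) of Qwen2.5-7B's linear modules: hidden=3584, GQA kv dim=512,
# MLP intermediate=18944, 28 layers. Targeting all of them is an assumption.
QWEN25_7B_SHAPES = [
    (3584, 3584),   # q_proj
    (3584, 512),    # k_proj
    (3584, 512),    # v_proj
    (3584, 3584),   # o_proj
    (3584, 18944),  # gate_proj
    (3584, 18944),  # up_proj
    (18944, 3584),  # down_proj
]

n = lora_params(rank=64, shapes=QWEN25_7B_SHAPES, layers=28)
print(f"{n:,} trainable params, ~{n * 2 / 1e9:.2f} GB at fp16")  # ~161M params
```

The adapter grows linearly with rank, so a multi-GB adapter like the 3.6GB one shipped here implies a much higher rank, more targeted modules, or higher storage precision than this illustrative configuration.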
●The Solution
PenQWEN deploys as a 3.6GB LoRA adapter on top of Qwen2.5-7B, runnable on any GPU with 12GB+ VRAM. The model handles reconnaissance automation, vulnerability prioritization, and report generation following PTES methodology. It generates syntactically correct tool commands with proper flags, understands scope constraints, and refuses to suggest techniques outside the defined engagement rules. The two-stage training approach means the model has both factual knowledge (CVEs, techniques, tool syntax) and procedural understanding (when to use which tool, how to chain findings, OPSEC considerations). Currently automating 60% of routine enumeration tasks with zero hallucinated commands in production use.
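Scope enforcement of the kind described above can also be backstopped outside the model with a pre-execution guard. A minimal sketch, where the function and CIDR ranges are hypothetical and not PenQWEN's actual implementation, rejecting any generated Nmap target that falls outside the engagement's authorized ranges:

```python
import ipaddress

def in_scope(target: str, scope_cidrs: list[str]) -> bool:
    """Return True only if target is inside one of the engagement's CIDR ranges."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        return False  # hostnames would need resolution first; reject here
    return any(addr in ipaddress.ip_network(cidr) for cidr in scope_cidrs)

# Hypothetical rules of engagement: only this /24 is authorized.
SCOPE = ["10.20.30.0/24"]

for host in ["10.20.30.17", "10.20.31.5"]:
    verdict = "run" if in_scope(host, SCOPE) else "REFUSE (out of scope)"
    print(f"nmap -sV {host}: {verdict}")
```

Running a guard like this between model output and tool execution gives defense in depth: even if the model ever did emit an out-of-scope target, the command would never reach the scanner.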