AI-Driven Threat Detection

I built a multi-model defense suite using FFNNs, GNNs, and transformers for malware, intrusion, and command-tactic detection.

Tags: ML, PyTorch, BERT, GNN, anomaly detection, malware detection, intrusion detection

AI-Driven Threat Detection Research


This repository serves as the central hub for the research and laboratory activities conducted during the AI and Cybersecurity course at Politecnico di Torino. It aggregates four distinct projects covering the intersection of Artificial Intelligence, Deep Learning, and Natural Language Processing applied to cybersecurity domains.

Repository Structure

The projects are organized sequentially to guide the learner from foundational network flow analysis to complex malware and anomaly detection tasks.

| Order | Laboratory | Topic | Methods | Repository Link |
|-------|------------|-------|---------|-----------------|
| 1 | 01_Network_Flow_Analysis | Network Flow Classification | Feed-Forward Neural Networks (FFNNs), Weighted Loss, Feature Bias Analysis | Link |
| 2 | 02_Malware_Analysis | Malware Classification | Dynamic API Analysis, RNNs (GRU/LSTM), Graph Neural Networks (GNNs) | Link |
| 3 | 03_Network_Anomaly_Detection | Anomaly Detection (NIDS) | One-Class SVM, Autoencoders (Reconstruction Error), Unsupervised Clustering | Link |
| 4 | 04_NLP_for_Cybersecurity | NLP for Cybersecurity | Bash Command Classification, TF-IDF, Word2Vec, LSTMs for Tactic Detection | Link |

Note: Each folder above is a git submodule pointing to its own standalone repository. To clone the hub together with all four labs, use git clone --recursive (see below).

Getting Started

To clone this repository along with all its submodules (laboratories), use the --recursive flag:

git clone --recursive https://github.com/RenatoMignone/AI-Driven-Threat-Detection-Research.git

If you have already cloned the repository without submodules, run:

git submodule update --init --recursive

Research Hub Overview

1. Network Flow Classification

Goal: Classify network flows (Benign vs Malicious) using Deep Learning on the CICIDS2017 dataset.

This module focuses on the full modelling pipeline for tabular network data. It investigates how Feed-Forward Neural Networks (FFNNs) perform against baselines and explores critical real-world issues like class imbalance and bias.

  • Key Experiments:
    • Baseline Models: Shallow FFNNs with varying architectures.
    • Bias Analysis: Investigating disparate impact of features like Destination Port.
    • Deep Learning: Deeper FFNNs (3-6 layers) with hyperparameter tuning.
    • Regularization: Application of Dropout, Batch Normalization, and Weight Decay.
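The regularized deep-FFNN setup above can be sketched in PyTorch. This is an illustrative example, not the lab's actual configuration: layer widths, the dropout rate, the class weights, and the feature count are placeholder values (CICIDS2017 flows are tabular feature vectors).

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a deep FFNN for benign-vs-malicious flow classification
# combining the techniques listed above: Dropout, Batch Normalization,
# weight decay, and a weighted loss for class imbalance.
class FlowFFNN(nn.Module):
    def __init__(self, n_features=78, hidden=(128, 64, 32), n_classes=2, p_drop=0.3):
        super().__init__()
        layers, dim = [], n_features
        for h in hidden:
            layers += [nn.Linear(dim, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(p_drop)]
            dim = h
        layers.append(nn.Linear(dim, n_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = FlowFFNN()
# Weighted loss: up-weight the rare malicious class (placeholder ratio).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
# Weight decay = L2 regularization, applied through the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

x = torch.randn(16, 78)           # a batch of 16 flow feature vectors
y = torch.randint(0, 2, (16,))    # labels: benign (0) vs malicious (1)
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

The weighted `CrossEntropyLoss` addresses the imbalance issue directly: without it, a model can reach high accuracy on CICIDS2017-style data by predicting "benign" almost everywhere.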

2. Malware Analysis

Goal: Classify malware families based on dynamic API-call traces.

This project tackles the complexity of sequential data in malware analysis. By treating API call traces as sequences or graphs, we leverage advanced architectures to identify malicious behavior patterns.

  • Key Experiments:
    • Feature Exploration: Frequency-based analysis of API calls.
    • Sequence Modeling: Using LSTMs and GRUs to capture temporal dependencies in execution traces.
    • Graph Learning: Representing traces as graphs and applying GraphSAGE and GCNs to learn structural features of malware execution.
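The sequence-modeling idea can be sketched as a GRU over integer-encoded API-call traces. This is a minimal illustration, not the lab's exact model; the vocabulary size, embedding width, and number of malware families are invented placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: classify a malware sample from its API-call trace,
# treating the trace as a sequence of integer API IDs.
class APITraceGRU(nn.Module):
    def __init__(self, vocab_size=300, embed_dim=64, hidden_dim=128, n_families=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_families)

    def forward(self, traces):            # traces: (batch, seq_len) API-call IDs
        _, h_n = self.gru(self.embed(traces))
        return self.head(h_n[-1])         # classify from the final hidden state

model = APITraceGRU()
batch = torch.randint(1, 300, (8, 50))    # 8 traces, 50 API calls each
logits = model(batch)                     # (8, 10) family scores
```

The graph variants (GraphSAGE/GCN) replace the recurrent encoder with message passing over an API-call graph, trading strict temporal order for structural features of the execution.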

3. Network Anomaly Detection

Goal: Detect zero-day attacks in network traffic using unsupervised approaches.

Simulating a scenario where attack labels are unavailable, this lab applies unsupervised and semi-supervised learning to detect intrusions as anomalies.

  • Key Experiments:
    • Shallow Anomaly Detection: One-Class SVM (OC-SVM) for outlier detection.
    • Deep Anomaly Detection: Autoencoders that flag anomalies based on high reconstruction error.
    • Clustering: DBSCAN and K-Means to group similar traffic patterns and isolate attacks.
    • Visualization: Dimensionality reduction with t-SNE and PCA.
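The reconstruction-error approach can be sketched as follows. This is a toy illustration under stated assumptions: the feature count, network sizes, and the 95th-percentile threshold are placeholders, and the random tensors stand in for benign validation flows.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: train an autoencoder on benign-only traffic, then flag
# flows whose reconstruction error exceeds a threshold calibrated on benign
# validation data (attacks reconstruct poorly because they were never seen).
class FlowAutoencoder(nn.Module):
    def __init__(self, n_features=40, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = FlowAutoencoder()
benign_val = torch.randn(200, 40)          # stand-in for benign validation flows
with torch.no_grad():
    errors = ((model(benign_val) - benign_val) ** 2).mean(dim=1)
    # Calibrate: 95th percentile of benign reconstruction error.
    threshold = torch.quantile(errors, 0.95)

def is_anomalous(flows):
    with torch.no_grad():
        err = ((model(flows) - flows) ** 2).mean(dim=1)
    return err > threshold                 # boolean mask: True = flagged
```

Because the threshold comes from benign data alone, this pipeline needs no attack labels, which is what makes it applicable to zero-day detection.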

4. NLP for Attack Tactic Recognition

Goal: Identify attacker intent (Tactics) from Bash command history.

Bridging the gap between Natural Language Processing (NLP) and cybersecurity, this module classifies raw bash commands into MITRE ATT&CK tactics (e.g., Discovery, Persistence, Execution).

  • Key Experiments:
    • Text Preprocessing: Custom tokenization for shell commands.
    • Embeddings: TF-IDF for keyword extraction and Word2Vec for semantic representation of commands.
    • Sequential Classification: Bidirectional LSTMs to understand the intent behind a sequence of commands.
    • Interpretability: Analysis of confusion matrices to understand model decisions.
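The TF-IDF baseline can be sketched with scikit-learn. Everything here is illustrative: the shell-aware tokenizer, the tiny command/tactic examples, and the linear classifier are invented stand-ins, not the lab's dataset or model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical shell-aware tokenizer: keep flags like "-la" and paths intact
# instead of using the default word-only regex, which would drop them.
def shell_tokenize(command):
    return command.strip().split()

# Toy training data mapping bash commands to MITRE ATT&CK tactics.
commands = ["ls -la /etc", "whoami", "crontab -e", "uname -a",
            "cat /etc/passwd", "crontab -l"]
tactics  = ["Discovery", "Discovery", "Persistence", "Discovery",
            "Discovery", "Persistence"]

clf = make_pipeline(
    TfidfVectorizer(tokenizer=shell_tokenize, token_pattern=None, lowercase=False),
    LogisticRegression(max_iter=1000),
)
clf.fit(commands, tactics)
pred = clf.predict(["crontab -e"])
```

The bidirectional-LSTM variant replaces this bag-of-tokens view with Word2Vec embeddings fed through a recurrent encoder, so the model can use the order of commands in a session, not just which tokens appear.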

Authors & Contributors

This work is the result of a collaborative effort by the following team: