The SHIELD Project: Transforming IoT Data into AI-Ready Cybersecurity Insights

In the rapidly evolving landscape of cybersecurity, the quality of data used to train AI models can mean the difference between detecting a threat in time and suffering a costly breach. This is why i46 is proud to introduce SHIELD, a groundbreaking initiative under the CyberSecDome Open Call, designed to deliver clean, high-quality, and GDPR-compliant datasets derived from real-world IoT telemetry.
Unlike synthetic datasets, which often lack the depth and unpredictability of real-world scenarios, SHIELD draws on 350 million records from more than 500 active IoT devices, including industrial sensors, smart city infrastructure, and cloud servers. These datasets capture minute-by-minute updates on critical metrics such as CPU load, memory usage, firewall configurations, and network activity, providing a rich foundation for AI-driven threat detection.
One of the key challenges in working with raw IoT data is its inherent noise. Devices report inconsistent metrics, redundant status updates, and occasional gaps in telemetry. SHIELD addresses this through a rigorous data-cleaning pipeline that prioritizes relevance, removes redundant entries, and ensures compliance with ethical guidelines. The result is a refined dataset that maintains its diversity while being optimized for AI training.
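To make the cleaning step concrete, here is a minimal Python sketch of two of the problems named above: dropping redundant status updates and flagging gaps in telemetry. It is an illustration only, not SHIELD's actual pipeline. The record layout (device_id, timestamp, metrics), the function name clean_telemetry, and the one-minute expected interval are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def clean_telemetry(records, expected_interval=timedelta(minutes=1)):
    """Drop consecutive duplicate reports per device and list telemetry gaps.

    Assumed (hypothetical) record layout:
        {"device_id": str, "timestamp": datetime, "metrics": dict}
    """
    cleaned = []    # records kept after removing redundant updates
    gaps = []       # (device_id, gap_start, gap_end) tuples
    last_seen = {}  # device_id -> (last timestamp, last metrics)

    # Process each device's reports in chronological order.
    ordered = sorted(records, key=lambda r: (r["device_id"], r["timestamp"]))
    for rec in ordered:
        device, ts, metrics = rec["device_id"], rec["timestamp"], rec["metrics"]
        previous = last_seen.get(device)
        if previous is not None:
            prev_ts, prev_metrics = previous
            # A gap is any silence longer than the expected reporting interval.
            if ts - prev_ts > expected_interval:
                gaps.append((device, prev_ts, ts))
            # A redundant update repeats the previous metrics unchanged.
            if metrics == prev_metrics:
                last_seen[device] = (ts, prev_metrics)
                continue
        cleaned.append(rec)
        last_seen[device] = (ts, metrics)
    return cleaned, gaps
```

In this sketch a record counts as redundant only when it repeats the same device's previous metrics exactly. A real pipeline would likely need tolerance thresholds for noisy numeric metrics, plus a pseudonymization step for GDPR compliance, neither of which is shown here.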
Beyond technical excellence, SHIELD emphasizes collaboration and transparency. By working closely with CyberSecDome’s consortium, we ensure that the datasets align with real-world cybersecurity needs, whether for penetration testing, vulnerability assessment, or proactive defense strategies.
As we progress through this five-week project, we invite the cybersecurity community to follow our journey.