Paper accepted at PerCom 2022!
Our paper on the generation of realistic synthetic IoT datasets has been accepted at PerCom 2022. The paper has also been nominated for the best paper award! More information is available on the PerCom 2022 website.
SmartSPEC: Customizable Smart Space Datasets via Event-Driven Simulations
Andrew Chio (University of California, Irvine, USA); Daokun Jiang (Google); Peeyush Gupta (University of California, Irvine, USA); Georgios Bouloukakis (Telecom SudParis, France); Roberto Yus (University of Maryland, Baltimore County, USA); Sharad Mehrotra and Nalini Venkatasubramanian (University of California, Irvine, USA)
This paper presents SmartSPEC, an approach to generate customizable smart space datasets using sensorized spaces in which people and events are embedded. Smart space datasets are critical to design, deploy, and evaluate robust systems and applications to ensure cost-effective operation and safety/comfort/convenience of the space occupants. Often, real-world data is difficult to obtain due to the lack of fine-grained sensing, and privacy/security concerns prevent the release and sharing of individual and spatial data. SmartSPEC is a smart space simulator and data generator that can create a digital representation (twin) of a smart space and its activities. SmartSPEC uses a semantic model and ML-based approaches to characterize and learn attributes in a sensorized space, and applies an event-driven simulation strategy to generate realistic simulated data about the space (events, trajectories, sensor datasets, etc.). To evaluate the realism of the data generated by SmartSPEC, we develop a structured methodology and metrics to assess various aspects of smart space datasets, including trajectories of people and occupancy of spaces. Our experimental study looks at two real-world settings/datasets: an instrumented smart campus building and a city-wide GPS dataset. Our results show that the trajectories produced by SmartSPEC are 1.4x to 4.4x more realistic than the best synthetic data baseline when compared to real-world data, depending on the scenario and configuration.