ECUPrint Aligned Dataset
Filtered & Aligned CAN Bus Voltage Datasets based on the ECUPrint Dataset
The dataset linked on this webpage represents the aligned and filtered input used by the Promoter-Censor algorithm proposed in our paper: Constraint-Guided Clustering for Identifying In-Vehicle Electronic Control Units from Voltage Data. [pdf]
Motivation:
The original ECUPrint dataset was created by our group three years ago for statistical analysis of ECU voltage samples, where slight misalignments or incomplete entries had only a limited impact.
However, for machine-learning classifier benchmarking, we later observed that these inconsistencies led to distorted performance estimates. To address this issue, we sanitized the dataset by aligning and filtering the samples to ensure that the reported results reflect classifier performance rather than data irregularities.
Briefly, the modifications compared to the ECUPrint dataset are the following:
- all bits from the 10 vehicles are aligned (the Python script used for alignment is also available)
- samples are cut to exactly 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle
- acknowledgement bits are removed because they do not come from the ECU that is the sender of the ID
- incomplete bits lacking the falling edge were discarded to ensure dataset consistency
Result: The ground truth resulting from the new methodology differs slightly from the one in the original ECUPrint paper and is available in this [pdf].
Independent corroboration: We also verified the number of ECUs in the Ford vehicles with a diagnostic tool (FORScan v2.3.65) together with the electrical wiring diagrams, and it matches the number of ECUs that we identified using Constraint-Guided Clustering. The wiring-diagram documents we consulted are:
- Module Communication Diagram from Cardiagn - Ford Fiesta
- Module Communication Diagram from Cardiagn - Ford Ecosport
- Module Communication Diagram from Cardiagn - Ford Kuga
Download link: The resulting dataset is available here: ECUPrint_Aligned.zip
More details related to the bit aligning concept, applied filters and insights related to the dataset structure and file contents are described below.
1. Data pre-processing
The ECUPrint raw voltage data was collected with a PicoScope 5000 Series oscilloscope from 10 vehicles, ranging from small cars to SUVs and a heavy-duty vehicle.
1.1. Sample Alignment and Trimming
For each frame carrying a specific ID the ECUPrint dataset contains isolated dominant bits, i.e., a transition from recessive to dominant state and back. In the original files from the ECUPrint dataset, the rising edges and falling edges from each dominant bit are not aligned at the same index, as shown in the images below for the samples corresponding to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).
We aligned the bits for each ID at the same index. This led to a different number of time-steps per file, out of which we preserve only 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle. An example of the newly aligned bits is shown in the images below. They correspond to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).
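The alignment and trimming step can be illustrated with a minimal sketch. This is not the script distributed with the dataset: the function name `align_and_trim`, the threshold-crossing edge detector, and the parameters `threshold` and `edge_index` are assumptions made for illustration, and the input is assumed to be a one-dimensional differential trace (CANH minus CANL).

```python
import numpy as np

def align_and_trim(diff_voltage, threshold, edge_index, length):
    """Shift a trace so its first threshold crossing lands at edge_index,
    then keep exactly `length` samples. Returns None if that is impossible."""
    trace = np.asarray(diff_voltage, dtype=float)
    above = np.flatnonzero(trace > threshold)
    if above.size == 0:
        return None  # no dominant level found in this trace
    start = above[0] - edge_index
    if start < 0 or start + length > trace.size:
        return None  # not enough samples before or after the edge
    return trace[start:start + length]

# Two synthetic traces whose rising edges sit at different indices.
trace_a = np.concatenate([np.zeros(50), np.full(200, 2.0), np.zeros(50)])
trace_b = np.concatenate([np.zeros(70), np.full(200, 2.0), np.zeros(30)])
aligned_a = align_and_trim(trace_a, threshold=1.0, edge_index=10, length=100)
aligned_b = align_and_trim(trace_b, threshold=1.0, edge_index=10, length=100)
```

After this step both synthetic traces have the same length and their rising edges share one index, which is the property that the fixed 1600 and 2500 time-step windows rely on.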
1.2. Removal of ACK Bits
While analyzing the bits from the ECUPrint dataset, we found that some files contain acknowledgement bits instead of genuine dominant bits. These bits were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for one of the Honda Civic files, whose samples for ID 19B differ from those in all the other files, which contain the correct samples.
1.3. Removal of Non-Isolated Bits
For some IDs, the ECUPrint dataset does not contain single isolated dominant bits. The voltage samples for those bits stay at a continuous plateau level, whereas for isolated bits the file ends with the samples of the falling edge. These IDs were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for ID 511 from the Dacia Duster, whose samples differ from those of all other IDs from the same ECU.
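A simple way to picture this filter is a check that a trace reaches the dominant level and has returned below it by the end of the file. The sketch below is illustrative only: the name `ends_with_falling_edge`, the `threshold` value and the `tail` length are assumptions, not the criteria used to build the dataset.

```python
import numpy as np

def ends_with_falling_edge(diff_voltage, threshold, tail=10):
    """True if the trace reaches the dominant level and its last `tail`
    samples are all back below the threshold (an isolated dominant bit)."""
    trace = np.asarray(diff_voltage, dtype=float)
    dominant = trace > threshold
    if not dominant.any():
        return False  # never reaches the dominant level
    return not dominant[-tail:].any()

# Synthetic examples: an isolated bit and a plateau that never falls.
isolated = np.concatenate([np.zeros(20), np.full(100, 2.0), np.zeros(30)])
plateau = np.concatenate([np.zeros(20), np.full(100, 2.0)])
keep_isolated = ends_with_falling_edge(isolated, threshold=1.0)
keep_plateau = ends_with_falling_edge(plateau, threshold=1.0)
```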
1.4. Establishment of a New ECU Allocation
Based on the newer analysis of the voltage samples for all of the IDs, we found that the number of ECUs for some vehicles differs from the one determined in ECUPrint. There are two additional ECUs for the Ford Kuga, one additional ECU each for the Ford Fiesta and Ford Ecosport, and one ECU fewer for the Hyundai i20. This is partly because we now use some voltage bits that were left as "Unclassified" in the original ECUPrint dataset; they were not grouped with a particular ECU because the clock skew could not be determined for those IDs from the collected frames. The updated ECU allocation in the ECUPrint Aligned dataset provides, to the best of our knowledge, the ground truth allocation of IDs to ECUs.
| Number | Vehicle | Model year | No. of IDs | No. of identified ECUs | Voltage bits |
| (i) | Honda Civic | 2012-2017 | 43 | 6 | 14,567 |
| (ii) | Opel Corsa | 2006-2014 | 28 | 4 | 9,131 |
| (iii) | Hyundai i20 | 2014-2020 | 40 | 6 | 17,767 |
| (iv) | John Deere Tractor | 2010-2018 | 39 | 3 | 4,021 |
| (v) | Dacia Duster | 2010-2017 | 11 | 3 | 8,942 |
| (vi) | Dacia Logan | 2012-2019 | 45 | 6 | 31,297 |
| (vii) | Hyundai ix35 | 2009-2015 | 26 | 6 | 19,856 |
| (viii) | Ford Fiesta | 2017-2020 | 47 | 7 | 21,729 |
| (ix) | Ford Kuga | 2013-2019 | 70 | 11 | 28,024 |
| (x) | Ford Ecosport | 2018-2021 | 85 | 5 | 20,044 |
| Total | - | - | 434 | 57 | 175,378 |
2. Dataset
Dataset content.
The dataset is structured as described below. We provide the raw CAN voltage samples measured with the PicoScope with a sample interval of 2 nanoseconds (sample rate was set to 500 MS/s).
CAN voltages are collected for 10 vehicles (175,378 sampled bits) with ECU allocation. Data is allocated to specific ECUs based on the analysis in our work. Note that this allocation is the best we could ascertain from our analysis; we do not claim this separation to be absolute.
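The sampling figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from this page (2 ns sample interval, 1600 and 2500 time-steps per file); the names are chosen for illustration.

```python
SAMPLE_INTERVAL_NS = 2  # sample interval stated on this page

# 1 sample every 2 ns is 0.5 samples/ns, i.e. 500 MS/s.
sample_rate_msps = 1e3 / SAMPLE_INTERVAL_NS

def window_duration_us(n_samples):
    """Duration in microseconds covered by n_samples consecutive samples."""
    return n_samples * SAMPLE_INTERVAL_NS / 1000

car_window_us = window_duration_us(1600)      # passenger cars
tractor_window_us = window_duration_us(2500)  # heavy-duty vehicle
```

Each passenger-car file therefore covers 3.2 microseconds of signal and each John Deere file covers 5 microseconds.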
Folder structure
CAN voltage samples with ECU allocation
|
|------ DUSTER
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|
|------ LOGAN
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ ECOSPORT
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|
|------ FIESTA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|        |------ ECU7
|
|------ KUGA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|        |------ ECU7
|        |------ ECU8
|        |------ ECU9
|        |------ ECU10
|        |------ ECU11
|
|------ CIVIC
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ I20
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ IX35
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ JOHNDEERE
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|
|------ CORSA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
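Because the vehicle and ECU labels are encoded in the folder names, a labelled file list can be built by walking the tree. The following is a minimal sketch; it assumes that the sample files carry a `.csv` extension and sit directly inside the `ECU*` folders, and the function name `list_labelled_files` is hypothetical.

```python
from pathlib import Path

def list_labelled_files(root):
    """Return (vehicle, ecu, path) tuples for every CSV file found
    under <root>/<VEHICLE>/<ECUn>/, sorted by path."""
    records = []
    for path in sorted(Path(root).glob("*/ECU*/*.csv")):
        ecu = path.parent.name             # e.g. "ECU3"
        vehicle = path.parent.parent.name  # e.g. "DUSTER"
        records.append((vehicle, ecu, path))
    return records
```

The `(vehicle, ecu)` pair can then serve as the class label when benchmarking classifiers per vehicle.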
File structure
Voltage data is stored in CSV format, with metadata rows placed before the raw voltage samples. The first rows of each file contain the following information:
[ID (hexadecimal)],
[ID (decimal)],
[DLC (decimal)],
[Timestamp, Channel 1 (CANH), Channel 2 (CANL)],
[Measurement units],
The metadata is followed by the actual raw voltage samples:
[Voltage data (1600 timesteps/file for cars and 2500 timesteps/file for the John Deere tractor)].
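A file with this layout could be parsed roughly as follows. This is a sketch under assumptions, not a reader verified against the archive: it assumes exactly five metadata rows in the order listed above, comma separators, and three numeric columns (timestamp, CANH, CANL) per data row. The constant `N_METADATA_ROWS`, the function `parse_sample_file` and the synthetic text in the example are all hypothetical.

```python
import csv
import io

N_METADATA_ROWS = 5  # assumption: ID hex, ID dec, DLC, column names, units

def parse_sample_file(text):
    """Parse one sample file given as a string; return the ID, the DLC
    and the CANH/CANL voltage lists."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row if c.strip()]  # drop empty cells
        if cells:
            rows.append(cells)
    meta, data = rows[:N_METADATA_ROWS], rows[N_METADATA_ROWS:]
    return {
        "id": int(meta[0][0], 16),  # hexadecimal ID row
        "dlc": int(meta[2][0]),     # DLC row
        "canh": [float(r[1]) for r in data],
        "canl": [float(r[2]) for r in data],
    }

# Synthetic text that mimics the described layout (not real data).
example = "4F1,\n1265,\n8,\nTime,Channel 1,Channel 2,\n(ns),(V),(V),\n0,2.5,2.5\n2,3.5,1.5\n"
parsed = parse_sample_file(example)
```

If the real files differ from these assumptions (for example in the number of metadata rows), only the constant and the row indices need to change.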
3. Environmental conditions
Environmental conditions were not within the scope of our work, but for those who are interested, a secondary archive contains the aligned voltage data affected by environmental conditions (collected after 10, 30 and 60 minutes): ECUPrint_Environmental_Aligned.zip
Additional datasets are available for the Honda Civic and Ford Fiesta. The first datasets were collected after vehicle startup (cold engine) and the other sets after a 10-minute, 30-minute and 1-hour drive (with a warm engine). The CAN voltages under environmental variations consist of 35,893 sampled bits that correspond to 2 of the cars that we analyze: Ford Fiesta and Honda Civic.
Alignment of the Environmental Dataset - The environmental dataset that contains raw voltage data collected from Ford Fiesta (static) and Honda Civic (dynamic) was updated. The updated ECU allocation is also shown in this dataset.
| Number | Vehicle | Model year | No. of IDs | No. of identified ECUs | Voltage bits |
| (i) | Honda Civic | 2012-2017 | 43 | 6 | 13,926 |
| (ii) | Ford Fiesta | 2017-2020 | 47 | 7 | 21,967 |
| Total | - | - | 90 | 13 | 35,893 |
|
|------ FIESTA
|        |------ FIESTA_30min_static
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|                  |------ ECU7
|        |------ FIESTA_60min_static
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|                  |------ ECU7
|
|------ CIVIC
|        |------ CIVIC_10min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|        |------ CIVIC_30min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|        |------ CIVIC_60min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
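The condition folders follow a `<VEHICLE>_<N>min_<mode>` naming pattern, so the elapsed time and the static/dynamic mode can be recovered from the folder name. The sketch below assumes that every condition folder follows the pattern seen in the tree above; the function name `parse_condition` is hypothetical.

```python
import re

_CONDITION = re.compile(
    r"^(?P<vehicle>[A-Z0-9]+)_(?P<minutes>\d+)min_(?P<mode>static|dynamic)$"
)

def parse_condition(folder_name):
    """Split a folder name such as 'CIVIC_30min_dynamic' into
    (vehicle, minutes, mode); return None if the name does not match."""
    match = _CONDITION.match(folder_name)
    if match is None:
        return None
    return match["vehicle"], int(match["minutes"]), match["mode"]

civic_condition = parse_condition("CIVIC_30min_dynamic")
fiesta_condition = parse_condition("FIESTA_60min_static")
```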
4. Publication
If you are using the aligned version of the ECUPrint dataset, please give credit to the paper below:
B. Groza, P. Iosif and L. Popa, "Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data", 2026. [pdf]
@article{groza26constraint,
title={Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data},
author={Groza, Bogdan and Iosif, Patricia and Popa, Lucian},
conference={},
year={2026},
publisher={}
}
5. Contact
For any questions about our work and dataset, don't hesitate to contact us:
lucian.popa [at] aut.upt.ro
bogdan.groza [at] upt.ro