ECUPrint Aligned Dataset

Filtered & Aligned CAN Bus Voltage Datasets based on the ECUPrint Dataset


The dataset linked on this webpage represents the aligned and filtered input used by the Promoter-Censor algorithm proposed in our paper: Constraint-Guided Clustering for Identifying In-Vehicle Electronic Control Units from Voltage Data. [pdf]



[Figure: all_ecus_2d]

Motivation: The original ECUPrint dataset was created by our group three years ago for statistical analysis of ECU voltage samples, where slight misalignments or incomplete entries had only a limited impact. However, for machine-learning classifier benchmarking, we later observed that these inconsistencies led to distorted performance estimates. To address this issue, we sanitized the dataset by aligning and filtering the samples to ensure that the reported results reflect classifier performance rather than data irregularities.

Briefly, the modifications compared to the ECUPrint dataset are the following:

  • all bits from the 10 vehicles are aligned (the Python script used for alignment is also available)
  • samples are cut to exactly 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle
  • acknowledgement bits are removed because they do not come from the ECU that is the sender of the ID
  • incomplete bits lacking the falling edge were discarded to ensure dataset consistency

As a consequence, the following IDs were removed from the ECUPrint Aligned Dataset: 0x370 (Corsa), 0x511 (Duster), 0x4DE (Logan), 0x3A9, 0x43C, 0x171 (Ecosport) and 0x428 (ix35). In addition, 1 bit was removed for IDs 0x294 and 0x19B (Civic). The sanitized dataset retains 175,378 of the 181,874 samples in the original ECUPrint dataset.

Result: The ground truth resulting from the new methodology differs slightly from that of the original ECUPrint paper and is available in this [pdf].

Independent corroboration: We also verified the number of ECUs in the Ford vehicles with a diagnostic tool (FORScan v2.3.65) together with the electrical wiring diagrams and it matches the number of ECUs that we identified using Constraint-Guided Clustering. Documents used for determination of electrical wiring diagrams are:


Download links: The resulting dataset is available at this link. ECUPrint_Aligned.zip

More details related to the bit aligning concept, applied filters and insights related to the dataset structure and file contents are described below.


1. Data pre-processing

The ECUPrint raw voltage data was collected with a PicoScope 5000 Series oscilloscope from 10 vehicles, ranging from small cars to SUVs and a heavy-duty vehicle.

1.1. Sample Alignment and Trimming

For each frame carrying a specific ID the ECUPrint dataset contains isolated dominant bits, i.e., a transition from recessive to dominant state and back. In the original files from the ECUPrint dataset, the rising edges and falling edges from each dominant bit are not aligned at the same index, as shown in the images below for the samples corresponding to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).

[Figures: i20_original (left), JD_original (right)]

We aligned the bits for each ID at the same index. This led to a different number of time-steps per file, of which we keep only 1600 time-steps for passenger cars and 2500 time-steps for the heavy-duty vehicle. Examples of the newly aligned bits are shown in the images below. They correspond to ID 4F1 from the Hyundai i20 (left image) and to ID 04EF0021 from the John Deere tractor (right image).

[Figures: i20_aligned (left), JD_aligned (right)]
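The alignment and trimming step described above can be illustrated with a minimal sketch. This is not the authors' alignment script (which is distributed with the dataset); it only assumes that each trace is a 1-D array of voltage samples and that the rising edge can be located as the first upward crossing of a fixed threshold. The function name `align_and_trim` and the parameters `threshold` and `edge_index` are hypothetical.

```python
import numpy as np

def align_and_trim(samples, threshold, edge_index, length):
    """Shift one trace so that its first upward threshold crossing lands at
    `edge_index`, then cut it to exactly `length` samples.

    Returns None when no rising edge is found or when the trace does not
    contain enough samples before or after the edge.
    """
    samples = np.asarray(samples, dtype=float)
    above = samples >= threshold
    # Rising edge: the signal goes from below the threshold to at/above it.
    crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
    if crossings.size == 0:
        return None
    start = crossings[0] - edge_index
    if start < 0 or start + length > samples.size:
        return None
    return samples[start:start + length]
```

Under these assumptions, calling `align_and_trim(trace, threshold, edge_index, length=1600)` for a passenger-car trace (or `length=2500` for the heavy-duty vehicle) would place every rising edge at the same index and yield files of identical length.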

1.2. Removal of ACK Bits

While analyzing the bits from the ECUPrint dataset, we found that some samples were acknowledgement bits rather than genuine dominant bits. These were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for one of the Honda Civic files for ID 19B, whose samples differ from those in all the other files, which contain the correct samples.

[Figure: civic_ack]

1.3. Removal of Non-Isolated Bits

For some IDs, the ECUPrint dataset does not contain single isolated dominant bits. The voltage samples for those bits stay at a continuous plateau level, whereas for isolated bits the file ends with the samples of the falling edge. These IDs were excluded from the alignment process and are not part of the ECUPrint Aligned Dataset. An example is shown for ID 511 from the Dacia Duster, whose samples differ from those of all other IDs from the same ECU.

[Figure: duster_nonisolated]
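One possible way to tell an isolated dominant bit from a plateau is to check whether the trace has returned below a threshold by the end of the file. The sketch below is only an illustration of that idea, not the filter used to build the dataset; the function name `has_falling_edge`, the `threshold` value and the `tail` length are hypothetical choices.

```python
import numpy as np

def has_falling_edge(samples, threshold, tail=50):
    """Return True if the trace rises to the dominant level and is back below
    `threshold` for all of its last `tail` samples (an isolated bit).

    A trace that stays on a plateau until the end of the file returns False.
    """
    samples = np.asarray(samples, dtype=float)
    above = samples >= threshold
    if not above.any():
        return False  # the trace never reaches the dominant level
    first_high = int(np.argmax(above))
    # The recessive tail must start after the dominant section has begun.
    return first_high < samples.size - tail and not above[-tail:].any()
```

Traces for which this check fails would be candidates for exclusion, in the spirit of the filtering described above.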

1.4. Establishment of a New ECU Allocation

Based on a newer analysis of the voltage samples for all of the IDs, we found that the number of ECUs for some vehicles differs from the one determined in ECUPrint. There are 2 additional ECUs for the Ford Kuga and 1 additional ECU for the Ford Fiesta and for the Ford Ecosport, while there is 1 fewer ECU for the Hyundai i20. This is also due to the use of some voltage bits that were left as "Unclassified" in the original ECUPrint dataset and were not grouped with a particular ECU, since the clock skew could not be determined for those IDs from the collected frames. The updated ECU allocation in the ECUPrint Aligned Dataset provides the newly determined ground-truth allocation of IDs to ECUs, to the best of our knowledge.

Number   Vehicle              Model year   No. of IDs   No. of identified ECUs   Voltage bits
(i)      Honda Civic          2012-2017    43           6                        14,567
(ii)     Opel Corsa           2006-2014    28           4                        9,131
(iii)    Hyundai i20          2014-2020    40           6                        17,767
(iv)     John Deere Tractor   2010-2018    39           3                        4,021
(v)      Dacia Duster         2010-2017    11           3                        8,942
(vi)     Dacia Logan          2012-2019    45           6                        31,297
(vii)    Hyundai ix35         2009-2015    26           6                        19,856
(viii)   Ford Fiesta          2017-2020    47           7                        21,729
(ix)     Ford Kuga            2013-2019    70           11                       28,024
(x)      Ford Ecosport        2018-2021    85           5                        20,044
Total    -                    -            434          57                       175,378

2. Dataset

Dataset content. The dataset is structured as described below. We provide the raw CAN voltage samples measured with the PicoScope at a sample interval of 2 nanoseconds (the sample rate was set to 500 MS/s). CAN voltages are collected for 10 vehicles (175,378 sampled bits) with ECU allocation. Data is allocated to specific ECUs based on the analysis in our work. Note that this allocation is the best we could ascertain from our analysis; we do not claim it to be absolute.
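As a quick worked example of the figures above, the sample interval follows directly from the sample rate, and the time window covered by one file follows from the number of time-steps. The helper name `window_duration_us` is hypothetical, and the window is approximated as the number of samples times the sample interval.

```python
SAMPLE_RATE_HZ = 500e6                     # 500 MS/s, as stated above
SAMPLE_INTERVAL_NS = 1e9 / SAMPLE_RATE_HZ  # 2 ns between consecutive samples

def window_duration_us(n_samples):
    """Approximate time span covered by n_samples samples, in microseconds."""
    return n_samples * SAMPLE_INTERVAL_NS / 1000.0

print(window_duration_us(1600))  # passenger cars
print(window_duration_us(2500))  # heavy-duty vehicle
```

With these values, a 1600-sample file covers about 3.2 microseconds and a 2500-sample file about 5 microseconds.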


Folder structure

CAN voltage samples with ECU allocation
|
|------ DUSTER
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|
|------ LOGAN
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ ECOSPORT
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|
|------ FIESTA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|        |------ ECU7
|
|------ KUGA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|        |------ ECU7
|        |------ ECU8
|        |------ ECU9
|        |------ ECU10
|        |------ ECU11
|
|------ CIVIC
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ I20
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ IX35
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
|        |------ ECU5
|        |------ ECU6
|
|------ JOHNDEERE
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|
|------ CORSA
|        |------ ECU1
|        |------ ECU2
|        |------ ECU3
|        |------ ECU4
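Given the vehicle/ECU layout above, the files can be enumerated with a short script. The sketch below assumes that the archive has been extracted to a local folder and that the CSV files sit directly inside each ECU folder with a `.csv` extension; the function name `count_files_per_ecu` and the root path are hypothetical.

```python
from collections import Counter
from pathlib import Path

def count_files_per_ecu(root):
    """Count the CSV files under root/<VEHICLE>/<ECU>/.

    Returns a dict mapping (vehicle, ecu) folder names to the number of files.
    """
    counts = Counter()
    for path in Path(root).glob("*/*/*.csv"):
        ecu_dir = path.parent
        counts[(ecu_dir.parent.name, ecu_dir.name)] += 1
    return dict(counts)
```

For example, `count_files_per_ecu("ECUPrint_Aligned")` (a hypothetical extraction folder) would return one entry per ECU folder, which could then be compared against the per-vehicle totals in the table of Section 1.4.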


File structure

Voltage data is stored in CSV format, with metadata placed before the raw voltage samples. The first rows of each file contain the following metadata:

[ID (hexadecimal)],
[ID (decimal)],
[DLC (decimal)],
[Timestamp, Channel 1 (CANH), Channel 2 (CANL)],
[Measurement units],

The metadata is followed by the actual raw voltage samples:

[Voltage data (1600 timesteps/file for cars and 2500 timesteps/file for the John Deere tractor)].
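A file with this layout could be read along the following lines. This is a sketch under assumptions: it takes the first five non-empty rows as metadata (in the order listed above) and treats every following row as numeric samples, ignoring empty cells left by trailing commas. The function name `parse_bit_file` and the exact cell contents used in the usage example are hypothetical and should be checked against an actual file.

```python
import csv
import io

def parse_bit_file(text, n_metadata_rows=5):
    """Split the text of one file into metadata rows and numeric sample rows.

    Empty cells (for instance from trailing commas) and blank lines are dropped.
    """
    rows = [[cell.strip() for cell in row if cell.strip()]
            for row in csv.reader(io.StringIO(text))]
    rows = [row for row in rows if row]
    metadata = rows[:n_metadata_rows]
    samples = [tuple(float(cell) for cell in row)
               for row in rows[n_metadata_rows:]]
    return metadata, samples

# Usage with a small hypothetical file body (two sample rows only).
example = "4F1,\n1265,\n8,\nTime,Channel A,Channel B,\n(ns),(V),(V),\n0,2.5,2.5\n2,3.5,1.5\n"
metadata, samples = parse_bit_file(example)
# The hexadecimal and decimal ID rows should describe the same identifier.
assert int(metadata[0][0], 16) == int(metadata[1][0])
```

Keeping the metadata as plain strings leaves the interpretation of each row (ID, DLC, column names, units) to the caller, which seems the safer choice while the exact cell format is unverified.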


3. Environmental conditions

Environmental conditions were not within the scope of our work, but for those who are interested, a secondary archive contains the aligned voltage data affected by environmental conditions (after 10, 30 and 60 minutes of driving): ECUPrint_Environmental_Aligned.zip



For the Honda Civic and Ford Fiesta, additional datasets are available. The first datasets were collected after vehicle startup (cold engine), and the other sets after a 10-minute, 30-minute and 1-hour drive (warm engine). The CAN voltages under environmental variations consist of 35,893 sampled bits from these 2 vehicles.

Alignment of the Environmental Dataset - The environmental dataset that contains raw voltage data collected from Ford Fiesta (static) and Honda Civic (dynamic) was updated. The updated ECU allocation is also shown in this dataset.

Number   Vehicle       Model year   No. of IDs   No. of identified ECUs   Voltage bits
(i)      Honda Civic   2012-2017    43           6                        13,926
(ii)     Ford Fiesta   2017-2020    47           7                        21,967
Total    -             -            90           13                       35,893

CAN voltage samples under environmental variations
|
|------ FIESTA
|        |------ FIESTA_30min_static
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|                  |------ ECU7
|        |------ FIESTA_60min_static
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|                  |------ ECU7
|
|------ CIVIC
|        |------ CIVIC_10min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|        |------ CIVIC_30min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6
|        |------ CIVIC_60min_dynamic
|                  |------ ECU1
|                  |------ ECU2
|                  |------ ECU3
|                  |------ ECU4
|                  |------ ECU5
|                  |------ ECU6

4. Publication

If you are using the aligned version of the ECUPrint dataset, please give credit to the paper below:

B. Groza, P. Iosif and L. Popa, "Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data", 2026. [pdf]

@article{groza26constraint,
title={Constraint-Guided Clustering for Identifying in-Vehicle Electronic Control Units from Voltage Data},
author={Groza, Bogdan and Iosif, Patricia and Popa, Lucian},
conference={},
year={2026},
publisher={}
}

5. Contact

For any questions about our work and dataset, don't hesitate to contact us:
lucian.popa [at] aut.upt.ro
bogdan.groza [at] upt.ro