Report chapter 3: The dataset

Here you can find an overview of how to contribute to chapter 3 of our report, copied and pasted from the report so that you do not need to create a PDF from the .tex files.

3 The dataset

This chapter describes the preprocessing steps applied to the data and the decisions on what the data should look like. It introduces the label app and the process of manually labelling part of the data, including a description of the criteria. Finally, it introduces the dataset.

Preprocessing steps

First approach to creating a dataset (layout), and data augmentation to generate more samples.

Automatic feature extraction Sophia? We (tried to) create scripts for background removal of the collected images and an automatic feature extraction pipeline for rust, bent, etc. (including the decision to sort by features first rather than by labels). …

Preparation for manual feature extraction Maren? Preparing the images for manual classification to create more labelled data: sorting the pictures in the grid, having 3 pictures per asparagus spear, etc. …

The hand-label app Michael?

Introduction to the script created for manual sorting. Fusion of the feature extraction scripts: What is it? Why did we need it? What was the idea behind it? How does it work? (Keep it short! It is only the introduction.) Do not explain at length here, but rather give an idea and refer to the READMEs and to the code on GitHub whenever possible.

How to install Installation of the app: environment setup, mount points, problems we ran into, etc. …

Operating instructions User manual for the app and introduction to its graphical user interface: What can you find where? (Include one example picture of the GUI.) Step-by-step guide through loading pictures, creating a .csv file, and sorting one picture.

Performance Results and general performance of the app: How well did the feature extraction work? How many features had to be labelled by hand? What is the output of the app?

Manual labeling

Sorting criteria Josefine The criteria for hand-labelling the features with the app, explained in detail (including example pictures). What difficulties do we expect to encounter?

Sorting outcome Josefine The process and the results of the sorting: How much did we sort? How well did the sorting work in general (e.g., was it easy to sort, how long did it take, and what problems were encountered)? How accurately did we sort as a group (i.e., Kappa agreement)?

Validity Malin

Expanding on how accurately we sorted and how valid our sorting was as a group, i.e., on the Kappa agreement.
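For contributors to this section, a minimal sketch of one common agreement measure may help. It computes Cohen's kappa for a single binary feature labelled by two annotators. This is an assumption: the outline does not say which kappa variant the report uses, and agreement across the whole group may call for a multi-rater measure instead. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the project code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies,
    # summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for one feature (1 = present, 0 = absent).
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the annotators agree on 6 of 8 items, chance agreement is 0.5, and kappa comes out at 0.5.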

The asparagus dataset Richard?

Different datasets Sophia? Structural information on the datasets: What do they look like? How big are they (labelled vs. unlabelled samples)? What were the criteria for discarding data? (Maybe include an overview picture with all relevant information at a glance.)

Challenges Sophia, Richard Problems and challenges during the creation of the datasets: What were the challenges in creating a general dataset? What were challenges in general? How well could we work with the datasets? What was used as training data, validation data, and test data?
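When describing what was used as training, validation, and test data, a short sketch of one possible split may be useful as a reference point. It splits at the level of asparagus spears, so that the three pictures of one spear stay in the same subset. This is an illustration and not a description of the split used in the project: the function name `split_dataset`, the fractions, and the integer spear IDs are all hypothetical.

```python
import random


def split_dataset(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical spear IDs; each ID would stand for all pictures of one spear.
spear_ids = list(range(100))
train, val, test = split_dataset(spear_ids)
print(len(train), len(val), len(test))
```

Splitting by spear rather than by picture is a design choice that avoids having views of the same spear in both the training and the test data.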