Creating Training Data
Learn how to define training areas, label vegetation types, and prepare datasets for model training.
Overview
The Training Data tool is designed to generate high-quality, class-labelled spatial data used to train AI models for vegetation classification. Users define training areas, apply automated segmentation tools (like SAM Wand), and assign vegetation types to build datasets for model training or accuracy testing.
Workflow Summary
There are three main steps to creating training data:
- Create a Training Area.
- Create Training Data (label polygons by vegetation type).
- Create Training Dataset (optional: bundle and publish training areas for reuse across projects).
Step 1: Create a Training Area
You must define a Training Area before creating training data.
To create a training area:
- Click the 'Training Area Tool' button in the Training Data toolbar panel.
- Left-click and drag on the map to draw a rectangular training area of the desired size.
- The new training area will appear in the sidebar under Training Areas.
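As an illustration of what the drag gesture defines, the sketch below builds a rectangular outline from the two opposite corners of a drag. It is a minimal sketch only: the function name `rectangle_from_drag` and the coordinate values are hypothetical and are not part of the tool.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rectangle_from_drag(start: Point, end: Point) -> List[Point]:
    """Return a closed rectangular ring from the two opposite corners of a drag.

    Hypothetical helper for illustration; the tool's internals are not documented here.
    """
    (x0, y0), (x1, y1) = start, end
    min_x, max_x = sorted((x0, x1))
    min_y, max_y = sorted((y0, y1))
    if min_x == max_x or min_y == max_y:
        raise ValueError("the drag must cover a non-zero area")
    # Counter-clockwise ring, closed by repeating the first vertex.
    return [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]


# The drag direction does not matter: the corners are normalised first.
training_area = rectangle_from_drag((120.0, 80.0), (100.0, 50.0))
```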
Step 2: Create Training Data
Once a training area is selected, you can start generating labelled polygons within it.
Tools for generating training data:
- SAM Wand Tool: Uses the Segment Anything Model (SAM) to rapidly create segmented features. See SAM Wand Tool.
- Polygon Tool: Allows manual drawing of features (useful for precise or corrective mapping). See TytonAI Toolbar.
- Layer Correction Tool: Used to refine or correct AI-generated segments using raster overlays. See Layer Correction Tool.
To use the SAM Wand:
- Select a class from the class list (e.g. Tree, Shrub, or Tussock).
- Click the SAM Wand icon in the toolbar.
- Click inside the training area to generate mask candidates.
- Place positive points (left-click) to mark areas that should be included in the mask.
- Place negative points (right-click) to exclude unwanted areas from the mask.
- Press Enter to confirm and apply the mask as a new polygon.
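For readers curious about how positive and negative points relate to SAM, the sketch below converts a sequence of clicks into point prompts. In the open-source `segment_anything` package, `SamPredictor.predict` accepts `point_coords` and `point_labels`, where label 1 marks foreground and 0 marks background. How the SAM Wand passes points internally is not documented here, so the `Click` class and `clicks_to_prompts` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Click:
    x: float
    y: float
    button: str  # "left" = positive point, "right" = negative point


def clicks_to_prompts(clicks: List[Click]) -> Tuple[List[List[float]], List[int]]:
    """Convert clicks into SAM-style point coordinates and labels (1 = include, 0 = exclude)."""
    coords: List[List[float]] = []
    labels: List[int] = []
    for click in clicks:
        if click.button not in ("left", "right"):
            raise ValueError(f"unsupported button: {click.button!r}")
        coords.append([click.x, click.y])
        labels.append(1 if click.button == "left" else 0)
    if 1 not in labels:
        raise ValueError("at least one positive point is needed to define a mask")
    return coords, labels


# One positive point on a tree crown, one negative point on nearby shadow.
coords, labels = clicks_to_prompts([
    Click(412.0, 230.0, "left"),
    Click(440.0, 251.0, "right"),
])
```

This ordering mirrors the recommended practice of starting with a positive point to define the core area and then refining with negative points.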
Class Requirements
- Classes are listed in the right panel and are based on your created class list.
- You need at least 5 m² of each class in your training area to create a training dataset.
- Area (m²) and feature count are shown for each class.
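The per-class check above can be sketched as follows. This is a minimal illustration, assuming polygon coordinates are in metres (a projected coordinate system) and polygons have no holes. The names `ring_area`, `class_summary`, and `classes_below_minimum` are hypothetical, and the tool may compute area differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]
MIN_AREA_M2 = 5.0  # minimum area per class stated in the requirements above


def ring_area(ring: Ring) -> float:
    """Planar polygon area via the shoelace formula (coordinates assumed in metres)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def class_summary(features: List[Tuple[str, Ring]]) -> Dict[str, Dict[str, float]]:
    """Total area (m²) and feature count for each class label."""
    summary: Dict[str, Dict[str, float]] = defaultdict(lambda: {"area_m2": 0.0, "count": 0})
    for label, ring in features:
        summary[label]["area_m2"] += ring_area(ring)
        summary[label]["count"] += 1
    return dict(summary)


def classes_below_minimum(summary: Dict[str, Dict[str, float]], required: List[str]) -> List[str]:
    """Classes from the class list that are missing or under the minimum area."""
    return [c for c in required if summary.get(c, {"area_m2": 0.0})["area_m2"] < MIN_AREA_M2]


features = [
    ("Tree", [(0, 0), (3, 0), (3, 2), (0, 2)]),   # 3 m x 2 m rectangle
    ("Shrub", [(0, 0), (2, 0), (2, 2), (0, 2)]),  # 2 m x 2 m square
]
summary = class_summary(features)
short = classes_below_minimum(summary, ["Tree", "Shrub", "Tussock"])
```

Here `short` lists the classes that still need more labelled area before a training dataset could be created.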
Step 3: (Optional) Create Training Dataset
If your training data covers an area that is relevant to other projects or imagery with similar conditions (such as landforms, vegetation types, or ecological zones), you can publish it as a reusable training dataset. This lets you apply consistent training data across multiple projects to improve model training.
To create a training dataset:
- Enter a Training Dataset Name.
- Click Select Training Areas to select training areas to be part of your dataset.
- Click Publish Training Dataset.
The dataset will now be available for reuse across your projects.
For optimal classification results:
- Create training data in areas representative of your entire study region
- Include examples of all vegetation classes in your training data
- Add training data in transition zones between vegetation types
- Sample across different environmental conditions (slopes, aspects, soil types)
- Focus extra effort on vegetation types that are rare or particularly important
To make the most of the SAM Wand Tool:
- Generate SAM masks for small areas at a time for faster processing
- Be consistent with point placement relative to vegetation boundaries
- Use fewer, strategic points rather than many scattered points
- Start with positive points to define the core area, then refine with negative points
- For trees and distinct objects, place points near the centre for best results