Change how data is validated¶
Data validation is done by the validation_function
, as specified in the config. This docs page covers how to
write your own validation_function
.
How is a validation_function
called?¶
A validation function is called with the single argument kwargs
:
def validation_function(**kwargs)
...
kwargs
is a Python dictionary that contains all arguments from the config file. For example if the config
file contains the line berlin_format_link_data: path/to/link_data.csv
the kwargs
dictionary will contain the
key-value pair "input_link_data": "path/to/link_data.csv"
.
kwargs
lets you access the input files that are specified in the config file.
You can access these input files and do whatever you want with them. For example you can check if
an input file contains all necessary columns.
Example¶
import pandas as pd
import logging
def validation_function(**kwargs)
# validate the link data
# 1. Load link data from file
link_data_file = kwargs["input_link_data"]
link_data = pd.read_csv(link_data)
# 2. Check that link data has the column 'LinkID', 'Length_m', 'MaxSpeed_kmh', 'AreaCat', and 'RoadCat'.
for colname in ["LinkID", "Length_m", "MaxSpeed", "AreaCat", "RoadCat"]:
if colname not in link_data.columns:
logging.warning(f"link data is missing the column {colname}")
# 3. perform other validation operations on the link data
...
# validate other datasets
...
Output of the validation_function¶
What you want the validation_function
to output is up to you.
The validation functions that we provide print warnings whenever a validation check fails. This means
that YETI will keep running even if the validation fails.
You can print warnings with logging.warning("..")
, as shown in the example above.
If you want to stop the YETI run when a validation check fails, you should raise an Error. For example:
def validation_function(**kwargs):
if some_validation_check_does_not_pass():
raise RuntimeError("validation failed")