API Reference
This is the complete list of functions that the data minimization API offers. All functions expect a list of dictionaries as input.
-
data_minimization_tools.drop_keys(data: [<class 'dict'>], keys)
Removes the data for specific keys (does not drop the key from the dictionary!).
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be removed
Returns: cleaned list of dicts
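To make the "keep the key, remove the value" behaviour concrete, here is a minimal sketch of the idea. It is not the library's implementation: the helper name `blank_values_sketch` is hypothetical, and using `None` as the placeholder for a removed value is an assumption, since the reference above does not say what the cleared value looks like.

```python
from copy import deepcopy


def blank_values_sketch(data, keys):
    """Hypothetical stand-in: clear the values of `keys` but keep the keys."""
    cleaned = deepcopy(data)  # leave the caller's list untouched
    for item in cleaned:
        for key in keys:
            if key in item:
                item[key] = None  # assumption: None marks a removed value
    return cleaned


records = [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 29}]
result = blank_values_sketch(records, ["name"])
```

Every item in `result` still has a `name` key, so code that expects a fixed set of fields keeps working.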
-
data_minimization_tools.hash_keys(data: [<class 'dict'>], keys, hash_algorithm=<built-in function openssl_sha256>, salt=None, digest_to_bytes=False)
Hashes data for specific keys.
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be hashed
- hash_algorithm – the hash algorithm to apply. Can be any hashlib algorithm or any function that behaves similarly
- salt – the salt to use
- digest_to_bytes – whether the result should be returned as bytes. If False, the result is a string
Returns: cleaned list of dicts
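The parameters above can be illustrated with a small sketch built directly on hashlib. It is a hedged approximation, not the library's code: the helper name `hash_values_sketch` is hypothetical, and both the conversion of values to strings and the way the salt is combined with the value (appended here) are assumptions.

```python
import hashlib


def hash_values_sketch(data, keys, hash_algorithm=hashlib.sha256,
                       salt=None, digest_to_bytes=False):
    """Hypothetical stand-in: replace the values of `keys` with their hash."""
    cleaned = []
    for item in data:
        new_item = dict(item)  # shallow copy, the input stays unchanged
        for key in keys:
            if key not in new_item:
                continue
            payload = str(new_item[key])
            if salt is not None:
                payload += str(salt)  # assumption: the salt is appended
            digest = hash_algorithm(payload.encode("utf-8"))
            new_item[key] = digest.digest() if digest_to_bytes else digest.hexdigest()
        cleaned.append(new_item)
    return cleaned


records = [{"email": "a@example.com", "plan": "free"}]
hashed = hash_values_sketch(records, ["email"], salt="pepper")
raw = hash_values_sketch(records, ["email"], digest_to_bytes=True)
```

Any callable that accepts bytes and returns an object with `digest()` and `hexdigest()`, such as `hashlib.sha512`, can be passed as `hash_algorithm` in this sketch.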
-
data_minimization_tools.reduce_to_mean(data: [<class 'dict'>], keys)
Reduces all values for the given keys to the mean across all items of the input data list.
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be replaced
Returns: cleaned list of dicts. Note that this function returns as many items as it receives.
-
data_minimization_tools.reduce_to_median(data: [<class 'dict'>], keys)
Reduces all values for the given keys to the median across all items of the input data list.
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be replaced
Returns: cleaned list of dicts. Note that this function returns as many items as it receives.
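reduce_to_mean and reduce_to_median differ only in the aggregate they compute, so one sketch can cover both. The helper `reduce_with_sketch` and its `aggregate` parameter are hypothetical and use the standard library's statistics module; the library itself may compute the values differently.

```python
from statistics import mean, median


def reduce_with_sketch(data, keys, aggregate):
    """Hypothetical stand-in: overwrite each key's values with one aggregate."""
    cleaned = [dict(item) for item in data]  # shallow copies of every item
    for key in keys:
        replacement = aggregate([item[key] for item in data])
        for item in cleaned:
            item[key] = replacement
    return cleaned


records = [
    {"id": 1, "salary": 30},
    {"id": 2, "salary": 40},
    {"id": 3, "salary": 110},
]
by_mean = reduce_with_sketch(records, ["salary"], mean)
by_median = reduce_with_sketch(records, ["salary"], median)
```

Both results contain three items, matching the note that the output has as many items as the input. With this sample data the outlier pulls the mean well above the median, which is worth keeping in mind when choosing between the two.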
-
data_minimization_tools.reduce_to_nearest_value(data: [<class 'dict'>], keys, step_width=10)
Reduces all values for the given keys to the nearest value. Think of this as aggregating values into intervals.
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be replaced
- step_width – size of the intervals
Returns: cleaned list of dicts. Note that this function returns as many items as it receives.
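One way to read "nearest value" is rounding to the nearest multiple of step_width. The sketch below follows that reading, which is an assumption: the reference does not say whether the library rounds to the nearest multiple, rounds down, or anchors the intervals elsewhere. The helper name `round_to_step_sketch` is hypothetical.

```python
import math


def round_to_step_sketch(data, keys, step_width=10):
    """Hypothetical stand-in: snap values to the nearest multiple of step_width."""
    cleaned = []
    for item in data:
        new_item = dict(item)
        for key in keys:
            if key in new_item:
                # floor(x + 0.5) rounds halves up; this tie rule is an assumption
                steps = math.floor(new_item[key] / step_width + 0.5)
                new_item[key] = steps * step_width
        cleaned.append(new_item)
    return cleaned


records = [{"age": 23}, {"age": 27}, {"age": 34}]
result = round_to_step_sketch(records, ["age"], step_width=10)
```

A larger step_width gives coarser intervals and therefore less precise, more private values.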
-
data_minimization_tools.replace_with(data: [<class 'dict'>], replacements: dict)
Receives a 1:1 mapping of original value to new value and replaces the original values accordingly. This corresponds to CN-Protect’s DataHierarchy.
Parameters: - data – input data as list of dicts
- replacements – 1:1 mapping of original values to new values
Returns: cleaned list of dicts
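A value-to-value mapping of this kind might be applied as in the following sketch. The helper name `replace_values_sketch` is hypothetical, and applying the mapping to the values of every key is an assumption; the signature takes no `keys` argument, but the reference does not spell out the scope.

```python
def replace_values_sketch(data, replacements):
    """Hypothetical stand-in: swap every value that appears in `replacements`."""
    cleaned = []
    for item in data:
        new_item = {}
        for key, value in item.items():
            try:
                new_item[key] = replacements.get(value, value)
            except TypeError:
                # unhashable values (lists, dicts) cannot be looked up; keep them
                new_item[key] = value
        cleaned.append(new_item)
    return cleaned


records = [
    {"city": "Berlin", "visits": [1, 2]},
    {"city": "Paris", "visits": [3]},
]
mapping = {"Berlin": "Germany", "Paris": "France"}
result = replace_values_sketch(records, mapping)
```

Here each city is generalized to its country, while values missing from the mapping pass through unchanged.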
-
data_minimization_tools.replace_with_distribution(data: [<class 'dict'>], keys, numpy_distribution_function_str='standard_normal', *distribution_args, **distribution_kwargs)
Replaces data for specific keys with data generated from a distribution.
Parameters: - data – input data as list of dicts
- keys – list of keys whose values should be replaced
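- numpy_distribution_function_str – the name of the NumPy distribution function to draw from, passed as a string (for example 'standard_normal'). For the available functions, see the NumPy random sampling documentation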
- distribution_args – additional args that the chosen function requires
- distribution_kwargs – additional kwargs that the chosen function requires
Returns: cleaned list of dicts
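The pattern of naming a distribution function as a string and forwarding extra arguments to it can be sketched as follows. To stay free of dependencies, this sketch swaps NumPy for the standard library's random module, so the function names it accepts ("uniform", "gauss", ...) are those of `random.Random`, not NumPy's. The helper name `replace_with_samples_sketch` and its `seed` parameter are hypothetical.

```python
import random


def replace_with_samples_sketch(data, keys, distribution_name,
                                *distribution_args, seed=None,
                                **distribution_kwargs):
    """Hypothetical stand-in: overwrite values with freshly drawn samples."""
    rng = random.Random(seed)
    sampler = getattr(rng, distribution_name)  # look the function up by name
    cleaned = []
    for item in data:
        new_item = dict(item)
        for key in keys:
            if key in new_item:
                # one independent draw per item and key
                new_item[key] = sampler(*distribution_args, **distribution_kwargs)
        cleaned.append(new_item)
    return cleaned


records = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
result = replace_with_samples_sketch(records, ["age"], "uniform", 18, 65, seed=42)
```

The positional values 18 and 65 play the role of distribution_args: they are passed straight through to the chosen function.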
-
data_minimization_tools.cvdi.anonymize_journey(data: [<class 'dict'>], original_to_cvdi_key: dict, config_overrides: dict = None) → [<class 'dict'>]
Anonymizes a journey using the U.S. DoT’s Privacy Protection Application.
Some of the waypoints in the input will not be present in the output; the rest will have only their geodata (see REQUIRED_KEYS) altered. Any additional attributes of the points that were not dropped from the output will remain unchanged.
Because the de-identification algorithm relies on knowledge of the roads along a journey, a so-called quad file must be provided. Generate such a file named “quad” and place it in ./cvdi-conf/ (relative to the script’s working directory).
Parameters: - data – input data as list of dicts.
- original_to_cvdi_key – Mapping of the input data’s fields to the fields required by the de-identification algorithm, e.g., {"lat": "Latitude", ...}, where lat is part of the input data. For the list of required fields, see REQUIRED_KEYS.
- config_overrides – Overrides to the de-identification application’s settings. For example, to increase the length of privacy intervals to 300 m, provide {"max_direct_distance": 300, "max_manhattan_distance": 300}.
Returns: A new, shorter list of dictionaries representing the waypoints of the de-identified journey.
-
data_minimization_tools.cvdi.REQUIRED_KEYS = {'Gentime', 'Heading', 'Latitude', 'Longitude', 'Speed'}
The keys that must be present in the input data for de-identification to work.
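Before calling anonymize_journey, it can help to check that a field mapping covers every required key. The sketch below copies the REQUIRED_KEYS set shown above instead of importing it, and both helpers (`missing_cvdi_keys`, `rename_for_cvdi`) are hypothetical illustrations of what such a mapping expresses; they are not part of the library and say nothing about how it applies the mapping internally.

```python
# Copied from the reference above so the sketch runs without the library.
REQUIRED_KEYS = {"Gentime", "Heading", "Latitude", "Longitude", "Speed"}


def missing_cvdi_keys(original_to_cvdi_key):
    """Return the required fields that the mapping does not provide."""
    return REQUIRED_KEYS - set(original_to_cvdi_key.values())


def rename_for_cvdi(data, original_to_cvdi_key):
    """Rename mapped fields; unmapped attributes keep their original names."""
    return [
        {original_to_cvdi_key.get(key, key): value for key, value in item.items()}
        for item in data
    ]


mapping = {
    "time": "Gentime",
    "heading": "Heading",
    "lat": "Latitude",
    "lon": "Longitude",
    "speed": "Speed",
}
incomplete = {"lat": "Latitude", "lon": "Longitude"}

missing_full = missing_cvdi_keys(mapping)
missing_partial = missing_cvdi_keys(incomplete)
renamed = rename_for_cvdi([{"lat": 52.5, "lon": 13.4, "note": "start"}], mapping)
```

An empty result from `missing_cvdi_keys` means the mapping names a source field for each required key.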