environmental_insights.data

environmental_insights.data#

read_nc#

def read_nc(filepath: str) -> xr.Dataset

Read a NetCDF (.nc) file into an xarray.Dataset.

Parameters#

filepath : str Path to the NetCDF file.

Returns#

xr.Dataset

netcdf_to_dataframe#

def netcdf_to_dataframe(ds: xr.Dataset) -> pd.DataFrame

Convert an xarray.Dataset to a pandas DataFrame, dropping any rows that have no valid data in any variable.

Parameters#

ds : xr.Dataset The dataset to convert.

Returns#

pd.DataFrame

get_uk_monitoring_station#

def get_uk_monitoring_station(pollutant: str,
                              station: str) -> gpd.GeoDataFrame

Download (if needed) and load ML-HAPPE training data for a single UK monitoring station.

Parameters#

pollutant : str One of the POLLUTANTS in ML-HAPPE (e.g. “no2”). station : str Name of the station (without “.nc”). token : str, optional Your CEDA API token, if required.

Returns#

GeoDataFrame The station’s training data as a GeoDataFrame, with a Point geometry constructed from its latitude/longitude.

get_uk_monitoring_stations#

def get_uk_monitoring_stations(pollutant: str) -> List[str]

Ensure the local ML-HAPPE Training_Data folder exists for a pollutant, then return the list of all station names by delegating to download.py.

air_pollution_concentration_typical_day_real_time_united_kingdom#

def air_pollution_concentration_typical_day_real_time_united_kingdom(
        month: int, day_of_week: str, hour: int, data_type: str = "Input")

Retrieve the typical-day dataset (either Input or Output) for the UK for a given time.

Arguments:

month : int Month of interest, 1 (January) through 12 (December). day_of_week : str Day of week of interest, e.g., “Monday”, “Tuesday”, etc. hour : int Hour of interest, 0 (midnight) through 23. data_type : str, optional Whether to fetch the “Input” or the “Output” version of the SynthHAPPE dataset. Defaults to “Input”.

Returns:

pandas.DataFrame The typical-day air pollution dataset for the UK at the specified time, as a flattened DataFrame.

air_pollution_concentration_nearest_point_typical_day_united_kingdom#

def air_pollution_concentration_nearest_point_typical_day_united_kingdom(
        month: int,
        day_of_week: str,
        hour: int,
        latitude: float,
        longitude: float,
        uk_grids,
        data_type: str = "Input")

Retrieve a single air pollution concentration data point (Input or Output) for the UK model at the closest grid point to a given lat/long.

Arguments:

month : int Month of interest, 1 (January) through 12 (December). day_of_week : str Day of week of interest, e.g. “Monday”, “Tuesday”, etc. hour : int Hour of interest, 0 (midnight) through 23. latitude : float Latitude of the point to estimate. longitude : float Longitude of the point to estimate. uk_grids : GeoDataFrame Model grid points for the UK. data_type : str, optional “Input” or “Output” dataset to fetch. Defaults to “Input”.

Returns:

pandas.DataFrame One-row DataFrame with the nearest grid’s pollutant values at the specified time.

air_pollution_concentration_complete_set_real_time_united_kingdom#

def air_pollution_concentration_complete_set_real_time_united_kingdom(
        time: str, data_type: str = "Input")

Retrieve the complete predicted dataset (Input or Output) for a given timestamp in the UK ML-HAPPE dataset.

Arguments:

time : str Timestamp of the form “YYYY-MM-DD HHmmss”. data_type : str, optional Whether to fetch the “Input” or “Output” version of the ML-HAPPE dataset. Defaults to “Input”.

Returns:

pandas.DataFrame The full air pollution dataset for the UK at the specified timestamp, as a flattened DataFrame.

air_pollution_concentration_nearest_point_real_time_united_kingdom#

def air_pollution_concentration_nearest_point_real_time_united_kingdom(
        latitude: float,
        longitude: float,
        time: str,
        uk_grids,
        data_type: str = "Input")

Retrieve a single air pollution concentration data point (Input or Output) for the UK ML-HAPPE model at the closest grid point to a given lat/long and timestamp.

Arguments:

latitude : float Latitude of the point to estimate. longitude : float Longitude of the point to estimate. time : str Timestamp of the form “YYYY-MM-DD HHmmss”. uk_grids : GeoDataFrame Model grid points for the UK. data_type : str, optional “Input” or “Output” dataset to fetch. Defaults to “Input”.

Returns:

pandas.DataFrame One‐row DataFrame with the nearest grid’s pollutant values at the specified time.

air_pollution_concentration_complete_set_real_time_global#

def air_pollution_concentration_complete_set_real_time_global(
        time: str, data_type: str = "Input")

Retrieve the complete predicted dataset (Input or Output) for a given timestamp in the Global ML-HAPPG dataset. This mirrors the UK real-time helper but points at ML-HAPPG and uses Global file locations.

Parameters#

time : str Timestamp of the form “YYYY-MM-DD HHmmss” (e.g., “2022-01-03 120000”). Note: ensure the date exists in the ML-HAPPG archive (e.g., 2022). data_type : {“Input”,”Output”}, default “Input” Which dataset to fetch.

Returns#

pandas.DataFrame The full global grid at the specified timestamp as a flattened DataFrame.

air_pollution_concentration_nearest_point_real_time_global#

def air_pollution_concentration_nearest_point_real_time_global(
        latitude: float,
        longitude: float,
        time: str,
        global_grids,
        data_type: str = "Input")

Retrieve a single air pollution concentration (Input or Output) for the Global ML-HAPPG model at the closest grid to a given lat/long and timestamp.

Returns#

pandas.DataFrame One-row DataFrame with the nearest grid’s values.

get_global_monitoring_station#

def get_global_monitoring_station(pollutant: str,
                                  station: str) -> gpd.GeoDataFrame

Download (if needed) and load ML-HAPPG training data for a single global monitoring station.

get_global_monitoring_stations#

def get_global_monitoring_stations(pollutant: str) -> List[str]

Return the list of global station names for a pollutant (ML-HAPPG).

get_amenities_as_geodataframe#

def get_amenities_as_geodataframe(amenity_type, min_lat, min_lon, max_lat,
                                  max_lon)

Fetch amenities of a given type within a bounding box and return as a GeoDataFrame.

Arguments:

amenity_type string - Type of amenity, e.g., “hospital”.
min_lat float - Minimum latitude.
min_lon float - Minimum longitude.
max_lat float - Maximum latitude.
max_lon float - Maximum longitude.

Returns:

GeoDataFrame - A GeoDataFrame containing the amenities with their names and coordinates.

get_highways_as_geodataframe#

def get_highways_as_geodataframe(highway_type, min_lat, min_lon, max_lat,
                                 max_lon)

Fetch highways of a specified type within a bounding box from OSM and return as a GeoDataFrame.

Arguments:

highway_type string - Type of highway, e.g., “motorway”, “residential”.
min_lat float - Minimum latitude.
min_lon float - Minimum longitude.
max_lat float - Maximum latitude.
max_lon float - Maximum longitude.

Returns:

GeoDataFrame - A GeoDataFrame containing the highways with their names and coordinates.

ckd_nearest_LineString#

def ckd_nearest_LineString(gdf_A, gdf_B, gdf_B_cols)

Calculate the nearest points between two GeoDataFrames containing LineString geometries.

This function uses cKDTree to efficiently find the nearest points between two sets of LineStrings. For each point in gdf_A, the function finds the closest point in gdf_B and returns the distances along with selected columns from gdf_B.

Arguments:

gdf_A GeoDataFrame - A GeoDataFrame containing LineString geometries.
gdf_B GeoDataFrame - A GeoDataFrame containing LineString geometries which will be used to find the closest points to gdf_A.
gdf_B_cols list or tuple - A list or tuple containing column names from gdf_B which will be included in the resulting DataFrame.

Returns:

GeoDataFrame - A GeoDataFrame with each row containing a geometry from gdf_A, corresponding closest geometry details from gdf_B (as specified by gdf_B_cols), and the distance to the closest point in gdf_B.

Notes:

The resulting GeoDataFrame maintains the order of gdf_A and attaches the nearest details from gdf_B. This code was adapted from the code available here: https://gis.stackexchange.com/questions/222315/finding-nearest-point-in-other-geodataframe-using-geopandas

get_even_spaced_points#

def get_even_spaced_points(points, number_of_points)

Generate a list of evenly spaced points between two given points.

This function calculates the distance (or difference) between two input points and divides this distance into number_of_points equal segments. The resulting points, including the start and end points, are returned in a list.

Arguments:

points list or tuple of float - A list or tuple containing two points (start and end) between which the evenly spaced points are to be calculated.
number_of_points int - The total number of points to generate, including the start and end points.

Returns:

list - A list of evenly spaced points between the provided start and end points.

Example:

get_even_spaced_points([1, 10], 5) [1.0, 3.25, 5.5, 7.75, 10.0]

Notes:

The function assumes that the points list is sorted in ascending order.

calculate_new_metrics_distance_total#

def calculate_new_metrics_distance_total(current_infrastructure, highway_type,
                                         start_point, end_point,
                                         land_grids_centroids, land_grids)

Simulate the addition of a proposed highway to current infrastructure and calculate new metrics.

This function creates a new proposed highway segment based on given start and end points. The proposed highway is then added to the current infrastructure dataset. After adding the new highway, the function calculates distance metrics and total length of the specific highway type.

Arguments:

current_infrastructure GeoDataFrame - The current infrastructure dataset with existing highways.
highway_type str - Type of the highway for which metrics are calculated (e.g., “motorway”).
start_point tuple of float - Coordinates (x, y) for the starting point of the proposed highway.
end_point tuple of float - Coordinates (x, y) for the ending point of the proposed highway.
land_grids_centroids GeoDataFrame - GeoDataFrame of the grids for predictions to be made on, with the geometry being a set of points representing the centroid of such grids.
land_grids GeoDataFrame - GeoDataFrame of the grids for predictions to be made on, with the geometry being a set of polygons representing the grids themselves.

Returns:

tuple:

GeoDataFrame: Contains metrics such as road infrastructure distance and total road length for each grid.
GeoDataFrame: A merged dataset of current infrastructure and the proposed highway.

Notes:

The function assumes the use of EPSG:4326 and EPSG:3395 for coordinate reference systems.
It also assumes the existence of helper functions like get_even_spaced_points and a global variable land_grids_centroids.

replace_feature_vector_column#

def replace_feature_vector_column(feature_vector, new_feature_vector,
                                  feature_vector_name)

Replace the feature vector column name with the new feature vector column name, replacing the data within the dataframe with new environmental conditions.

Arguments:

feature_vector DataFrame - DataFrame of the original data.
new_feature_vector DataFrame - DataFrame containing the new feature vector that is to be used to replace the data in feature_vector.
feature_vector_name string - Name of the feature vector to be changed.

Returns:

DataFrame - A DataFrame of the original data that was added with the feature vector now replaced by the new data.

get_uk_grids#

def get_uk_grids()

Get the spatial grids that represent the locations at which air pollution estimations are made for the UK Model.

Returns:

GeoDataFrame - A GeoDataFrame of the polygons for each of the grids in the UK Model alongside their centroid and unique ID.

get_uk_grids_outline#

def get_uk_grids_outline()

Get the spatial grid outlines representing the boundaries of the UK Model’s 1km grids.

Returns:

GeoDataFrame - A GeoDataFrame of the grid outlines for each UK Model grid alongside their centroid and unique ID.

get_global_grids#

def get_global_grids()

Get the spatial grids that represent the locations at which air pollution estimations are made for the Global Model.

Returns:

GeoDataFrame - A GeoDataFrame of the polygons for each of the grids in the Global Model and unique ID.

environmental_insights.data

Contents

environmental_insights.data#

read_nc#

Parameters#

Returns#

netcdf_to_dataframe#

Parameters#

Returns#

get_uk_monitoring_station#

Parameters#

Returns#

get_uk_monitoring_stations#

air_pollution_concentration_typical_day_real_time_united_kingdom#

air_pollution_concentration_nearest_point_typical_day_united_kingdom#

air_pollution_concentration_complete_set_real_time_united_kingdom#

air_pollution_concentration_nearest_point_real_time_united_kingdom#

air_pollution_concentration_complete_set_real_time_global#

Parameters#

Returns#

air_pollution_concentration_nearest_point_real_time_global#

Returns#

get_global_monitoring_station#

get_global_monitoring_stations#

get_amenities_as_geodataframe#

get_highways_as_geodataframe#

ckd_nearest_LineString#

get_even_spaced_points#

calculate_new_metrics_distance_total#

replace_feature_vector_column#

get_uk_grids#

get_uk_grids_outline#

get_global_grids#