Task: Add subclasses to FeatureLayer #1143

adamzev · 2025-03-25T14:49:25Z

Make `FeatureLayer` a base class

Make FeatureLayer an abstract base class with one subclass per source type (EsriFeatureLayer, CartoFeatureLayer, GdfFeatureLayer). Each subclass implements its own load_data method, while shared behavior such as opa_join stays on FeatureLayer. As an abstract base class, FeatureLayer holds the shared functionality but is never instantiated directly.

Acceptance Criteria

Make FeatureLayer an abstract base class that holds the shared logic
Move logic specific to Esri/Carto into the appropriate subclass
Subclasses implement load_data as appropriate
Eliminate self.type and the branching logic in FeatureLayer
All existing behavior remains unchanged
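
A minimal sketch of the shape these criteria describe, using only the standard library. The subclass names come from the description above; the constructor arguments, the row format, and the body of opa_join are placeholders assumed for illustration, not the project's actual implementation.

```python
from abc import ABC, abstractmethod


class FeatureLayer(ABC):
    """Shared behavior lives here; subclasses supply load_data."""

    def __init__(self, name):
        self.name = name
        self.data = self.load_data()

    @abstractmethod
    def load_data(self):
        """Return the rows for this layer (source-specific)."""

    def opa_join(self, other_rows, opa_col):
        # Placeholder for the shared join: keep rows whose opa_col value
        # also appears in other_rows. The real opa_join is not shown in
        # this issue, so this body is illustrative only.
        keys = {row[opa_col] for row in other_rows}
        return [row for row in self.data if row[opa_col] in keys]


class EsriFeatureLayer(FeatureLayer):
    def __init__(self, name, urls):
        self.urls = urls
        super().__init__(name)

    def load_data(self):
        # A real implementation would call the Esri REST API here.
        return [{"opa_id": "1", "source": url} for url in self.urls]


class GdfFeatureLayer(FeatureLayer):
    def __init__(self, name, rows):
        self.rows = rows
        super().__init__(name)

    def load_data(self):
        # The data is already in memory, so just copy it.
        return list(self.rows)
```

Because load_data is abstract, calling FeatureLayer(...) directly raises TypeError, and no self.type check is needed: each subclass carries its own loading logic.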

Additional context

This reduces branching and side effects in FeatureLayer.__init__.
Helps with long-term maintainability and the ability to add new layer types without having to change shared logic.
Clarifies which functions of FeatureLayer belong to which source type.

adamzev · 2025-04-07T21:27:11Z

Rather than FeatureLayer subclasses, it may make sense to call the subclasses something like ErsiLoader or CartoLoader and restrict their functionality to API calls, column filtering, standardization and interaction with a separate Cache class.

In the services, we may just need to call the appropriate loader rather than a layer. The only concern is that the FeatureLayer class keeps track of the Coordinate Reference System (CRS). If the GeoDataFrame reliably keeps its CRS, that may not be an issue, but if reload_gdf is needed, a layer or dataclass may still be required. If the reason we had to track CRS rather than use the GeoDataFrame's metadata relates to this ticket, we may no longer need to, since that bug is fixed.
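
If a dataclass does turn out to be needed, it could be as small as the sketch below. The name LoadedLayer, its fields, and the reload method are all hypothetical; the fetch callable stands in for whatever a loader exposes, and "EPSG:4326" is only an example value.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class LoadedLayer:
    """Hypothetical container pairing loaded data with its expected CRS."""

    name: str
    fetch: Callable[[], Any]  # e.g. a loader's load_or_fetch method
    crs: str                  # CRS the services expect, e.g. "EPSG:4326"
    data: Any = field(default=None, init=False)

    def reload(self) -> Any:
        # Re-run the loader and keep the result next to the stored CRS,
        # so callers can re-check or reproject after a reload.
        self.data = self.fetch()
        return self.data
```

This keeps the CRS out of the loaders themselves while still giving the services one place to look it up after a reload.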

I'm not ready to fully implement this yet, but here's what I was thinking:


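import os
import traceback
from abc import ABC, abstractmethod

import geopandas as gpd

# Cacher, RunMode, run_mode, STORE_CACHE, USE_CRS, log, load_esri_data and
# load_carto_data are assumed to come from the existing pipeline modules.


class Loader(ABC):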
    """
    Abstract base class for data sources.
    """

    def __init__(self, name, opa_col=None, load_on_init=True, cacher=Cacher):
        self.name = name
        self.cacher = cacher
        self.opa_col = opa_col
        if load_on_init:
            try:
                self.gdf = self.load_or_fetch()
            except Exception as e:
                log.error(f"Error loading data for {self.name}: {e}")
                traceback.print_exc()
                self.gdf = gpd.GeoDataFrame()  # Reset to an empty GeoDataFrame
                raise

    def load_or_fetch(self, mode: RunMode = run_mode) -> gpd.GeoDataFrame:
        # Use the injected cacher so a substitute passed to __init__ is honored
        cache_file = self.cacher.get_cache_filename(self.name, mode)

        if mode == RunMode.CACHE_SMALL and not os.path.exists(cache_file):
            raise FileNotFoundError(
                f"Cache file {cache_file} not found. Please run with a different mode."
            )
        if mode == RunMode.FRESH_DATA or not os.path.exists(cache_file):
            gdf = self.load_data()
            if STORE_CACHE:
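                # opa_col is optional, so avoid calling .lower() on None
                opa_col = self.opa_col.lower() if self.opa_col else None
                self.cacher.store_cache(gdf, self.name, opa_col)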
            return gdf

        gdf = self.cacher.load_cache(self.name, mode)
        return gdf

    @abstractmethod
    def load_data(self):
        pass

    @staticmethod
    def convert_str_to_list(input_data):
        """
        Convert a string to a list if it's not already a list.
        Args:
            input_data (str or list): The input data to convert.
        """
        return [input_data] if isinstance(input_data, str) else input_data

    @staticmethod
    def lowercase_column_names(gdf):
        # Standardize column names
        if not gdf.empty:
            gdf.columns = [col.lower() for col in gdf.columns]
        return gdf

    @staticmethod
    def filter_columns(gdf, cols):
        # Filter columns if specified
        if cols:
            cols = [col.lower() for col in cols]
            cols.append("geometry")
            gdf = gdf[[col for col in cols if col in gdf.columns]]
        return gdf

    @classmethod
    def normalize_columns(cls, gdf, cols):
        """
        Lowercase the GeoDataFrame's column names, then keep only the
        requested columns (plus geometry).
        """
        gdf = cls.lowercase_column_names(gdf)
        gdf = cls.filter_columns(gdf, cols)

        return gdf


class ErsiLoader(Loader):
    """
    Data source for ESRI REST API.
    """

    def __init__(self, name, urls, opa_col=None, cols: list[str] = None, from_xy=False):
        # if there's only one URL, make it a list
        self.urls = self.convert_str_to_list(urls)
        self.cols = cols
        self.crs = USE_CRS
        self.input_crs = "EPSG:4326" if not from_xy else USE_CRS
        super().__init__(name, opa_col=opa_col)

    def load_data(self):
        # Implement loading logic for ESRI data
        log.info(f"Loading data for {self.name} from Esri...")
        gdf = load_esri_data(self.urls, self.input_crs, self.crs)
        gdf = self.normalize_columns(gdf, self.cols)
        return gdf


class CartoLoader(Loader):
    """
    Data source for Carto SQL queries.
    """

    def __init__(
        self,
        name,
        sql_queries,
        opa_col=None,
        chunk_size=100000,
        use_wkb_geom_field=None,
        from_xy=False,
        cols: list[str] = None,
    ):
        # if there's only one query, make it a list
        self.sql_queries = self.convert_str_to_list(sql_queries)
        self.cols = cols
        self.max_workers = os.cpu_count()
        self.chunk_size = chunk_size
        self.use_wkb_geom_field = use_wkb_geom_field
        self.crs = USE_CRS
        self.input_crs = "EPSG:4326" if not from_xy else USE_CRS
        super().__init__(name, opa_col=opa_col)

    def load_data(self):
        # Implement loading logic for Carto data
        log.info(f"Loading data for {self.name} from Carto...")
        gdf = load_carto_data(
            self.sql_queries,
            self.max_workers,
            self.chunk_size,
            self.use_wkb_geom_field,
            self.input_crs,
            self.crs,
        )
        gdf = self.normalize_columns(gdf, self.cols)
        return gdf


class GdfLoader(Loader):
    """
    Data source for GeoDataFrames.
    """

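    def __init__(self, name, gdf=None):
        # Set before super().__init__, which calls load_data via load_or_fetch
        self.gdf = gdf
        super().__init__(name)

    def load_data(self):
        # The data is already in memory, so return the GeoDataFrame passed in
        log.info(f"Loading data for {self.name} from GeoDataFrame...")
        return self.gdf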
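
To make the branch order in Loader.load_or_fetch easier to check, here is a standalone sketch of just the decision, with no file system or network involved. The RunMode enum below is a stand-in for the project's own; only FRESH_DATA and CACHE_SMALL appear in the sketch above, so the OTHER member and the function name choose_source are assumptions.

```python
from enum import Enum


class RunMode(Enum):
    # Stand-in for the project's RunMode. OTHER is an assumed placeholder
    # for any mode that is neither FRESH_DATA nor CACHE_SMALL.
    FRESH_DATA = "fresh_data"
    CACHE_SMALL = "cache_small"
    OTHER = "other"


def choose_source(mode: RunMode, cache_exists: bool) -> str:
    """Mirror the branch order in Loader.load_or_fetch."""
    # A small-cache run cannot fall back to fetching, so a missing file is an error.
    if mode == RunMode.CACHE_SMALL and not cache_exists:
        raise FileNotFoundError("small cache requested but no cache file exists")
    # Fresh runs always fetch; other modes fetch only when nothing is cached.
    if mode == RunMode.FRESH_DATA or not cache_exists:
        return "fetch"
    return "cache"
```

Pulling the decision into a pure function like this would also make it straightforward to unit test each mode without touching the cache directory.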

nlebovits · 2025-06-14T22:14:45Z

closed by #1226

adamzev self-assigned this Mar 25, 2025

github-project-automation bot added this to Clean & Green Philly Mar 25, 2025

cfreedman added backend python labels Mar 27, 2025

adamzev mentioned this issue Mar 30, 2025: Postgres and FORCE_RELOAD issues in the new pipeline #1152 (Closed)

adamzev mentioned this issue Apr 17, 2025: Task: Add unit testing to the various services #1178 (Open, 3 tasks)

cfreedman moved this to To Do in Clean & Green Philly May 17, 2025

cfreedman moved this from To Do to In Development in Clean & Green Philly May 21, 2025

nlebovits closed this as completed Jun 14, 2025

github-project-automation bot moved this from In Development to Live in Clean & Green Philly Jun 14, 2025
