Skip to content

Task: Add subclasses to FeatureLayer #1143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
5 tasks
adamzev opened this issue Mar 25, 2025 · 1 comment
Open
5 tasks

Task: Add subclasses to FeatureLayer #1143

adamzev opened this issue Mar 25, 2025 · 1 comment
Assignees

Comments

@adamzev
Copy link
Collaborator

adamzev commented Mar 25, 2025

Make FeatureLayer a baseclass

Make FeatureLayer a baseclass with subclasses for each source type (EsriFeatureLayer, CartoFeatureLayer, GdfFeatureLayer). Each subclass will implement its own load_data method, while shared behavior like opa_join will remain on FeatureLayer which will be an abstract base class (stores shared functionality but isn't initialized directly).

Acceptance Criteria

  • Make FeatureLayer an abstract base class (FeatureLayer) with shared logic
  • Move specific logic to Ersi/Carto into the appropriate subclass
  • Subclasses implement load_data as appropriate
  • Eliminate the self.type and branching logic in FeatureLayer
  • All existing behavior remains unchanged

Additional context

  • This reduces branching and side effects in FeatureLayer.__init__.
  • Helps with long-term maintainability and the ability to add new layer types without having to change shared logic.
  • Clarifies which functions of FeatureLayer belongs with which source type.
@adamzev
Copy link
Collaborator Author

adamzev commented Apr 7, 2025

Rather than FeatureLayer subclasses, it may make sense to call the subclasses something like ErsiLoader or CartoLoader and restrict their functionality to API calls, column filtering, standardization and interaction with a separate Cache class.

In the services, we may need to just call an appropriate loader rather than a layer. The only concern there is that the FeatureLayer class keeps track of the Coordinate Reference System (CRS). If the GeoDataFrame reliably keeps its CRS that may not be an issue but if reload_gdf is needed, a layer or dataclass may be needed. If the reason we had to track CRS rather than use the GeoDataFrame's metadata relates to this ticket we may no longer need to since that bug is fixed.

I'm not ready to fully implement this yet, but here's what I was thinking:


class Loader(ABC):
    """
    Abstract base class for data sources.
    """

    def __init__(self, name, opa_col=None, load_on_init=True, cacher=Cacher):
        self.name = name
        self.cacher = cacher
        self.opa_col = opa_col
        if load_on_init:
            try:
                self.gdf = self.load_or_fetch()
            except Exception as e:
                log.error(f"Error loading data for {self.name}: {e}")
                traceback.print_exc()
                self.gdf = gpd.GeoDataFrame()  # Reset to an empty GeoDataFrame
                raise

    def load_or_fetch(self, mode: RunMode = run_mode) -> gpd.GeoDataFrame:
        cache_file = Cacher.get_cache_filename(self.name, mode)

        if mode == RunMode.CACHE_SMALL and not os.path.exists(cache_file):
            raise FileNotFoundError(
                f"Cache file {cache_file} not found. Please run with a different mode."
            )
        if mode == RunMode.FRESH_DATA or not os.path.exists(cache_file):
            gdf = self.load_data()
            if STORE_CACHE:
                self.cacher.store_cache(gdf, self.name, self.opa_col.lower())
            return gdf

        gdf = self.cacher.load_cache(self.name, mode)
        return gdf

    @abstractmethod
    def load_data(self):
        pass

    @staticmethod
    def convert_str_to_list(input_data):
        """
        Convert a string to a list if it's not already a list.
        Args:
            input_data (str or list): The input data to convert.
        """
        return [input_data] if isinstance(input_data, str) else input_data

    @staticmethod
    def lowercase_column_names(gdf):
        # Standardize column names
        if not gdf.empty:
            gdf.columns = [col.lower() for col in gdf.columns]
        return gdf

    @staticmethod
    def filter_columns(gdf, cols):
        # Filter columns if specified
        if cols:
            cols = [col.lower() for col in cols]
            cols.append("geometry")
            gdf = gdf[[col for col in cols if col in gdf.columns]]
        return gdf

    @classmethod
    def normalize_columns(cls, gdf, cols):
        """
        Normalize the columns of the GeoDataFrame to lowercase.
        """
        gdf = cls.lowercase_column_names(gdf)
        gdf = cls.filter_columns(gdf, cols)

        return gdf


class ErsiLoader(Loader):
    """
    Data source for ESRI REST API.
    """

    def __init__(self, name, urls, opa_col=None, cols: list[str] = None, from_xy=False):
        # if there's only one URL, make it a list
        self.urls = self.convert_str_to_list(urls)
        self.cols = cols
        self.crs = USE_CRS
        self.input_crs = "EPSG:4326" if not from_xy else USE_CRS
        super().__init__(name, opa_col=opa_col)

    def load_data(self):
        # Implement loading logic for ESRI data
        log.info(f"Loading data for {self.name} from ERSI...")
        gdf = load_esri_data(self.urls, self.input_crs, self.crs)
        gdf = self.normalize_columns(gdf, self.cols)
        return gdf


class CartoLoader(Loader):
    """
    Data source for Carto SQL queries.
    """

    def __init__(
        self,
        name,
        sql_queries,
        opa_col=None,
        chunk_size=100000,
        use_wkb_geom_field=None,
        from_xy=False,
        cols: list[str] = None,
    ):
        # if there's only one URL, make it a list
        self.sql_queries = self.convert_str_to_list(sql_queries)
        self.cols = cols
        self.max_workers = os.cpu_count()
        self.chunk_size = chunk_size
        self.use_wkb_geom_field = use_wkb_geom_field
        self.crs = USE_CRS
        self.input_crs = "EPSG:4326" if not from_xy else USE_CRS
        super().__init__(name, opa_col=opa_col)

    def load_data(self):
        # Implement loading logic for Carto data
        gdf = load_carto_data(
            self.sql_queries,
            self.max_workers,
            self.chunk_size,
            self.use_wkb_geom_field,
            self.input_crs,
            self.crs,
        )
        gdf = self.normalize_columns(gdf, self.cols)
        log.info(f"Loading data for {self.name} from Carto...")
        return gdf


class GdfLoader(Loader):
    """
    Data source for GeoDataFrames.
    """

    def __init__(self, name, gdf=None):
        self.gdf = gdf
        super().__init__(name)

    def load_data(self):
        # Implement loading logic for GeoDataFrames
        log.info(f"Loading data for {self.name} from GeoDataFrame...")

@cfreedman cfreedman moved this from To Do to In Development in Clean & Green Philly May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Development
Development

No branches or pull requests

2 participants