
Hadoop FileSystem Java Class Wrapper


Typed Python wrappers for the Hadoop FileSystem class family.

Installation

You can install this package from PyPI on any Hadoop or Spark runtime:

pip install hadoop-fs-wrapper

Select a version that matches the Hadoop version you are using:

| Hadoop version / Spark version | Compatible hadoop-fs-wrapper version |
|--------------------------------|--------------------------------------|
| 3.2.x / 3.2.x                  | 0.4.x                                |
| 3.3.x / 3.3.x                  | 0.4.x, 0.5.x                         |
| 3.3.x / 3.4.x                  | 0.6.x                                |
| 3.5.x / 3.5.x                  | 0.7.x                                |
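
For example, on a Spark 3.5.x / Hadoop 3.5.x runtime you would pin the 0.7.x line. This uses standard pip version specifiers; the exact bounds below are illustrative:

pip install "hadoop-fs-wrapper>=0.7,<0.8"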

Usage

A common use case is accessing the Hadoop FileSystem from a Spark session object:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

file_system = FileSystem.from_spark_session(spark=spark_session)

Then, for example, one can check if there are any files under specified path:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern resolves to a valid set of paths.

    :param file_system: hadoop-fs-wrapper FileSystem
    :param path: path, e.g. (s3a|abfss|file|...)://container@account/path/part*.csv
    :return: True if the path resolves to existing paths, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
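
For example, putting the two snippets together (the file:// path below is illustrative):

file_system = FileSystem.from_spark_session(spark=spark_session)

if is_valid_source_path(file_system, "file:///tmp/data/part*.csv"):
    print("Source path resolves to at least one file")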

Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported; a sketch of what these look like follows below. If an operation you require is not yet wrapped, please open an issue or create a PR.
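
The sketch below assumes the package mirrors other Java FileSystem methods in snake_case, as glob_status does; list_status and delete are assumed names here, not confirmed API:

# Hypothetical sketch: list_status and delete are assumed to follow
# the same snake_case naming convention as glob_status.
for status in file_system.list_status("file:///tmp/data"):
    print(status)  # metadata for each entry under the directory

# Second argument assumed to mirror the recursive flag of the Java delete(Path, boolean).
file_system.delete("file:///tmp/data/obsolete.csv", True)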

All changes are tested against Spark 3.4 running in local mode.
