A Delta Lake reader for Dask
Updated Oct 2, 2024 - Python
Code for preprocessing data from the HEXTOF instrument at FLASH, DESY in Hamburg (DE)
Dask tutorial; Dask tutorial translated into Chinese
Comparison of Dataframe libraries for parallel processing of large tabular files on CPU and GPU.
Flexible stacked visualization of circadian data from multiple sources and devices
Code for a talk on wrangling large datasets in pandas
Complete data analysis and data visualization project notebooks using pandas, NumPy, Matplotlib, and seaborn
Data Analysis on an extensive dataset of crimes in Chicago (2005 - 2016) using Dask
This repository develops an advanced recommendation system to enhance the e-commerce shopping experience by automating product suggestions and analyzing user preferences through machine learning techniques and big data technologies.
Sumeh: Unified Data Quality Framework. Sumeh is a unified data quality validation framework supporting multiple backends (PySpark, Dask, Polars, DuckDB) with centralized rule configuration.
A time series forecasting and regression solution that projects the number of pick-ups in and around a given region of New York City, USA, at a given time.
Data Analysis on an extensive dataset of crimes in Chicago (2005-2016) using Dask
This project demonstrates and compares machine learning workflows on pandas DataFrames and Dask DataFrames.
A tutorial to learn Dask DataArray and Dask DataFrames with examples from geospatial data catalogs.
Training Higgs Dataset with Keras - https://doi.org/10.5281/zenodo.13133945
Using dask-geopandas to process large vector datasets
Proofs of concept (POCs) for exploring new technologies.
A distributed computing project using Dask.