Pull requests: modelscope/data-juicer
use black (#714), opened Jun 20, 2025 by cyruszhang
Labels: dj:core (issues/PRs about the core functions of Data-Juicer); environment (related to third-party dependency, DJ-pypi, DJ-docker, etc.)
Fix sandbox left bugs (#710), opened Jun 19, 2025 by HYLcool
Labels: bug (Something isn't working); dj:cookbook (useful recipes and demos); enhancement (New feature or request)
[NewOp] Add generate_challenging_qa_mapper based on MindGYM principles (#703), opened Jun 14, 2025 by Bat-Reality
[WIP] Optimization framework (#702), opened Jun 13, 2025 by cyruszhang
Labels: dj:core; dj:efficiency (regarding efficiency issues and enhancements)
[NewOp] Add domain_diversity_selector based on DaaR principles (#699), opened Jun 12, 2025 by lingzhq
MinHash calculation with GPU on Ray (#694), opened Jun 9, 2025 by cyruszhang
Labels: dj:dist (issues/PRs about distributed data processing); dj:efficiency; dj:op (issues/PRs about some specific OPs); dj:tools (issues/PRs about specific tools); enhancement
Add RayBTSMinhashDeduplicatorWithUid and DocumentMinhashDeduplicatorWithUid (#677), opened May 22, 2025 by chenyushuo
[Tools] Optimize data_resplit, upload a tool to convert parquet to jsonl (#676), opened May 20, 2025 by liuyuhanalex
Optimize dedup to avoid oom (#568), opened Feb 7, 2025 by coolderli
Labels: dj:dist; dj:efficiency; dj:tools; enhancement; good first issue (Good for newcomers)
Add humanvbench operators (#553), opened Jan 17, 2025 by SYSUzhouting
Labels: dj:multimodal (issues/PRs about multimodal data processing); dj:op; good first issue
Add minhash deduplicator based on RAY and Redis (#489), opened Nov 15, 2024 by pan-x-c
Labels: dj:dist; dj:efficiency; dj:op
[WIP] Add text tagging by prompt mapper op (#408), opened Aug 30, 2024 by garyzhang99
Labels: dj:op
Add GPT-4V as evaluator
Labels: dj:multimodal; enhancement; stale-pr