* Changing the name
* style + quality
* update doc and logo
* clean up
* circle-CI on the branch for now
* fix daily dialog dataset
* fix urls
Co-authored-by: Quentin Lhoest <[email protected]>
CONTRIBUTING.md (+15 -15)
@@ -1,13 +1,13 @@
-# How to contribute to nlp?
+# How to contribute to Datasets?
-1. Fork the [repository](https://github.com/huggingface/nlp) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
+1. Fork the [repository](https://github.com/huggingface/datasets) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
3. Create a new branch to hold your development changes:
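Steps 2 and 3 above can be sketched as a shell session. This is an illustrative, offline-runnable version: it uses a local throwaway repository to stand in for your GitHub fork, and the branch name is a placeholder — in practice you would clone `https://github.com/<your-username>/datasets.git` instead.

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for your fork on GitHub (so this sketch runs without network access):
git init -q "$tmp/fork"
git -C "$tmp/fork" -c user.name=you -c user.email=you@example.com \
    commit -q --allow-empty -m "initial commit"

# 2. Clone your fork to your local disk, and add the base repository as a remote:
git clone -q "$tmp/fork" "$tmp/datasets"
cd "$tmp/datasets"
git remote add upstream https://github.com/huggingface/datasets.git

# 3. Create a new branch to hold your development changes:
git checkout -q -b a-descriptive-name-for-my-changes
git branch --show-current
```

Keeping the base repository as the `upstream` remote lets you later pull its `master` branch into your fork to resolve conflicts before opening the pull request.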
@@ -24,11 +24,11 @@
pip install -e ".[dev]"
```
-(If nlp was already installed in the virtual environment, remove
-it with `pip uninstall nlp` before reinstalling it in editable
+(If datasets was already installed in the virtual environment, remove
+it with `pip uninstall datasets` before reinstalling it in editable
mode with the `-e` flag.)
-5. Develop the features on your branch. If you want to add a dataset see more in-detail intsructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/nlp/add_dataset.html) and [share a dataset](https://huggingface.co/nlp/share_dataset.html) in the documentation.
+5. Develop the features on your branch. If you want to add a dataset, see the more in-detail instructions in the section [*How to add a dataset*](#how-to-add-a-dataset). Alternatively, you can follow the steps to [add a dataset](https://huggingface.co/datasets/add_dataset.html) and [share a dataset](https://huggingface.co/datasets/share_dataset.html) in the documentation.
6. Format your code. Run `black` and `isort` so that your newly added files look nice, using the following command:
@@ -60,20 +60,20 @@
8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.
## How-To-Add a dataset
-1. Make sure you followed steps 1-4 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp).
+1. Make sure you followed steps 1-4 of the section [*How to contribute to datasets?*](#how-to-contribute-to-datasets).
-2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(nlp.GeneratorBasedBuilder)` for the dataset `book_corpus`).
+2. Create your dataset folder under `datasets/<your_dataset_name>` and create your dataset script under `datasets/<your_dataset_name>/<your_dataset_name>.py`. You can check out other dataset scripts under `datasets` for some inspiration. Note on naming: the dataset class should be camel case, while the dataset name is its snake case equivalent (ex: `class BookCorpus(datasets.GeneratorBasedBuilder)` for the dataset `book_corpus`).
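The camel-case/snake-case naming rule above can be checked mechanically. A minimal sketch — this helper is hypothetical and not part of the library:

```python
def class_name_for(dataset_name: str) -> str:
    """Derive the camel-case builder class name from a snake-case dataset name."""
    return "".join(part.title() for part in dataset_name.split("_"))

print(class_name_for("book_corpus"))  # BookCorpus
```

For example, a dataset named `daily_dialog` would get a builder class named `DailyDialog`.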
-3.**Make sure you run all of the following commands from the root of your `nlp` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file run the command:
+3. **Make sure you run all of the following commands from the root of your `datasets` git clone.** To check that your dataset works correctly and to create its `dataset_infos.json` file, run the command:
```bash
-python nlp-cli test datasets/<your-dataset-folder> --save_infos --all_configs
+python datasets-cli test datasets/<your-dataset-folder> --save_infos --all_configs
```
4. If the command was successful, you should now create some dummy data. Use the following command to get in-detail instructions on how to create the dummy data:
-6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to nlp?*](#how-to-contribute-to-nlp). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
+6. If all tests pass, your dataset works correctly. Awesome! You can now follow steps 6, 7 and 8 of the section [*How to contribute to 🤗Datasets?*](#how-to-contribute-to-🤗Datasets). If you experience problems with the dummy data tests, you might want to take a look at the section *Help for dummy data tests* below.
### Help for dummy data tests
@@ -98,7 +98,7 @@ Follow these steps in case the dummy data test keeps failing:
- Verify that all filenames are spelled correctly. Rerun the command
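Filename mismatches are a common cause of dummy-data failures, and spelling errors are easy to miss by eye. A small sketch of a mechanical check — the helper and the directory layout are illustrative, not part of the test suite:

```python
import pathlib
import tempfile

def missing_files(dummy_dir, expected_names):
    """Return the expected filenames that are absent from the dummy data directory."""
    present = {p.name for p in pathlib.Path(dummy_dir).rglob("*") if p.is_file()}
    return sorted(set(expected_names) - present)

# Example: a dummy folder where one filename is misspelled.
with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "train.json").write_text("{}")
    (pathlib.Path(d) / "validaton.json").write_text("{}")  # typo on purpose
    print(missing_files(d, ["train.json", "validation.json"]))  # ['validation.json']
```

Comparing the names your loading script opens against what is actually inside the dummy archive pinpoints the misspelled file immediately.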