Commit 100813c

Merge pull request #11 from worldbank/add_guidance_flagships
add: comments by Ankriti
2 parents 6dc947a + 8e111b5 commit 100813c

1 file changed

+9
-10
lines changed

guidance/step_by_step_flagships.md

Lines changed: 9 additions & 10 deletions
@@ -18,8 +18,8 @@ A reproducibility package includes everything needed to replicate the findings i
 | Includes | Details |
 |----------|---------|
 | 📑 **Documentation** | README, Data Availability Statement (DAS), figure/table mapping |
-| 📂 **Code** | All needed scripts to clean data and generate final outputs |
-| 📊 **Data** | Raw data and/or detailed access instructions to obtain it |
+| 📂 **Code** | All code files required to go from the original data to the results in the paper |
+| 📊 **Data** | All raw data needed and/or detailed access instructions to obtain it |

 > 💻 **Standard: Computational Reproducibility**
 > A third party can reproduce the **exact findings** in the paper using the data, code, and documentation provided by the author.
@@ -39,18 +39,17 @@ A reproducibility package includes everything needed to replicate the findings i

 ---

-## 📦 Components of a Good Reproducibility Package — *With Flagships in Mind*
+## 📦 Components of a Good Reproducibility Package — *Flagships*

-📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which makes reproducibility more complex. Below are the key components of a strong package, along with specific tips for flagships to ensure clarity and coordination.
+📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which adds complexity to reproducibility. The table below outlines the essential components of a high-quality package, with specific tips to support coordination and transparency in flagship workflows.

 | Component | Description & Flagship-Specific Tips |
 |----------------------|-------------------------------------------------------------------------------------------------------|
-| **README File** | 📌 **Essential for flagships**: Provides a clear overview of the analysis and full guidance for replicators.<br> - Include step-by-step instructions on how to run the code or replicate the findings. <br>- Include a **list of exhibits**: indicate which are produced by the package and which come from external sources (with citations).<br>- Data Availability Statement: define **all datasets used**: include source (with URL), version, and access date (more on next point).<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx) |
-| **Data Availability Statement (DAS)** | 📌 **Flagships often use a mix of public, restricted, and internal datasets**, so this is a key component. The DAS must:<br><br>- List **every dataset** used in the analysis, regardless of size or access level.<br>- Specify **access conditions** for each (e.g., public with URL, WB staff only with explanation of how data was obtained, etc.)<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731) |
-| **Code Files** | - 📁 Modular scripts organized by task (e.g., `cleaning.R`, `analysis.do`) and managed via one main script (`main.R` or `main.do`).<br>- List all external dependencies (e.g., R packages, ado files, Python libraries).<br>- 📌 For flagships: organize code **by chapter or module**, and ensure the full team agrees on versioning and folder structure. |
-| **Data** | - Keep **raw** and **processed** data separate.<br>- Document all data transformations **in code**. If manual edits were made, explain them in the README.<br>- Clean the package before submission: remove unused files.<br>📌 For flagships: ensure **consistent data versions** across chapters and authors. Store original data in permanent, team-accessible locations. |
-| **Final Outputs** | - Include **all raw outputs** used in the report (e.g., CSVs, graphs, LaTeX tables).<br>- 📌 If final outputs were sent to the design team, **document which ones**.<br>- A final verification run should confirm that report exhibits match raw outputs. |
-
+| **README File** | 📌 **Critical for flagships**: Serves as the main guide for replicators.<br><br>— Provide step-by-step instructions for how to run the code and reproduce results.<br>— Include a **list of exhibits**, indicating which are generated by the package and which are taken from external sources (with citations).<br>— Include a Data Availability Statement (see below).<br>— If the project structure is complex (e.g., organized by chapter or module), describe the folder layout to help others navigate it.<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx) |
+| **Data Availability Statement (DAS)** | 📌 **Essential for flagships**: These often use a mix of public, restricted, and internal datasets.<br><br>— List **every dataset** used, regardless of size or access level.<br>— Clearly describe the **access conditions** for each dataset: e.g., public (include URL), restricted (how the team obtained it), or internal WB access only (with process and a contact name if possible).<br>— Include the **access date**, since datasets may be updated before project completion.<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731) |
+| **Code Files** | 📁 Organize scripts by task (e.g., `cleaning.R`, `analysis.do`), and manage them with a single main script (`main.R`, `main.do`, or equivalent).<br><br>— List all dependencies explicitly (e.g., R packages, ado files, Python libraries).<br>📌 For flagships: Use a **modular structure by chapter or module**, and agree on folder naming and structure across all contributors. <br><br>🔗 **Use our templates**: [Stata](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.do) · [R](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.R)|
+| **Data** | 📌 Data is often the trickiest part for flagships due to multiple sources.<br><br>— Keep **raw** and **processed** data in separate folders.<br>— Document **all** data transformations **in code**. If manual edits were made, explain them in the README.<br>— Remove any unused datasets before submission.<br>— If using internally produced data (e.g., from other WB teams), provide as much detail as possible: include dataset title, source team, contact person (if applicable), and whether it could be shared on DDH/MDL under a restricted license.<br>📌 Maintain consistent dataset versions across chapters and authors, and store original data in permanent, team-accessible locations. |
+| **Final Outputs** | 📤 Include all raw outputs used in the paper (e.g., CSVs, LaTeX tables, plots).<br><br>📌 If any outputs were sent to the design/publication team, specify which ones to avoid mismatches between the paper and the package. |

 ---

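The "Code Files" guidance in this diff (task scripts per chapter, driven by one main script) could look roughly like the Python sketch below. This is an illustration only, not one of the repository's templates: the `code/chapterN` folder layout, the `build_commands` and `run_all` helpers, and the interpreter commands in `RUNNERS` (in particular the `stata-mp -b do` batch call) are assumptions that would need adapting to a given project and installation.

```python
import subprocess
import sys
from pathlib import Path

# Assumed interpreter commands per script extension; adjust to the local setup.
RUNNERS = {
    ".R": ["Rscript"],
    ".do": ["stata-mp", "-b", "do"],
    ".py": [sys.executable],
}


def build_commands(code_dir, chapters, tasks):
    """Return one command per existing script, ordered by chapter, then by task."""
    commands = []
    for chapter in chapters:
        for task in tasks:
            # A task may be written in any supported language,
            # e.g. cleaning.R in one chapter and cleaning.do in another.
            for script in sorted(Path(code_dir, chapter).glob(f"{task}.*")):
                runner = RUNNERS.get(script.suffix)
                if runner is not None:
                    commands.append(runner + [str(script)])
    return commands


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

A main script would then call something like `run_all(build_commands("code", ["chapter1", "chapter2"], ["cleaning", "analysis"]))`. Keeping the ordering in one place means a replicator has a single entry point, and files with unrecognized extensions are skipped rather than executed.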