-| 📂 **Code**| All needed scripts to clean data and generate final outputs|
-| 📊 **Data**|Raw data and/or detailed access instructions to obtain it |
+| 📂 **Code**| All code files required to go from the original data to the results in the paper|
+| 📊 **Data**|All raw data needed and/or detailed access instructions to obtain it |
 
 > 💻 **Standard: Computational Reproducibility**
 > A third party can reproduce the **exact findings** in the paper using the data, code, and documentation provided by the author.
@@ -39,18 +39,17 @@ A reproducibility package includes everything needed to replicate the findings i
 
 ---
 
-## 📦 Components of a Good Reproducibility Package — *With Flagships in Mind*
+## 📦 Components of a Good Reproducibility Package — *Flagships*
 
-📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which makes reproducibility more complex. Below are the key components of a strong package, along with specific tips for flagships to ensure clarity and coordination.
+📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which adds complexity to reproducibility. The table below outlines the essential components of a high-quality package, with specific tips to support coordination and transparency in flagship workflows.
-|**README File**| 📌 **Essential for flagships**: Provides a clear overview of the analysis and full guidance for replicators.<br> - Include step-by-step instructions on how to run the code or replicate the findings. <br>- Include a **list of exhibits**: indicate which are produced by the package and which come from external sources (with citations).<br>- Data Availability Statement: define **all datasets used**: include source (with URL), version, and access date (more on next point).<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx)|
-|**Data Availability Statement (DAS)**| 📌 **Flagships often use a mix of public, restricted, and internal datasets**, so this is a key component. The DAS must:<br><br>- List **every dataset** used in the analysis, regardless of size or access level.<br>- Specify **access conditions** for each (e.g., public with URL, WB staff only with explanation of how data was obtained, etc.)<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731)|
-|**Code Files**| - 📁 Modular scripts organized by task (e.g., `cleaning.R`, `analysis.do`) and managed via one main script (`main.R` or `main.do`).<br>- List all external dependencies (e.g., R packages, ado files, Python libraries).<br>- 📌 For flagships: organize code **by chapter or module**, and ensure the full team agrees on versioning and folder structure. |
-|**Data**| - Keep **raw** and **processed** data separate.<br>- Document all data transformations **in code**. If manual edits were made, explain them in the README.<br>- Clean the package before submission: remove unused files.<br>📌 For flagships: ensure **consistent data versions** across chapters and authors. Store original data in permanent, team-accessible locations. |
-|**Final Outputs**| - Include **all raw outputs** used in the report (e.g., CSVs, graphs, LaTeX tables).<br>- 📌 If final outputs were sent to the design team, **document which ones**.<br>- A final verification run should confirm that report exhibits match raw outputs. |
-
+|**README File**| 📌 **Critical for flagships**: Serves as the main guide for replicators.<br><br>— Provide step-by-step instructions for how to run the code and reproduce results.<br>— Include a **list of exhibits**, indicating which are generated by the package and which are taken from external sources (with citations).<br>— Include a Data Availability Statement (see below).<br>— If the project structure is complex (e.g., organized by chapter or module), describe the folder layout to help others navigate it.<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx)|
+|**Data Availability Statement (DAS)**| 📌 **Essential for flagships**: These often use a mix of public, restricted, and internal datasets.<br><br>— List **every dataset** used, regardless of size or access level.<br>— Clearly describe the **access conditions** for each dataset: e.g., public (include URL), restricted (how the team obtained it), or internal WB access only (with process and a contact name if possible).<br>— Include the **access date**, since datasets may be updated before project completion.<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731)|
+|**Code Files**| 📁 Organize scripts by task (e.g., `cleaning.R`, `analysis.do`), and manage them with a single main script (`main.R`, `main.do`, or equivalent).<br><br>— List all dependencies explicitly (e.g., R packages, ado files, Python libraries).<br>📌 For flagships: Use a **modular structure by chapter or module**, and agree on folder naming and structure across all contributors. <br><br>🔗 **Use our templates**: [Stata](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.do) · [R](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.R)|
+|**Data**| 📌 Data is often the trickiest part for flagships due to multiple sources.<br><br>— Keep **raw** and **processed** data in separate folders.<br>— Document **all** data transformations **in code**. If manual edits were made, explain them in the README.<br>— Remove any unused datasets before submission.<br>— If using internally produced data (e.g., from other WB teams), provide as much detail as possible: include dataset title, source team, contact person (if applicable), and whether it could be shared on DDH/MDL under a restricted license.<br>📌 Maintain consistent dataset versions across chapters and authors, and store original data in permanent, team-accessible locations. |
+|**Final Outputs**| 📤 Include all raw outputs used in the paper (e.g., CSVs, LaTeX tables, plots).<br><br>📌 If any outputs were sent to the design/publication team, specify which ones to avoid mismatches between the paper and the package. |
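
The **Code Files** row describes task-level scripts driven by a single main script. The sketch below is one possible shape for such a driver and is not taken from the linked `main.do` or `main.R` templates. The `RUNNERS` table, the `build_plan` and `run_all` helpers, and the example paths are all hypothetical. The Stata batch command in particular is an assumption that depends on the local installation.

```python
from pathlib import Path
import subprocess

# Hypothetical mapping from script extension to the command that runs it.
# These commands are assumptions; adjust them to the tools installed locally.
RUNNERS = {
    ".py": ["python"],
    ".R": ["Rscript"],
    ".do": ["stata-mp", "-b", "do"],  # assumed Unix-style Stata batch call
}


def build_plan(root, stages):
    """Return one command (a list of strings) per script, in run order.

    Fails before anything runs if a script is missing or has an
    extension with no known runner.
    """
    plan = []
    for rel_path in stages:
        script = Path(root) / rel_path
        if not script.is_file():
            raise FileNotFoundError(f"Missing script: {script}")
        runner = RUNNERS.get(script.suffix)
        if runner is None:
            raise ValueError(f"No runner configured for: {script.name}")
        plan.append(runner + [str(script)])
    return plan


def run_all(root, stages):
    """Run every script in order and stop at the first failure."""
    for command in build_plan(root, stages):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


# Example call with hypothetical per-chapter paths:
# run_all("code", ["chapter1/cleaning.R", "chapter1/analysis.do"])
```

Listing the stages explicitly, chapter by chapter, keeps the run order visible to replicators. Checking that every script exists before running anything surfaces a broken folder layout early.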