Merge pull request #11 from worldbank/add_guidance_flagships

mariarrt94 · web-flow · commit 100813cf4a5c · 2025-04-23T14:26:18.000-04:00
add: comments by Ankriti
diff --git a/guidance/step_by_step_flagships.md b/guidance/step_by_step_flagships.md
@@ -18,8 +18,8 @@ A reproducibility package includes everything needed to replicate the findings i
 | Includes | Details |
 |----------|---------|
 | 📑 **Documentation** | README, Data Availability Statement (DAS), figure/table mapping |
-| 📂 **Code** | All needed scripts to clean data and generate final outputs |
-| 📊 **Data** | Raw data and/or detailed access instructions to obtain it |
+| 📂 **Code** | All code files required to go from the original data to the results in the paper |
+| 📊 **Data** | All raw data needed, and/or detailed instructions for obtaining it |
 
 > 💻 **Standard: Computational Reproducibility**  
 > A third party can reproduce the **exact findings** in the paper using the data, code, and documentation provided by the author.
@@ -39,18 +39,17 @@ A reproducibility package includes everything needed to replicate the findings i
 
 ---
 
-## 📦 Components of a Good Reproducibility Package — *With Flagships in Mind*
+## 📦 Components of a Good Reproducibility Package — *Flagships*
 
-📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which makes reproducibility more complex. Below are the key components of a strong package, along with specific tips for flagships to ensure clarity and coordination.
+📌 **Flagship projects typically involve multiple datasets, chapters, and contributors**, which adds complexity to reproducibility. The table below outlines the essential components of a high-quality package, with specific tips to support coordination and transparency in flagship workflows.
 
 | Component             | Description & Flagship-Specific Tips                                                                 |
 |----------------------|-------------------------------------------------------------------------------------------------------|
-| **README File**       | 📌 **Essential for flagships**: Provides a clear overview of the analysis and full guidance for replicators.<br> - Include step-by-step instructions on how to run the code or replicate the findings. <br>- Include a **list of exhibits**: indicate which are produced by the package and which come from external sources (with citations).<br>- Data Availability Statement: define **all datasets used**: include source (with URL), version, and access date (more on next point).<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx) |
-| **Data Availability Statement (DAS)** | 📌 **Flagships often use a mix of public, restricted, and internal datasets**, so this is a key component. The DAS must:<br><br>- List **every dataset** used in the analysis, regardless of size or access level.<br>- Specify **access conditions** for each (e.g., public with URL, WB staff only with explanation of how data was obtained, etc.)<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731) |
-| **Code Files**        | - 📁 Modular scripts organized by task (e.g., `cleaning.R`, `analysis.do`) and managed via one main script (`main.R` or `main.do`).<br>- List all external dependencies (e.g., R packages, ado files, Python libraries).<br>- 📌 For flagships: organize code **by chapter or module**, and ensure the full team agrees on versioning and folder structure. |
-| **Data**              | - Keep **raw** and **processed** data separate.<br>- Document all data transformations **in code**. If manual edits were made, explain them in the README.<br>- Clean the package before submission: remove unused files.<br>📌 For flagships: ensure **consistent data versions** across chapters and authors. Store original data in permanent, team-accessible locations. |
-| **Final Outputs**     | - Include **all raw outputs** used in the report (e.g., CSVs, graphs, LaTeX tables).<br>- 📌 If final outputs were sent to the design team, **document which ones**.<br>- A final verification run should confirm that report exhibits match raw outputs. |
-
+| **README File**       | 📌 **Critical for flagships**: Serves as the main guide for replicators.<br><br>— Provide step-by-step instructions for how to run the code and reproduce results.<br>— Include a **list of exhibits**, indicating which are generated by the package and which are taken from external sources (with citations).<br>— Include a Data Availability Statement (see below).<br>— If the project structure is complex (e.g., organized by chapter or module), describe the folder layout to help others navigate it.<br><br>🔗 **Use our templates**: [Markdown](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/README_Template.md) · [Word](https://github.com/worldbank/wb-reproducible-research-repository/raw/refs/heads/main/resources/README_Template.docx) |
+| **Data Availability Statement (DAS)** | 📌 **Essential for flagships**: These often use a mix of public, restricted, and internal datasets.<br><br>— List **every dataset** used, regardless of size or access level.<br>— Clearly describe the **access conditions** for each dataset: e.g., public (include URL), restricted (how the team obtained it), or internal WB access only (with process and a contact name if possible).<br>— Include the **access date**, since datasets may be updated before project completion.<br><br>🔗 [Example DAS for Flagship](https://reproducibility.worldbank.org/index.php/catalog/250/download/731) |
+| **Code Files**        | 📁 Organize scripts by task (e.g., `cleaning.R`, `analysis.do`), and manage them with a single main script (`main.R`, `main.do`, or equivalent).<br><br>— List all dependencies explicitly (e.g., R packages, ado files, Python libraries).<br>📌 For flagships: Use a **modular structure by chapter or module**, and agree on folder naming and structure across all contributors.<br><br>🔗 **Use our templates**: [Stata](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.do) · [R](https://github.com/worldbank/wb-reproducible-research-repository/blob/main/resources/main.R) |
+| **Data**              | 📌 Data is often the trickiest part for flagships due to multiple sources.<br><br>— Keep **raw** and **processed** data in separate folders.<br>— Document **all** data transformations **in code**. If manual edits were made, explain them in the README.<br>— Remove any unused datasets before submission.<br>— If using internally produced data (e.g., from other WB teams), provide as much detail as possible: include dataset title, source team, contact person (if applicable), and whether it could be shared on DDH/MDL under a restricted license.<br>📌 Maintain consistent dataset versions across chapters and authors, and store original data in permanent, team-accessible locations. |
+| **Final Outputs**     | 📤 Include all raw outputs used in the paper (e.g., CSVs, LaTeX tables, plots).<br><br>📌 If any outputs were sent to the design/publication team, specify which ones, to avoid mismatches between the paper and the package. |
 
 ---