More edits.

arokem · arokem · commit 2568099f3bcb · 2024-05-16T14:05:29.000-07:00
diff --git a/sections/01-introduction.qmd b/sections/01-introduction.qmd
@@ -9,54 +9,63 @@ machine learning techniques, these datasets can help us understand everything
 from the cellular operations of the human body, through business transactions
 on the internet, to the structure and history of the universe. However, the
 development of new machine learning methods, and data-intensive discovery more
-generally, rely heavily on the availability and usability of these large
-datasets. Data can be openly available but still not useful if it cannot be
-properly understood. In current conditions in which almost all of the relevant
-data is stored in digital formats, and many relevant datasets can be found
-through the communication networks of the world wide web, Findability,
-Accessibility, Interoperability and Reusability (FAIR) principles for data
-management and stewardship become critically important
-\cite{Wilkinson2016FAIR}.
+generally, rely heavily on the Findability, Accessibility, Interoperability and
+Reusability (FAIR) of data [@Wilkinson2016FAIR].
 
-One of the main mechanisms through which these principles are promoted is the
-development of \emph{standards} for data and metadata. Standards can vary in
-the level of detail and scope, and encompass such things as \emph{file formats}
-for the storing of certain data types, \emph{schemas} for databases that store
-a range of data types, \emph{ontologies} to describe and organize metadata in a
+One of the main mechanisms through which the FAIR principles are promoted is the
+development of *standards* for data and metadata. Standards can vary in
+the level of detail and scope, and encompass such things as *file formats*
+for the storing of certain data types, *schemas* for databases that store
+a range of data types, *ontologies* to describe and organize metadata in a
 manner that connects it to field-specific meaning, as well as mechanisms to
-describe \emph{provenance} of different data derivatives. The importance of
-standards was underscored in a recent report report by the Subcommittee on Open
-Science of the National Science and Technology Council on "Desirable
-characteristics of data repositories for federally funded research"
-\cite{nstc2022desirable}. The report explicitly called out the importance of
-"allow[ing] datasets and metadata to be accessed, downloaded, or exported from
-the repository in widely used, preferably non-proprietary, formats consistent
-with standards used in the disciplines the repository serves." This highlights
-the need for data and metadata standards across a variety of different kinds of
-data. In addition, a report from the National Institute of Standards and
-Technology on "U.S. Leadership in AI: A Plan for Federal Engagement in
-Developing Technical Standards and Related Tools" emphasized that --
-specifically for the case of AI -- "U.S. government agencies should prioritize
-AI standards efforts that are [...] Consensus-based, [...] Inclusive and
-accessible, [...] Multi-path, [...] Open and transparent, [...] and [that]
-Result in globally relevant and non-discriminatory standards..."
-\cite{NIST2019}. The converging characteristics of standards that arise from
-these reports suggest that considerable thought needs to be given to the manner
-in which standards arise, so that these goals are achieved.
+describe *provenance* of analysis products.
 
-Standards for a specific domain can come about in various ways, but very
-broadly speaking two kinds of mechanisms can generate a standard for a specific
-type of data: (i) top-down: in this case a (usually) small group of people
-develop the standard and disseminate it to the communities of interest with
-very little input from these communities. An example of this mode of standards
-development can occur when an instrument is developed by a manufacturer and
-users of this instrument receive the data in a particular format that was
-developed in tandem with the instrument; and (ii) bottom-up: in this case,
-standards are developed by a larger group of people that convene and reach
-consensus about the details of the standard in an attempt to cover a large
-range of use-cases. Most standards are developed through an interplay between
-these two modes, and understanding how to make the best of these modes is
-critical in advancing the development of data and metadata standards.
+The importance of standards stems not only from discussions within research
+fields about how research can best be conducted to take advantage of existing
+and growing datasets, but also arises from an ongoing series of policy
+discussions that address the interactions between research communities and the
+general public. In the United States, these include memos issued in 2013 and
+2022 by the directors of the White House Office of Science and Technology Policy
+(OSTP), John Holdren (2013) and Alondra Nelson (2022). While these memos focused
+primarily on making peer-reviewed publications funded by the US Federal
+government available to the general public, they also lay an increasingly
+detailed path towards the publication and general availability of the data that
+is collected as part of the research that is funded by the US government.
+
+The general guidance and overall spirit of these memos dovetail with more
+specific policy discussions that put meat on the bones of the general guidance.
+The importance of data and metadata standards, for example, was underscored in
+a recent report by the Subcommittee on Open Science of the National Science and
+Technology Council on the "Desirable characteristics of data repositories for
+federally funded research" [@nstc2022desirable]. The report explicitly called
+out the importance of "allow[ing] datasets and metadata to be accessed,
+downloaded, or exported from the repository in widely used, preferably
+non-proprietary, formats consistent with standards used in the disciplines the
+repository serves." This highlights the need for data and metadata standards
+across a variety of different kinds of data. In addition, a report from the
+National Institute of Standards and Technology on "U.S. Leadership in AI: A
+Plan for Federal Engagement in Developing Technical Standards and Related
+Tools" emphasized that -- specifically for the case of AI -- "U.S. government
+agencies should prioritize AI standards efforts that are [...] Consensus-based,
+[...] Inclusive and accessible, [...] Multi-path, [...] Open and transparent,
+[...] and [that] Result in globally relevant and non-discriminatory
+standards..." [@NIST2019]. The converging characteristics of standards that
+arise from these reports suggest that considerable thought needs to be given to
+the manner in which standards arise, so that these goals are achieved.
+
+Standards for a specific domain can come about in various ways. Broadly
+speaking, two kinds of mechanisms can generate a standard for a specific type of
+data: (i) top-down: in this case a (usually) small group of people develop the
+standard and disseminate it to the communities of interest with very little
+input from these communities. An example of this mode of standards development
+can occur when an instrument is developed by a manufacturer and users of this
+instrument receive the data in a particular format that was developed in tandem
+with the instrument; and (ii) bottom-up: in this case, standards are developed
+by a larger group of people that convene and reach consensus about the details
+of the standard in an attempt to cover a large range of use-cases. Most
+standards are developed through an interplay between these two modes, and
+understanding how to make the best of these modes is critical in advancing the
+development of data and metadata standards.
 
 One source of inspiration for bottom-up development of robust, adaptable and
 useful standards comes from open-source software (OSS). OSS has a long history
diff --git a/sections/03-recommendations.qmd b/sections/03-recommendations.qmd
@@ -1,5 +1,6 @@
 
 
+<<<<<<< HEAD
 ## Funding or Grantmaking entities: 
 
 ### Fund Data Standards Development
@@ -57,5 +58,19 @@ Development of standards should be coupled with development of associated softwa
 Additionally, standards evolution should maintain software compatibility, and ability to translate and migrate between standards. 
 
 
+=======
+1. Training for data stewards and career paths that encourage this role.
+2. Development of meta-standards or standards-of-standards. These are descriptions of cross-cutting best practices. These can be used as a basis of the analysis or assessment of an existing standard, or as guidelines to develop new standards.
+3. Recommend pathways or lifecycles for successful data standards. Include process, creators, affiliations, grants, and adoption journeys. Make this documentation step integral to the work of standards creators and granting agencies.
+4. Retroactively document #3 for standards such as CF (climate science), NASA GeneLab (space omics), OpenGIS (geospatial), DICOM (medical imaging), GA4GH (genomics), FITS (astronomy), Zarr (domain agnostic n-dimensional arrays)... ?
+5. Create an ontology for the standards process, covering aspects such as top-down vs. bottom-up, minimum number of datasets, and community size. Examine schema.org (W3C), PEP (Python), CDISC (FDA).
+6. Amplify formalization/guidelines on how to create standards (example metadata schema specifications using https://linkml.io).
+7. Make data standards machine-readable, and make software creation an integral part of establishing a standard's schema, e.g. identifiers for a person using CFF in citations. The cffconvert software makes the CFF standard usable and useful.
+8. Survey and document the failures of current standards for a specific dataset / domain before establishing a new one. Use resources such as Fairsharing.org or the Digital Curation Centre https://www.dcc.ac.uk/guidance/standards.
+9. Funding agencies and science communities need to establish governance for standards creation and adoption (cite https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance).
+10. Cross-sector alliances, such as those between industry and academia, need closer coordination and alignment of pace through strong program management (for instance via OSPO efforts).
+11. Multi company partnerships should include strategic initiatives for standard establishment (example https://www.pistoiaalliance.org/news/press-release-pistoia-alliance-launches-idmp-1-0/).
+12. Stakeholder organizations should invest in training grants to establish curriculum for data and metadata standards education.
+>>>>>>> 8cb3f6b (More edits.)