Commit 0a10433

edits and comments on draft f1ecec5
1 parent f1ecec5 commit 0a10433

2 files changed (+68, -18 lines)

sections/01-introduction.qmd

Lines changed: 40 additions & 12 deletions
@@ -25,7 +25,7 @@ for the storing of certain data types, \emph{schemas} for databases that store
 a range of data types, \emph{ontologies} to describe and organize metadata in a
 manner that connects it to field-specific meaning, as well as mechanisms to
 describe \emph{provenance} of different data derivatives. The importance of
-standards was underscored in a recent report report by the Subcommittee on Open
+standards was underscored in a recent report by the Subcommittee on Open
 Science of the National Science and Technology Council on "Desirable
 characteristics of data repositories for federally funded research"
 \cite{nstc2022desirable}. The report explicitly called out the importance of
@@ -58,31 +58,46 @@ range of use-cases. Most standards are developed through an interplay between
 these two modes, and understanding how to make the best of these modes is
 critical in advancing the development of data and metadata standards.
 
+<!--
+As an alternative for the paragraph above, maybe start with:
+1. "[Many] standards are developed through an interplay of modes"
+2. describe mode 1
+3. describe mode 2
+4. conclude the paragraph by something like "understanding how to make the
+best of these modes is critical in advancing the development of data and
+metadata standards."
+-->
+
 One source of inspiration for bottom-up development of robust, adaptable and
 useful standards comes from open-source software (OSS). OSS has a long history
 going back to the development of the Unix operating system in the late 1960s.
-Over the time since its inception, the large community of developers and users
-of OSS have have developed a host of socio-technical mechanisms that support
+Since its inception, the large community of developers and users
+of OSS have developed a host of socio-technical mechanisms that support
 the development and use of OSS. For example, the Open Source Initiative (OSI),
-a non-profit organization that was founded in 1990s has evolved a set of
+a non-profit organization that was founded in the 1990s has evolved <!-- developed? --> a set of
 guidelines for licensing of OSS that is designed to protect the rights of
 developers and users. Technical tools to support the evolution of open-source
-software include software for distributed version control, such as the Git
-Source-code management system. When these social and technical innovations are
-put together they enable a host of positive defining features of OSS, such as
+software include software for distributed version control such as Git.
+When these social and technical innovations are
+put together, they enable a host of positive defining features of OSS, such as
 transparency, collaboration, and decentralization. These features allow OSS to
-have a remarkable level of dynamism and productivity, while also retaining the
-ability of a variety of stakeholders to guide the evolution of the software to
+have a remarkable level of dynamism and productivity, while also allowing
+a variety of stakeholders to guide the evolution of the software to
 take their needs and interests into account.
 
 A necessary complement to these technical tools and legal instruments have been
 a host of practices that define the social interactions \emph{within}
 communities of OSS developers and users, and structures for governing these
 communities. While many OSS communities started as projects led by individual
 founders (so-called benevolent dictators for life, or BDFL; a title first
-bestowed on the originator of the Python programming language, Guido Van Rossum
+bestowed on the originator of the Python programming language, Guido van Rossum
 \cite{Van_Rossum2008BDFL}), recent years have led to an increased understanding
 that minimal standards of democratic governance are required in order for OSS
+<!--
+Perhaps predictability of how a project is governed is more important for
+projects to be successful than the fact that they are governed in a democratic
+way. E.g., Python was successful while van Rossum was BDFL.
+-->
 communities to develop and flourish. This has led to the adoption of codes of
 conduct that govern the standards of behavior and communication among project
 stakeholders. It has also led to the establishment of democratically elected
@@ -95,10 +110,23 @@ the Python Enhancement Proposal (PEP) mechanism \cite{Warsaw2000PEP1}, which
 lays out how major changes to the software should be proposed, advocated for,
 and eventually decided on. While these tools, ideas, and practices evolved in
 developing software, they are readily translated to other domains. For example,
-OSS notions surrounding IP have given rise to the Creative Commons movement
+OSS notions surrounding intellectual property (IP) have given rise to the Creative Commons movement
 that has expanded these notions to apply to a much wider range of human
-creative endeavours. Similarly OSS notions regarding collaborative structures
+creative endeavours. Similarly, OSS notions regarding collaborative structures
 have pervaded the current era of open science and team science
 \cite{Baumgartner2023TeamScience, Koch2016TeamScience}.
 
 
+<!--
+The end of the penultimate paragraph felt a bit odd to me. While I don't have a
+good alternative at this time, the general structure of the last paragraph and
+a half is more or less this
+
+Governance mechanisms that help communities develop and flourish are for example:
+1. codes of conduct
+2. steering councils
+3. enhancement proposals
+[4. contribution guidelines]
+These mechanisms are readily translated to other domains; see, for example,
+licensing and collaborative structures.
+-->

sections/02-challenges.qmd

Lines changed: 28 additions & 6 deletions
@@ -10,6 +10,28 @@ of a new standard [^1].
 [^1]: So old in fact that an oft-cited [XKCD comic](https://xkcd.com/927/) has
 been devoted to it.
 
+<!--
+Not sure if it warrants its own section, but in my opinion, a more common
+reason for people not to adopt existing standards is the _perceived_ risk of
+adopting
+1. something you do not know (it may be difficult to figure out if a
+standard definitely solves my problem, and it may be expensive to figure out.
+Additionally, the persons having to make the decision may lack the skills to
+make an informed judgement);
+2. something you don't control (compromises made in the further development of
+a standard may make them unusable for my purposes);
+3. something whose development pace you don't control (open source can move
+very slowly, there is no guarantee that PRs will be merged or even
+appreciated, etc.).
+
+With all of these perceived downsides, making a new standard every time may be
+the expected outcome.
+
+If we do want a paragraph along the lines of the above, we should try to collect
+ideas on what advice a standard of standards should give on promoting adoption
+of existing standards.
+-->
+
 Another failure is the mismatch between developers of the standard and users.
 There is an inherent gap in both interest and ability to engage with the
 technical details undergirding standards and their development between the
@@ -22,17 +44,17 @@ about the practical implications of changes to the standards.
 
 ## Unclear pathways for standards success
 
-Standards typically develop organically through sustained and persistent efforts from dedicated
-groups of data practitioneers. These include scientists and the broader ecosystem of data curators and users. However there is no playbook on the structure and components of a data standard, or the pathway that moves a data implementation to a data standard.
-As a result, data standardization lacks formal avenues for research grants.
+Standards typically develop organically through sustained and persistent efforts from dedicated
+groups of data practitioners. These include scientists and the broader ecosystem of data curators and users. However, there is no playbook on the structure and components of a data standard, or on the pathway that moves a data implementation to a data standard.
+As a result, data standardization lacks formal avenues for research grants.
 
 ## Cross domain funding gaps
 
-Data standardization investment is justified if the standard is generalizable beyond any specific science domain. However while the use cases are domain sciences based, data standardization is seen as a data infrastrucutre and not a science investment. Moreover due to how science research funding works, scientists lack incentives to work across domains, or work on infrastructure problems.
+Data standardization investment is justified if the standard is generalizable beyond any specific science domain. However, while the use cases are based in the domain sciences, data standardization is seen as a data infrastructure investment and not a science investment. Moreover, due to how science research funding works, scientists lack incentives to work across domains or on infrastructure problems.
 
-## Data instrumentation issues
+## Data instrumentation issues
 
-Data for scientific observations are often generated by proprietary instrumentation due to commercialization or other profit driven incentives. There islack of regulatory oversight to adhere to available standards or evolve Significant data transformation is required to get data to a state that is amenable to standards, if available. If not available, there is lack of incentive to set aside investment or resources to invest in establishing data standards.
+Data for scientific observations are often generated by proprietary instrumentation due to commercialization or other profit-driven incentives. There is a lack of regulatory oversight to enforce adherence to available standards or to evolve them. Significant data transformation is required to get data to a state that is amenable to standards, if available. If they are not available, there is a lack of incentive to set aside investment or resources for establishing data standards.
 
 ## Sustainability
 