
Commit 4b54902

committed
More stuff.
1 parent aefc5ed commit 4b54902

File tree

4 files changed: +118 -76 lines changed


references.bib

Lines changed: 36 additions & 0 deletions
@@ -1,3 +1,39 @@
+
+@ARTICLE{Rubel2022NWB,
+  title    = "The Neurodata Without Borders ecosystem for neurophysiological
+              data science",
+  author   = "R{\"u}bel, Oliver and Tritt, Andrew and Ly, Ryan and Dichter,
+              Benjamin K and Ghosh, Satrajit and Niu, Lawrence and Baker,
+              Pamela and Soltesz, Ivan and Ng, Lydia and Svoboda, Karel and
+              Frank, Loren and Bouchard, Kristofer E",
+  abstract = "The neurophysiology of cells and tissues are monitored
+              electrophysiologically and optically in diverse experiments and
+              species, ranging from flies to humans. Understanding the brain
+              requires integration of data across this diversity, and thus
+              these data must be findable, accessible, interoperable, and
+              reusable (FAIR). This requires a standard language for data and
+              metadata that can coevolve with neuroscience. We describe design
+              and implementation principles for a language for neurophysiology
+              data. Our open-source software (Neurodata Without Borders, NWB)
+              defines and modularizes the interdependent, yet separable,
+              components of a data language. We demonstrate NWB's impact
+              through unified description of neurophysiology data across
+              diverse modalities and species. NWB exists in an ecosystem, which
+              includes data management, analysis, visualization, and archive
+              tools. Thus, the NWB data language enables reproduction,
+              interchange, and reuse of diverse neurophysiology data. More
+              broadly, the design principles of NWB are generally applicable to
+              enhance discovery across biology through data FAIRness.",
+  journal  = "Elife",
+  volume   = 11,
+  month    = oct,
+  year     = 2022,
+  keywords = "FAIR data; Neurophysiology; archive; data ecosystem; data
+              language; data standard; human; mouse; neuroscience; rat",
+  language = "en"
+}
+
+
 @ARTICLE{Gorgolewski2016BIDS,
   title = "The {Brain} {Imaging} {Data} {Structure}, a format for organizing and
           describing outputs of neuroimaging experiments",

sections/01-introduction.qmd

Lines changed: 42 additions & 70 deletions
@@ -9,7 +9,7 @@ machine learning techniques, these datasets can help us understand everything
 from the cellular operations of the human body, through business transactions
 on the internet, to the structure and history of the universe. However, the
 development of new machine learning methods, and data-intensive discovery more
-generally, rely heavily on Findability, Accessibility, Interoperability and
+generally depend on Findability, Accessibility, Interoperability and
 Reusability (FAIR) of data [@Wilkinson2016FAIR].
 
 One of the main mechanisms through which the FAIR principles are promoted is the

@@ -24,79 +24,51 @@ The importance of standards stems not only from discussions within research
 fields about how research can best be conducted to take advantage of existing
 and growing datasets, but also arises from an ongoing series of policy
 discussions that address the interactions between research communities and the
-general public. In the United States, memos issued in 2013 and 2022 by the
-directors of the White House Office of Science and Technology Policy (OSTP),
-James Holdren (2013) and Alondra Nelson (2022). While these memos focused
-primarily on making peer-reviewed publications funded by the US Federal
-government available to the general public, they also lay an increasingly
-detailed path towards the publication and general availability of the data that
-is collected as part of the research that is funded by the US government.
+general public. In the United States, these policies are expressed, for example,
+in memos issued by the directors of the White House Office of Science and
+Technology Policy (OSTP), James Holdren (in 2013) and Alondra Nelson (in 2022).
+While these memos focused primarily on making peer-reviewed publications funded
+by the US Federal government available to the general public, they also lay an
+increasingly detailed path toward the publication and general availability of
+the data that is collected in research that is funded by the US government. The
+general guidance and overall spirit of these memos dovetail with more specific
+policy guidance related to data and metadata standards. The importance of
+standards was underscored in a recent report by the Subcommittee on Open
+Science of the National Science and Technology Council on the "Desirable
+characteristics of data repositories for federally funded research"
+[@nstc2022desirable]. The report explicitly called out the importance of
+"allow[ing] datasets and metadata to be accessed, downloaded, or exported from
+the repository in widely used, preferably non-proprietary, formats consistent
+with standards used in the disciplines the repository serves." This highlights
+the need for data and metadata standards across a variety of different kinds of
+data. In addition, a report from the National Institute of Standards and
+Technology on "U.S. Leadership in AI: A Plan for Federal Engagement in
+Developing Technical Standards and Related Tools" emphasized that --
+specifically for the case of AI -- "U.S. government agencies should prioritize
+AI standards efforts that are [...] Consensus-based, [...] Inclusive and
+accessible, [...] Multi-path, [...] Open and transparent, [...] and [that]
+Result in globally relevant and non-discriminatory standards..." [@NIST2019].
+The converging characteristics of standards that arise from these reports
+suggest that considerable thought needs to be given to how standards arise, so
+that these goals are achieved.
 
-The general guidance and overall spirit of these memos dovetail with more
-specific policy discussions that put meat on the bones of the general guidance.
-The importance of data and metadata standards, for example, was underscored in
-a recent report by the Subcommittee on Open Science of the National Science and
-Technology Council on the "Desirable characteristics of data repositories for
-federally funded research" [@nstc2022desirable]. The report explicitly called
-out the importance of "allow[ing] datasets and metadata to be accessed,
-downloaded, or exported from the repository in widely used, preferably
-non-proprietary, formats consistent with standards used in the disciplines the
-repository serves." This highlights the need for data and metadata standards
-across a variety of different kinds of data. In addition, a report from the
-National Institute of Standards and Technology on "U.S. Leadership in AI: A
-Plan for Federal Engagement in Developing Technical Standards and Related
-Tools" emphasized that -- specifically for the case of AI -- "U.S. government
-agencies should prioritize AI standards efforts that are [...] Consensus-based,
-[...] Inclusive and accessible, [...] Multi-path, [...] Open and transparent,
-[...] and [that] Result in globally relevant and non-discriminatory
-standards..." [@NIST2019]. The converging characteristics of standards that
-arise from these reports suggest that considerable thought needs to be given to
-the manner in which standards arise, so that these goals are achieved.
-
-Standards for a specific domain can come about in various ways. Broadly
-speaking two kinds of mechanisms can generate a standard for a specific type of
-data: (i) top-down: in this case a (usually) small group of people develop the
-standard and disseminate it to the communities of interest with very little
-input from these communities. An example of this mode of standards development
-can occur when an instrument is developed by a manufacturer and users of this
-instrument receive the data in a particular format that was developed in tandem
-with the instrument; and (ii) bottom-up: in this case, standards are developed
-by a larger group of people that convene and reach consensus about the details
-of the standard in an attempt to cover a large range of use-cases. Most
-standards are developed through an interplay between these two modes, and
-understanding how to make the best of these modes is critical in advancing the
-development of data and metadata standards.
-
-One source of inspiration for bottom-up development of robust, adaptable and
-useful standards comes from open-source software (OSS). OSS has a long history
-going back to the development of the Unix operating system in the late 1960s.
-Over the time since its inception, the large community of developers and users
-of OSS have have developed a host of socio-technical mechanisms that support
+One source of inspiration for community-driven development of robust, adaptable
+and useful standards comes from open-source software (OSS). OSS has a long
+history going back to the development of the Unix operating system in the late
+1960s. Over the time since its inception, the large community of developers and
+users of OSS have developed a host of socio-technical mechanisms that support
 the development and use of OSS. For example, the Open Source Initiative (OSI),
-a non-profit organization that was founded in 1990s has evolved a set of
+a non-profit organization that was founded in the 1990s, developed a set of
 guidelines for licensing of OSS that is designed to protect the rights of
-developers and users. Technical tools to support the evolution of open-source
-software include software for distributed version control, such as the Git
-Source-code management system. When these social and technical innovations are
-put together they enable a host of positive defining features of OSS, such as
-transparency, collaboration, and decentralization. These features allow OSS to
-have a remarkable level of dynamism and productivity, while also retaining the
-ability of a variety of stakeholders to guide the evolution of the software to
-take their needs and interests into account.
+developers and users. On the more technical side, tools such as the Git
+source-code management system also support open-source development workflows.
+When these social and technical innovations are put together they enable a host
+of positive defining features of OSS, such as transparency, collaboration, and
+decentralization. These features allow OSS to have a remarkable level of
+dynamism and productivity, while also retaining the ability of a variety of
+stakeholders to guide the evolution of the software to take their needs and
+interests into account.
 
-A necessary complement to these technical tools and legal instruments have been
-a host of practices that define the social interactions \emph{within}
-communities of OSS developers and users, and structures for governing these
-communities. While many OSS communities started as projects led by individual
-founders (so-called benevolent dictators for life, or BDFL; a title first
-bestowed on the originator of the Python programming language, Guido Van Rossum
-\cite{Van_Rossum2008BDFL}), recent years have led to an increased understanding
-that minimal standards of democratic governance are required in order for OSS
-communities to develop and flourish. This has led to the adoption of codes of
-conduct that govern the standards of behavior and communication among project
-stakeholders. It has also led to the establishment of democratically elected
-steering councils/committees from among the members and stakeholders of an OSS
-project's community.
 
 It was also within the Python community that an orderly process for
 community-guided evolution of an open-source software project emerged, through

sections/02-challenges.qmd

Lines changed: 6 additions & 6 deletions
@@ -22,17 +22,17 @@ about the practical implications of changes to the standards.
 
 ## Unclear pathways for standards success
 
-Standards typically develop organically through sustained and persistent efforts from dedicated
-groups of data practitioneers. These include scientists and the broader ecosystem of data curators and users. However there is no playbook on the structure and components of a data standard, or the pathway that moves a data implementation to a data standard.
-As a result, data standardization lacks formal avenues for research grants.
+Standards typically develop organically through sustained and persistent efforts from dedicated
+groups of data practitioners. These include scientists and the broader ecosystem of data curators and users. However, there is no playbook for the structure and components of a data standard, or for the pathway that moves a data implementation to a data standard.
+As a result, data standardization lacks formal avenues for research grants.
 
 ## Cross domain funding gaps
 
-Data standardization investment is justified if the standard is generalizable beyond any specific science domain. However while the use cases are domain sciences based, data standardization is seen as a data infrastrucutre and not a science investment. Moreover due to how science research funding works, scientists lack incentives to work across domains, or work on infrastructure problems.
+Data standardization investment is justified if the standard is generalizable beyond any specific science domain. However, while the use cases come from the domain sciences, data standardization is seen as data infrastructure rather than a science investment. Moreover, because of how science research funding works, scientists lack incentives to work across domains or to work on infrastructure problems.
 
-## Data instrumentation issues
+## Data instrumentation issues
 
-Data for scientific observations are often generated by proprietary instrumentation due to commercialization or other profit driven incentives. There islack of regulatory oversight to adhere to available standards or evolve Significant data transformation is required to get data to a state that is amenable to standards, if available. If not available, there is lack of incentive to set aside investment or resources to invest in establishing data standards.
+Data for scientific observations are often generated by proprietary instrumentation, due to commercialization or other profit-driven incentives. There is a lack of regulatory oversight to ensure that instruments adhere to available standards or help those standards evolve. Significant data transformation is required to get data into a state that is amenable to standards, where such standards exist. Where they do not, there is little incentive to set aside investment or resources to establish data standards.
 
 ## Sustainability
 

sections/xx-use-cases.qmd

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Use cases
+
+Meanwhile, the importance of standards is also increasingly understood in
+research communities that are learning about the value of shared data
+resources. While some fields, such as astronomy, high-energy physics and earth
+sciences, have a relatively long history of shared data resources from
+organizations such as LSST and CERN, other fields have only relatively recently
+become aware of the value of data sharing and its impact.
+
+For example, neuroscience has traditionally been a "cottage industry", where
+individual labs have generated experimental data designed to answer specific
+experimental questions. While this model still exists, the field has also seen
+the emergence of new modes of data production that focus on generating large
+shared datasets designed to answer many different questions, more akin to the
+data generated in large astronomy data collection efforts. This change has been
+brought on through a combination of technical advances in data acquisition
+techniques, which now generate large and very high-dimensional/information-rich
+datasets, cultural changes, which have ushered in new norms of transparency and
+reproducibility (related to the policy discussions mentioned above), and
+funding initiatives that have encouraged this kind of data collection
+(including the US BRAIN Initiative and the Allen Institute for Brain Science).
+Neuroscience presents an interesting example, because in response to these new
+data resources, the field has had to establish new standards for data and
+metadata that facilitate sharing and use of these data. Two examples are the
+Neurodata Without Borders file format for neurophysiology data [@Rubel2022NWB]
+and the Brain Imaging Data Structure standard for neuroimaging data
+[@Gorgolewski2016BIDS].
+
+
+
+## Automated discovery
+
+## Citizen science
+