You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: sections/02-use-cases.qmd
+59-16Lines changed: 59 additions & 16 deletions
Original file line number
Diff line number
Diff line change
@@ -20,17 +20,34 @@ Image Transport System) file format standard, which was developed in the late
20
20
astronomy data preservation and exchange. Essentially every software platform
21
21
used in astronomy reads and writes the FITS format. It was developed by
22
22
observatories in the 1980s to store image data in the visible and x-ray
23
-
spectrum. It has been endorsed by IAU, as well as funding agencies. Though the
24
-
format has evolved over time, “once FITS, always FITS”. That is, the format
25
-
cannot be evolved to introduce changes that break backward compatibility.
26
-
Among the features that make FITS so durable is that it was designed originally
27
-
to have a very restricted metadata schema. That is, FITS records were designed
28
-
to be the lowest common denominator of word lengths in computer systems at the
29
-
time. However, while FITS is compact, its ability to encode the coordinate
30
-
frame and pixels, means that data from different observational instruments can
31
-
be stored in this format and relationships between data from different
32
-
instruments can be related, rendering manual and error-prone procedures for
33
-
conforming images obsolete.
23
+
spectrum. It has been endorsed by the International Astronomical Union (IAU),
24
+
as well as funding agencies. Though the format has evolved over time, “once
25
+
FITS, always FITS”. That is, the format cannot be evolved to introduce changes
26
+
that break backward compatibility. Among the features that make FITS so durable
27
+
is that it was designed originally to have a very restricted metadata schema.
28
+
That is, FITS records were designed to be the lowest common denominator of word
29
+
lengths in computer systems at the time. However, while FITS is compact, its
30
+
ability to encode the coordinate frame and pixels, means that data from
31
+
different observational instruments can be stored in this format and
32
+
relationships between data from different instruments can be related, rendering
33
+
manual and error-prone procedures for conforming images obsolete. Nevertheless,
34
+
the stability has also raised some issues as the field continues to adapt to
35
+
new measurement methods and the demands of ever-increasing data volumes and
36
+
complex data analysis use-case, such as interchange with other data and the use
37
+
of complex data bases to store and share data [@Scroggins2020-ut]. Another
38
+
prominent example of the use of open-source processes to develop standards in
39
+
Astronomy is in the tools and protocols developed by the International Virtual
40
+
Observatory Alliance (IVOA) and its national implementations, e.g., in the US
41
+
Virtual Astronomical Observatory[@Hanisch2015-cu]. The virtual observatories
42
+
facilitate discovery and access across observatories around the world and
43
+
underpin data discovery in astronomy. The IVOA took inspiration from the
44
+
World-Wide Web Consortium (W3C) and adopted its process for the development of
45
+
its standards (i.e., Working drafts $\rightarrow$ Proposed Recommendations
46
+
$\rightarrow$ Recommendations), with individual standards developed by
47
+
inter-institutional and international working groups. One of the outcomes of
48
+
the coordination effort is the development of an ecosystem of software tools
49
+
both developed within the observatory teams and within the user community that
50
+
interoperate with the standards that were adopted by the observatories.
34
51
35
52
## High-energy physics (HEP)
36
53
@@ -47,13 +64,38 @@ data is shared (i.e., in a standards-compliant manner).
47
64
48
65
## Earth sciences
49
66
50
-
The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR) and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. In the 1990s, open web mapping also began with MapServer (https://mapserver.org) and continued later with other projects such as OpenStreetMap (www.openstreetmap.org). The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the community's commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards (https://www.ogc.org).
67
+
The need for geospatial data exchange between different systems began to be
68
+
recognized in the 1970s and 1980s, but proprietary formats still dominated.
69
+
Coordinated standardization efforts brought the Open Geospatial Consortium
70
+
(OGC) establishment in the 1990s, a critical step towards open standards for
71
+
geospatial data. The 1990s have also seen the development of key standards such
72
+
as the Network Common Data Form (NetCDF) developed by the University
73
+
Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format
74
+
(HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in
75
+
climate research. The GeoTIFF format, which originated at NASA in the late
76
+
1990s, is extensively used to share image data. In the 1990s, open web mapping
77
+
also began with MapServer (https://mapserver.org) and continued later with
78
+
other projects such as OpenStreetMap (https://www.openstreetmap.org). The
79
+
following two decades, the 2000s-2020s, brought an expansion of open standards
80
+
and integration with web technologies developed by OGC, as well as other
81
+
standards such as the Keyhole Markup Language (KML) for displaying geographic
82
+
data in Earth browsers. Formats suitable for cloud computing also emerged, such
83
+
as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for
84
+
array and tabular data, respectively. In 2006, the Open Source Geospatial
85
+
Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the
86
+
community's commitment to the development of open-source geospatial
87
+
technologies. While some standards have been developed in the industry (e.g.,
88
+
Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired),
89
+
they later became international standards of the OGC, which now encompasses
90
+
more than 450 commercial, governmental, nonprofit, and research organizations
91
+
working together on the development and implementation of open standards
92
+
(https://www.ogc.org).
51
93
52
94
## Neuroscience
53
95
54
-
In contrast to astronomy and HEP, Neuroscience has traditionally been a
55
-
"cottage industry", where individual labs have generated experimental data
56
-
designed to answer specific experimental questions. While this model still
96
+
In contrast to the previously-mentioned fields, Neuroscience has traditionally
97
+
been a "cottage industry", where individual labs have generated experimental
98
+
data designed to answer specific experimental questions. While this model still
57
99
exists, the field has also seen the emergence of new modes of data production
58
100
that focus on generating large shared datasets designed to answer many
59
101
different questions, more akin to the data generated in large astronomy data
@@ -72,7 +114,7 @@ success to the adoption of OSS development mechanisms [@Poldrack2024BIDS]. For
72
114
example, small changes to the standard are managed through the GitHub pull
73
115
request mechanism; larger changes are managed through a BIDS Enhancement
74
116
Proposal (BEP) process that is directly inspired by the Python programming
75
-
language community's Python Enhancement Proposal procedure, which isused to
117
+
language community's Python Enhancement Proposal procedure, which is used to
76
118
introduce new ideas into the language. Though the BEP mechanism takes a
77
119
slightly different technical approach, it tries to emulate the open-ended and
78
120
community-driven aspects of Python development to accept contributions from a
@@ -102,3 +144,4 @@ if the standard is developed using git/GitHub for versioning, this would
102
144
require learning the complex and obscure technical aspects of these system that
103
145
are far from easy to adopt, even for many professional scientists.
0 commit comments