|
1 |
| -# Opportunities and risks for open-source standards {#sec-opportunities} |
| 1 | +# Opportunities and risks for open-source standards {#sec-challenges} |
2 | 2 |
|
3 | 3 | At the same time, these tools and practices are associated with risks that need
|
4 | 4 | to be mitigated.
|
5 | 5 |
|
6 |
| -## Flexibility vs. stability |
| 6 | +## Flexibility vs. Stability |
7 | 7 |
|
8 | 8 | One of the defining characteristics of OSS is its dynamism and its rapid
|
9 | 9 | evolution. Because OSS can be used by anyone and, in most cases, contributions
|
@@ -59,27 +59,60 @@ standardization lacks formal avenues for success and recognition, for example th
|
59 | 59 | Data standardization investment is justified if the standard is generalizable
|
60 | 60 | beyond any specific science domain. However while the use cases are domain
|
61 | 61 | sciences based, data standardization is seen as a data infrastructure and not a
|
62 |
| -science investment. Moreover due to how science research funding works, |
63 |
| -scientists lack incentives to work across domains, or work on infrastructure |
| 62 | +science investment. Moreover, due to how science research funding works, |
| 63 | +scientists lack incentives to work across domains or to work on infrastructure |
64 | 64 | problems.
|
65 | 65 |
|
66 | 66 | ## Data instrumentation issues
|
67 | 67 |
|
68 | 68 | Data for scientific observations are often generated by proprietary
|
69 |
| -instrumentation due to commercialization or other profit driven incentives. |
70 |
| -There islack of regulatory oversight to adhere to available standards or evolve |
71 |
| -Significant data transformation is required to get data to a state that is |
72 |
| -amenable to standards, if available. If not available, there is lack of |
| 69 | +instrumentation due to commercialization or other profit-driven incentives. |
| 70 | +There is a lack of regulatory oversight to adhere to available standards or |
| 71 | +evolve Significant data transformation is required to get data to a state that |
| 72 | +is amenable to standards, if available. If not available, there is a lack of |
73 | 73 | incentive to set aside investment or resources to invest in establishing data
|
74 | 74 | standards.
|
75 | 75 |
|
| 76 | +### Harnessing new computing paradigms and technologies |
| 77 | + |
| 78 | +Open-source standards development faces the challenges of adapting to new |
| 79 | +computing paradigms and technologies. Cloud computing provides a particularly |
| 80 | +stark set of opportunities and challenges. On the one hand, cloud computing |
| 81 | +offers practical solutions for many challenges of contemporary data-driven |
| 82 | +research. For example, the scalability of cloud resources addresses some of the |
| 83 | +challenges of the scale of data that is produced by instruments in many fields. |
| 84 | +The cloud also makes data access relatively straightforward, because of the |
| 85 | +ability to determine data access permissions in a granular fashion. On the |
| 86 | +other hand, cloud computing requires reinstrumenting many data formats. This is |
| 87 | +because cloud data access patterns are fundamentally different from the ones |
| 88 | +that are used in local POSIX-style file systems. Suspicion of cloud computing |
| 89 | +comes in two different flavors: the first by researchers and administrators who |
| 90 | +may be wary of costs associated with cloud computing, and especially with the |
| 91 | +difficulty of predicting these costs. Projects such as NSF's Cloud Bank seek to |
| 92 | +mitigate some of these concerns, by providing an additional layer of |
| 93 | +transparency into cloud costs [@Norman2021CloudBank]. The other type of |
| 94 | +objection relates to the fact that cloud computing services, by their very |
| 95 | +nature, are closed ecosystems that resist portability and interoperability. |
| 96 | +Some aspects of the services are always going to remain hidden and privy only |
| 97 | +to the cloud computing service provider. In this respect, cloud computing runs |
| 98 | +afoul of some of the appealing aspects of OSS. That said, the development of |
| 99 | +"cloud native" standards can provide significant benefits in terms of the |
| 100 | +research that can be conducted. For example, NOAA plans to use cloud computing |
| 101 | +for integration across the multiple disparate datasets that it collects to |
| 102 | +build knowledge graphs that can be queried by researchers to answer questions |
| 103 | +that can only be answered through this integration. Putting all the data "in |
| 104 | +one place" should help with that. Adaptation to the cloud in terms of data |
| 105 | +standards has driven development of new file formats. A salient example is the |
| 106 | +ZARR format [@zarr], which supports random access into array-based datasets |
| 107 | +stored in cloud object storage, facilitating scalable and parallelized |
| 108 | +computing on these data. Indeed, data standards such as NWB (neuroscience) and |
| 109 | +OME (microscopy) now use ZARR as a backend for cloud-based storage. In other |
| 110 | +cases, file formats that were once not straightforward to use in the cloud, |
| 111 | +such as HDF5 and TIFF, have been adapted to cloud use (e.g., through the |
| 112 | +cloud-optimized GeoTIFF format). |
| 113 | + |
76 | 114 | ## Sustainability
|
77 | 115 |
|
78 | 116 | ## The importance of automated validation
|
79 | 117 |
|
80 |
| -## Harnessing new computing paradigms and technologies |
81 |
| - |
82 |
| -Open-source standards development faces the challenges of adapting to new |
83 |
| -technologies The development of standards that are well-Cloud computing |
84 |
| -provides |
85 | 118 |
|