
Standardizing data for the EMLN package

Shai Pilosof edited this page Jul 23, 2023 · 2 revisions

Six main tables form the basis of the standard multilayer class:

  • Description
  • References
  • Interactions
  • Layers
  • Nodes
  • State_nodes

Each of these tables is stored as a CSV file in the data set, and we describe each of them below. Because the data sets differ from one another, our general approach was to use attribute and value fields (a long format) instead of the common wide format. Hence, instead of:

layer,name,latitude,longitude
1,patch 1,50.4218,-101.046
2,patch 2,50.7,-101.2

We used:

layer,attribute,value
1,name,patch 1
1,latitude,50.4218
1,longitude,-101.046
2,name,patch 2
2,latitude,50.7
2,longitude,-101.2

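This is because not all networks have the same attributes. In a long format, each layer (or node, or interaction) has rows only for the attributes it actually has, so no table needs empty columns. We applied the same approach to the other tables (References.csv is an exception and uses a wide format).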
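The conversion between the two layouts can be sketched with Python's standard library. This is a minimal illustration, not code from the EMLN package: the function names `wide_to_long` and `long_to_wide` are hypothetical, the files are assumed to be comma-delimited, and all values stay as strings, as they are when read from a CSV file.

```python
import csv
import io


def wide_to_long(rows, id_field):
    """Turn wide rows (one dict per entity) into (id, attribute, value) triples.

    Empty cells are skipped, so each entity only gets rows for the
    attributes it actually has.
    """
    long_rows = []
    for row in rows:
        for attribute, value in row.items():
            if attribute != id_field and value not in ("", None):
                long_rows.append((row[id_field], attribute, value))
    return long_rows


def long_to_wide(triples):
    """Group (id, attribute, value) triples back into one dict per id."""
    wide = {}
    for entity_id, attribute, value in triples:
        wide.setdefault(entity_id, {})[attribute] = value
    return wide


wide_csv = """layer,name,latitude,longitude
1,patch 1,50.4218,-101.046
2,patch 2,50.7,-101.2
"""
long_rows = wide_to_long(csv.DictReader(io.StringIO(wide_csv)), "layer")
for triple in long_rows:
    print(",".join(triple))
```

Because empty cells are dropped, a layer without coordinates would simply have no latitude or longitude rows.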


Nodes.csv

This file contains all information about the physical nodes in the network and their attributes (if any).

This table always contains the following fields:

node_id

A unique node ID is assigned to each physical node within the network.

node_name (*)

A generic name for the node: either a code used in the original data or a taxon name (species, family, or any other taxonomic level).

(*) Not all node files contain this attribute, but most do.

type

The node type attribute distinguishes between different types of nodes (e.g. pollinator node type).

Taxa verification

Taxa verification was performed with the classification function of the R package taxize. The nodes' scientific names, at different taxonomic levels, were verified against the NCBI database.

Nodes that could not be found in the NCBI database, either because of misspellings or because the taxon is absent from it, were assigned FALSE in the taxa_verified attribute. A TRUE value indicates that the taxon is present in the NCBI database, regardless of how many IDs are associated with it.

Taxa verification level

For verified taxa (i.e., taxa_verified == TRUE), the verification_level attribute indicates the taxonomic level at which the verification was performed.
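The rule for the two verification attributes can be sketched as follows. This is a hypothetical illustration only: the real pipeline queried NCBI through taxize in R, whereas here a plain dictionary (`NCBI_HITS`, with placeholder IDs and one invented name) stands in for the lookup, and it is an assumption of the sketch that verification_level equals the rank of the name that was checked.

```python
# Hypothetical stand-in for the NCBI lookup: maps a scientific name to the IDs
# a lookup might return. IDs are placeholders, and "Genus homonymus" is an
# invented name used only to show a name with more than one ID.
NCBI_HITS = {
    "Canis latrans": ["id-1"],
    "Genus homonymus": ["id-2", "id-3"],
}


def verify_taxon(taxonomy_name, taxonomy_rank, ncbi_hits):
    """Return (taxa_verified, verification_level) for one node.

    A name counts as verified if the lookup returns at least one ID, however
    many there are. As an assumption of this sketch, verification_level is the
    rank of the name that was checked; unverified taxa get None.
    """
    ids = ncbi_hits.get(taxonomy_name, [])
    if ids:
        return True, taxonomy_rank
    return False, None


print(verify_taxon("Canis latrans", "species", NCBI_HITS))  # (True, 'species')
print(verify_taxon("Canis latrns", "species", NCBI_HITS))   # (False, None)
```

The misspelled "Canis latrns" shows how a typo alone leads to taxa_verified being FALSE.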

Nodes.csv example

node_id,attribute,value
1,node_name,richardson_spermophile_(ground_squirrel)
1,type,taxon
1,taxonomy_name,Urocitellus richardsonii
1,taxonomy_rank,species
1,taxa_verified,TRUE
1,verification_level,species
2,node_name,coyote
2,type,taxon
2,taxonomy_name,Canis latrans
2,taxonomy_rank,species
2,taxa_verified,TRUE
2,verification_level,species
3,node_name,red-tailed_hawk
3,type,taxon
3,taxonomy_name,Buteo jamaicensis
3,taxonomy_rank,species
3,taxa_verified,TRUE
3,verification_level,species
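Reading this long format back into one record per node can be sketched as below. It assumes a comma-delimited file with the node_id, attribute and value columns shown above; `read_nodes` is a hypothetical helper, not part of the EMLN package, and it checks only for the type field because node_name is optional.

```python
import csv
import io

# node_id is the grouping key; node_name is optional, so only type is required.
REQUIRED_NODE_FIELDS = ("type",)


def read_nodes(text):
    """Group long-format Nodes.csv rows into {node_id: {attribute: value}}."""
    nodes = {}
    for row in csv.DictReader(io.StringIO(text)):
        nodes.setdefault(row["node_id"], {})[row["attribute"]] = row["value"]
    for node_id, attrs in nodes.items():
        missing = [field for field in REQUIRED_NODE_FIELDS if field not in attrs]
        if missing:
            raise ValueError(f"node {node_id} is missing {missing}")
        # taxa_verified is stored as the text TRUE/FALSE; convert it to bool.
        if "taxa_verified" in attrs:
            attrs["taxa_verified"] = attrs["taxa_verified"] == "TRUE"
    return nodes


nodes_csv = """node_id,attribute,value
1,node_name,richardson_spermophile_(ground_squirrel)
1,type,taxon
1,taxonomy_name,Urocitellus richardsonii
1,taxa_verified,TRUE
2,node_name,coyote
2,type,taxon
2,taxonomy_name,Canis latrans
2,taxa_verified,TRUE
"""
nodes = read_nodes(nodes_csv)
print(nodes["2"]["taxonomy_name"], nodes["2"]["taxa_verified"])  # Canis latrans True
```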

State_nodes.csv

When the same physical node has different attribute values in different layers (e.g., abundance that changes over time), this information is stored in this table. The attributes included in this file vary by network, and most networks do not have this information. The attributes needed to identify and distinguish state nodes within the multilayer network are:

node_id

node_name

layer_id

State_nodes.csv example

layer_id,node_id,attribute,value
1,1,node_name,Microtus arvalis
1,1,sample_size,1345
2,1,node_name,Microtus arvalis
2,1,sample_size,11
3,1,node_name,Microtus arvalis
3,1,sample_size,150
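Since a state node is a physical node within a particular layer, the pair (layer_id, node_id) is a natural key. A minimal sketch, assuming comma-delimited input and integer IDs; `read_state_nodes` is a hypothetical name:

```python
import csv
import io
from collections import defaultdict


def read_state_nodes(text):
    """Group State_nodes.csv rows by (layer_id, node_id).

    Each key identifies one state node: a physical node in a given layer.
    """
    states = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["layer_id"]), int(row["node_id"]))
        states[key][row["attribute"]] = row["value"]
    return dict(states)


state_csv = """layer_id,node_id,attribute,value
1,1,node_name,Microtus arvalis
1,1,sample_size,1345
2,1,node_name,Microtus arvalis
2,1,sample_size,11
3,1,node_name,Microtus arvalis
3,1,sample_size,150
"""
states = read_state_nodes(state_csv)
# The same physical node (node_id 1) has a different sample size in each layer.
sizes = {layer: int(attrs["sample_size"]) for (layer, node), attrs in states.items() if node == 1}
print(sizes)  # {1: 1345, 2: 11, 3: 150}
```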

Layers.csv

This file contains information about the layers that make up a multilayer network. Some layer attributes are always present, while others may be specific to certain types of networks.

layer

Each layer is assigned a unique layer ID.

type

The layer can be categorized as one of the following types:

  • environment
  • time
  • space
  • perturbation
  • interaction

longitude and latitude

The longitude and latitude attributes give the coordinates of each layer. In spatial networks, these coordinates are what distinguish the layers from one another.

location

The location of the study, such as the country or region. This attribute may be present, particularly in spatial networks, where the location can vary between layers.

name

A generic name for the layer, which can be used instead of the layer ID to link the layer to the edge list.

directed

If a network consists of directed layers, the directed attribute is present and has the value TRUE.

Layers.csv example

layer,attribute,value
1,type,environment
1,name,The biotic interactions of the prairie community
1,location,"Aspen Parkland, North America"
1,latitude,50.4218
1,longitude,-101.046
1,date,01/01/1928
1,directed,TRUE
2,type,environment
2,name,The biotic interactions of the aspen community
2,location,"Aspen Parkland, North America"
2,latitude,50.4218
2,longitude,-101.046
2,date,01/01/1928
2,directed,TRUE
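A sketch of reading Layers.csv with light type conversion and validation. It assumes comma-delimited input (values containing commas, like the location, are quoted) and, as an assumption drawn from the description of the directed attribute, treats a missing directed row as an undirected layer. `read_layers` and `LAYER_TYPES` are hypothetical names; the allowed types are the five listed above.

```python
import csv
import io

LAYER_TYPES = {"environment", "time", "space", "perturbation", "interaction"}


def read_layers(text):
    """Read long-format Layers.csv into {layer: {attribute: typed value}}."""
    layers = {}
    for row in csv.DictReader(io.StringIO(text)):
        layers.setdefault(int(row["layer"]), {})[row["attribute"]] = row["value"]
    for layer_id, attrs in layers.items():
        if attrs.get("type") not in LAYER_TYPES:
            raise ValueError(f"layer {layer_id}: unexpected type {attrs.get('type')!r}")
        # Coordinates are optional; convert them only when present.
        for coord in ("latitude", "longitude"):
            if coord in attrs:
                attrs[coord] = float(attrs[coord])
        # Assumption: a layer with no directed row is undirected.
        attrs["directed"] = attrs.get("directed") == "TRUE"
    return layers


layers_csv = """layer,attribute,value
1,type,environment
1,location,"Aspen Parkland, North America"
1,latitude,50.4218
1,longitude,-101.046
1,directed,TRUE
2,type,environment
2,location,"Aspen Parkland, North America"
2,latitude,50.4218
2,longitude,-101.046
2,directed,TRUE
"""
layers = read_layers(layers_csv)
print(layers[1]["location"], layers[1]["latitude"], layers[2]["directed"])
```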

Interactions.csv

Interactions between the nodes of a multilayer network are represented as an extended edge list, a commonly used format. Like the other tables, this CSV file is in long format: each interaction is assigned a unique ID and can have additional attributes specific to the particular network.

The interactions list always contains the following attributes:

interaction_id

Each interaction is identified by a unique interaction ID.

node_from

Refers to the starting node of the edge.

layer_from

Represents the layer of the starting node.

node_to

Refers to the ending node of the edge.

layer_to

Represents the layer of the ending node.

type

There are different types of interaction:

  • frugivory
  • pollination
  • predation
  • herbivory
  • trophic
  • host-parasite
  • detritivore
  • scavenger
  • negative
  • positive
  • seed-dispersal
  • interlayer
  • competition
  • anemone-fish
  • plant-ant
  • parasitism

weight

The weight of the interaction.

The interactions file can contain other attributes for each interaction, such as method (the method by which the interaction was measured).

Interactions.csv example

interaction_id,attribute,value
1,node_from,red-tailed_hawk
1,layer_from,1
1,node_to,vole_(microtus)
1,layer_to,1
1,weight,1
1,type,predation
1,method,field observation and gut content
2,node_from,weasel
2,layer_from,1
2,node_to,vole_(microtus)
2,layer_to,1
2,weight,1
2,type,predation
2,method,field observation and gut content
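The long-format interactions can be pivoted back into one record per edge, which is the usual extended edge list. This is a minimal sketch assuming comma-delimited input and numeric weights; `read_edges` is a hypothetical helper, and optional attributes such as method are kept as-is. An edge whose two endpoints are in different layers is flagged as interlayer.

```python
import csv
import io

REQUIRED = ("node_from", "layer_from", "node_to", "layer_to", "type", "weight")


def read_edges(text):
    """Rebuild an extended edge list (one dict per interaction) from Interactions.csv."""
    interactions = {}
    for row in csv.DictReader(io.StringIO(text)):
        interactions.setdefault(int(row["interaction_id"]), {})[row["attribute"]] = row["value"]
    edges = []
    for interaction_id, attrs in sorted(interactions.items()):
        missing = [field for field in REQUIRED if field not in attrs]
        if missing:
            raise ValueError(f"interaction {interaction_id} is missing {missing}")
        edge = dict(attrs)  # keeps optional attributes such as method
        edge["layer_from"] = int(attrs["layer_from"])
        edge["layer_to"] = int(attrs["layer_to"])
        edge["weight"] = float(attrs["weight"])
        # An edge whose endpoints are in different layers is an interlayer edge.
        edge["is_interlayer"] = edge["layer_from"] != edge["layer_to"]
        edges.append(edge)
    return edges


interactions_csv = """interaction_id,attribute,value
1,node_from,red-tailed_hawk
1,layer_from,1
1,node_to,vole_(microtus)
1,layer_to,1
1,weight,1
1,type,predation
1,method,field observation and gut content
2,node_from,weasel
2,layer_from,1
2,node_to,vole_(microtus)
2,layer_to,1
2,weight,1
2,type,predation
"""
edges = read_edges(interactions_csv)
for edge in edges:
    print(edge["node_from"], "->", edge["node_to"], edge["type"], edge["is_interlayer"])
```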

Description.csv

This file contains a general description of the network, which helps users decide whether this is the network they want to work with.

This file will always contain the following information:

Data Entry

The developer who collected the data.

Description

A brief description of the source article, provided if the data were obtained from one.

Source

The source of the raw data:

  • Interaction Web DataBase
  • Web of Life
  • Web: from searching for relevant articles online
  • Mangal

Note: if the data were obtained from Mangal, an additional mangal_code row specifies the Mangal code used.

Data URL

The URL from which the data were obtained.

Ecological Network Type

The package uses two network-type attributes: the ecological network type (e.g., Pollination) and the multilayer network type (e.g., Spatial). For example, a network can be classified as a Food-Web with a Temporal multilayer dimension.

Ecological network types:

  • Pollination: describes the interactions between plants and pollinators. The interactions in a pollination network are typically mutualistic. An example of a pollination interaction is the relationship between bees and flowers.
  • Seed-Dispersal: describes the movement of seeds from one location to another, typically through the action of an animal. An example of a seed-dispersal interaction is the relationship between a frugivore and a fruit-bearing plant.
  • Plant-Ant: describes the relationships between certain plant species and ant colonies, where both organisms benefit from the interaction in different ways.
  • Plant-Herbivore: describes the complex relationships between plants and the animals that consume them.
  • Host-Parasite: describes the relationships between hosts and parasites, in which the parasite relies on the host for survival and reproduction, while the host may suffer negative effects as a result of the parasite's presence.
  • Food-Web: describes the feeding relationships among species, and how energy and nutrients move from one organism to another.
  • Anemone-Fish: describes a symbiotic relationship between certain species of anemones and clownfish, in which both species benefit from the presence of the other.
  • Multiples: the network encompasses several ecological interactions throughout its layers (e.g. a network that contains host-parasite + food-web interactions).

Multilayer network types:

  • Spatial: the layers are distinguished from one another by their geographic coordinates.
  • Temporal: each layer represents a different time point.
  • Perturbation: the layers differ in whether or not they experience a perturbation (e.g., the effect of an intervention or of an invasive species).
  • Environment: each layer represents a different aspect of the environment.
  • Multiplex: each layer represents a different type of interaction between the same set of nodes.

State Nodes

The state_nodes attribute records whether a State_nodes.csv file exists for the network, that is, whether there are attributes associated with its state nodes. FALSE means there are no state-node attributes; TRUE means there are.
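Because the flag mirrors the presence of a file, it can be checked for consistency. A sketch under stated assumptions: all CSV files of one network sit in a single directory, the file is named exactly State_nodes.csv, and the Description.csv values have already been read into a dict. `check_state_nodes_flag` is a hypothetical helper.

```python
import os
import tempfile


def check_state_nodes_flag(network_dir, description):
    """Return True when the state_nodes flag agrees with the files on disk.

    description is a dict of attribute -> value read from Description.csv.
    """
    has_file = os.path.exists(os.path.join(network_dir, "State_nodes.csv"))
    flag = description.get("state_nodes") == "TRUE"
    return has_file == flag


with tempfile.TemporaryDirectory() as network_dir:
    # No State_nodes.csv yet, so a FALSE flag is consistent.
    without_file = check_state_nodes_flag(network_dir, {"state_nodes": "FALSE"})
    open(os.path.join(network_dir, "State_nodes.csv"), "w").close()
    # Once the file exists, only a TRUE flag is consistent.
    with_file_flag_false = check_state_nodes_flag(network_dir, {"state_nodes": "FALSE"})
    with_file_flag_true = check_state_nodes_flag(network_dir, {"state_nodes": "TRUE"})

print(without_file, with_file_flag_false, with_file_flag_true)  # True False True
```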

Description.csv example

attribute,value
data_entry,ofir_segev
multilayer_network_type,Environment
description,NA
source,mangal
data_url,https://mangal.io/doc/api/
mangal_code,87
ecological_network_type,Food-Web
state_nodes,FALSE
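Description.csv is a simple two-column table, so it maps naturally onto a dictionary. A minimal sketch, assuming comma-delimited input and that NA marks a missing value (as in the example's description row); `read_description` is a hypothetical name. It also checks the rule that Mangal-sourced networks carry a mangal_code row.

```python
import csv
import io


def read_description(text):
    """Read the two-column Description.csv into a dict, mapping NA to None."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["attribute"]: (None if row["value"] == "NA" else row["value"]) for row in reader}


description_csv = """attribute,value
data_entry,ofir_segev
multilayer_network_type,Environment
description,NA
source,mangal
data_url,https://mangal.io/doc/api/
mangal_code,87
ecological_network_type,Food-Web
state_nodes,FALSE
"""
description = read_description(description_csv)
# Networks taken from Mangal should carry an extra mangal_code row.
if (description.get("source") or "").lower() == "mangal" and "mangal_code" not in description:
    raise ValueError("Mangal networks should have a mangal_code row")
print(description["ecological_network_type"], description["multilayer_network_type"])  # Food-Web Environment
```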

References.csv

This file contains the information needed to find the article in which the data are presented, including the article's DOI, authors, and year (for data extracted from an article). It may also include paper and data URLs. When data were taken from multiple articles, the file contains multiple rows.

doi

The DOI of the article from which the data were taken. If the article has no DOI, the value is NA.
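A DOI can be turned into a link through the doi.org resolver, with NA handled as a missing value. This is a small sketch, not part of the package; `doi_url` is a hypothetical helper that percent-encodes unsafe characters while keeping the slash.

```python
from urllib.parse import quote


def doi_url(doi):
    """Return an https://doi.org/ link for a DOI, or None when the DOI is NA."""
    if doi is None or doi.strip() in ("", "NA"):
        return None
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi.strip(), safe="/")


print(doi_url("10.2307/1948658"))  # https://doi.org/10.2307/1948658
print(doi_url("NA"))               # None
```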

Author

The name(s) of the author(s) of the article from which the data were taken.

Year

The year of data publication.

Paper URL

The URL at which the article can be accessed.

Data URL

The website where the data is hosted.

References.csv example

doi,author,year,paper_url
10.2307/1948658,ralph w. dexter,1947-01-01,https://doi.org/10.2307%2F1948658