Standardizing data for the EMLN package

Six main tables form the basis for the development of the standard multilayer class:

Description
References
Interactions
Layers
Nodes
State_nodes

Each of these tables is a CSV file in the data set, and we describe each of them below. Because the data sets differed in which attributes they recorded, our general approach was to use attribute and value fields (a long format) instead of the common wide format. Hence, instead of:

layer	name	latitude	longitude
1	patch 1	50.4218	-101.046
2	patch 2	50.7	-101.2

We used:

layer	attribute	value
1	name	patch 1
1	latitude	50.4218
1	longitude	-101.046
2	name	patch 2
2	latitude	50.7
2	longitude	-101.2

This is because not all networks have these particular attributes. We applied the same approach to all the other tables.
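The conversion from the wide table to the attribute/value table can be sketched as follows. This is a minimal illustration in Python, not the package's own code; the function name `wide_to_long` is hypothetical, and the tab delimiter is an assumption that mirrors how the tables are shown on this page (the actual files may be comma-separated).

```python
import csv
import io

# Wide-format layer table from the example above (tab-separated here by assumption).
WIDE = (
    "layer\tname\tlatitude\tlongitude\n"
    "1\tpatch 1\t50.4218\t-101.046\n"
    "2\tpatch 2\t50.7\t-101.2\n"
)


def wide_to_long(text, id_field):
    """Turn every non-ID column of each row into one (id, attribute, value) row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        for attribute, value in record.items():
            if attribute == id_field:
                continue  # the ID column becomes the key, not an attribute
            rows.append((record[id_field], attribute, value))
    return rows


long_rows = wide_to_long(WIDE, "layer")
for row in long_rows:
    print("\t".join(row))
```

A network that lacks, say, coordinates simply has no latitude or longitude rows, so no empty columns are needed.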

Nodes.csv

This file contains all information related to the physical nodes in the network and their attributes (if there are any).

This table always contains the following fields:

node_id

A unique node ID is assigned to each physical node within the network.

node_name (*)

A generic name for the node, be it a specific code that was used in the original data, or the species/family/any taxa name.

(*) Not all node files contain this attribute, but it is present in most of them.

type

The node type attribute distinguishes between different types of nodes (e.g. pollinator node type).

Taxa verification

Taxa verification was performed using the R package taxize and its classification function. The nodes' scientific names at different taxonomic levels were verified against the NCBI database.

Nodes that could not be found in the database due to misspellings or the absence of the taxa in the NCBI database were assigned a FALSE value in the taxa_verified attribute. In contrast, a TRUE value indicates that the taxon is present in the NCBI database, regardless of how many IDs are associated with it.

Taxa verification level

For verified taxa (i.e. taxa_verified == TRUE), this attribute (verification_level in the example below) indicates the taxonomic level at which the verification was performed.

Nodes.csv example

node_id	attribute	value
1	node_name	richardson_spermophile_(ground_squirrel)
1	type	taxon
1	taxonomy_name	Urocitellus richardsonii
1	taxonomy_rank	species
1	taxa_verified	TRUE
1	verification_level	species
2	node_name	coyote
2	type	taxon
2	taxonomy_name	Canis latrans
2	taxonomy_rank	species
2	taxa_verified	TRUE
2	verification_level	species
3	node_name	red-tailed_hawk
3	type	taxon
3	taxonomy_name	Buteo jamaicensis
3	taxonomy_rank	species
3	taxa_verified	TRUE
3	verification_level	species
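To work with one record per node, the attribute/value rows can be grouped by node_id. The snippet below is a sketch under stated assumptions: it uses Python for illustration, the helper name `long_to_records` is hypothetical, only a subset of the example rows is included, and the tab delimiter follows the rendering on this page rather than a confirmed file format.

```python
import csv
import io
from collections import defaultdict

# A subset of the Nodes.csv example above (delimiter is an assumption).
NODES = (
    "node_id\tattribute\tvalue\n"
    "1\tnode_name\trichardson_spermophile_(ground_squirrel)\n"
    "1\ttaxonomy_name\tUrocitellus richardsonii\n"
    "1\ttaxa_verified\tTRUE\n"
    "2\tnode_name\tcoyote\n"
    "2\ttaxonomy_name\tCanis latrans\n"
    "2\ttaxa_verified\tTRUE\n"
)


def long_to_records(text, id_field):
    """Group attribute/value rows into one dictionary per ID."""
    records = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        records[row[id_field]][row["attribute"]] = row["value"]
    return dict(records)


nodes = long_to_records(NODES, "node_id")

# Values are kept as strings, so the verification flag is compared to "TRUE".
verified = [
    attrs["taxonomy_name"]
    for attrs in nodes.values()
    if attrs.get("taxa_verified") == "TRUE"
]
print(verified)
```

Using `attrs.get(...)` rather than indexing keeps the code tolerant of nodes that lack an optional attribute such as node_name.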

State_nodes.csv

When the same physical node has different attributes at different layers (e.g., abundance can change in time), the information will be stored in this table. The attributes included in this file vary depending on the network and most networks do not have this information. Attributes necessary to identify and differentiate between state nodes within the multilayer network are:

node_id

node_name

layer_id

State_nodes.csv example

layer_id	node_id	attribute	value
1	1	node_name	Microtus arvalis
1	1	sample_size	1345
2	1	node_name	Microtus arvalis
2	1	sample_size	11
3	1	node_name	Microtus arvalis
3	1	sample_size	150
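Because a state node is identified by the combination of layer and physical node, a natural in-memory key is the pair (layer_id, node_id). The following Python sketch illustrates this with the example rows above; the variable names and the tab delimiter are assumptions made for illustration.

```python
import csv
import io

# State_nodes.csv example from above (delimiter is an assumption).
STATE_NODES = (
    "layer_id\tnode_id\tattribute\tvalue\n"
    "1\t1\tnode_name\tMicrotus arvalis\n"
    "1\t1\tsample_size\t1345\n"
    "2\t1\tnode_name\tMicrotus arvalis\n"
    "2\t1\tsample_size\t11\n"
    "3\t1\tnode_name\tMicrotus arvalis\n"
    "3\t1\tsample_size\t150\n"
)

# Map each (layer_id, node_id) pair to its attribute dictionary.
state_nodes = {}
for row in csv.DictReader(io.StringIO(STATE_NODES), delimiter="\t"):
    key = (int(row["layer_id"]), int(row["node_id"]))
    state_nodes.setdefault(key, {})[row["attribute"]] = row["value"]

# Sample size of physical node 1 in each layer.
sample_sizes = {
    layer: int(attrs["sample_size"])
    for (layer, node), attrs in sorted(state_nodes.items())
    if node == 1
}
print(sample_sizes)
```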

Layers.csv

This file contains information about the layers that make up a multilayer network. Certain layer attributes are always present, while additional attributes may be specific to certain types of networks.

layer

A unique layer ID is assigned to each layer.

type

The layer can be categorized as one of the following types:

environment
time
space
perturbation
interaction

longitude and latitude

The longitude and latitude attributes provide data regarding the coordinates of the layers. In spatial networks, longitude and latitude features differentiate the layers by their respective coordinates.

location

The location of the study, including the country or region, is one of the attributes that may be present, particularly in spatial networks where it can vary between different layers.

name

The layer may have a generic name attribute that can be connected to the edge list instead of the layer ID.

directed

If a network consists of directed layers, then the directed attribute will be present and have a 'TRUE' value.

Layers.csv example

layer	attribute	value
1	type	environment
1	name	The biotic interactions of the prairie community
1	location	Aspen Parkland, North America
1	latitude	50.4218
1	longitude	-101.046
1	date	01/01/1928
1	directed	TRUE
2	type	environment
2	name	The biotic interactions of the aspen community
2	location	Aspen Parkland, North America
2	latitude	50.4218
2	longitude	-101.046
2	date	01/01/1928
2	directed	TRUE
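Since the layer type is restricted to the five values listed above, a simple consistency check is possible when preparing a Layers.csv file. The sketch below is illustrative Python, not part of the package; the function name `invalid_layer_types` is hypothetical, and the third layer with type "habitat" is an invented row used only to show a failing case.

```python
# The five layer types documented above.
LAYER_TYPES = {"environment", "time", "space", "perturbation", "interaction"}


def invalid_layer_types(rows):
    """Return (layer, value) pairs whose 'type' is not a documented layer type."""
    return [
        (layer, value)
        for layer, attribute, value in rows
        if attribute == "type" and value not in LAYER_TYPES
    ]


rows = [
    ("1", "type", "environment"),
    ("1", "directed", "TRUE"),
    ("2", "type", "environment"),
    ("3", "type", "habitat"),  # hypothetical invalid row, not from the data set
]

problems = invalid_layer_types(rows)
print(problems)
```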

Interactions.csv

The interactions between nodes of multilayer networks are represented by a commonly used extended edge list. This CSV file is organized in a long format where each interaction is assigned a unique ID and can contain additional attributes that are specific to the particular network.

The interactions list always contains the following attributes:

interaction_id

Each interaction is identified by a unique interaction ID.

node_from

Refers to the starting node of the edge.

layer_from

Represents the layer of the starting node.

node_to

Refers to the ending node of the edge.

layer_to

Represents the layer of the ending node.

type

The type of interaction. Possible values are:

frugivory
pollination
predation
herbivory
trophic
host-parasite
detritivore
scavenger
negative
positive
seed-dispersal
interlayer
competition
anemone-fish
plant-ant
parasitism

weight

The weight of the interaction.

The interactions file could contain other attributes for each interaction, such as 'method' - the method by which the interaction was measured.

Interactions.csv example

interaction_id	attribute	value
1	node_from	red-tailed_hawk
1	layer_from	1
1	node_to	vole_(microtus)
1	layer_to	1
1	weight	1
1	type	predation
1	method	field observation and gut content
2	node_from	weasel
2	layer_from	1
2	node_to	vole_(microtus)
2	layer_to	1
2	weight	1
2	type	predation
2	method	field observation and gut content
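The long-format rows can be folded back into an extended edge list with one tuple per interaction. The following Python sketch shows one way to do this under stated assumptions: the `Edge` tuple and `read_edges` function are hypothetical names, the tab delimiter follows the rendering on this page, and the optional method attribute is ignored for brevity.

```python
import csv
import io
from collections import namedtuple

# One extended-edge-list entry; field names follow the attributes above.
Edge = namedtuple("Edge", "node_from layer_from node_to layer_to weight type")

# Interactions.csv example from above without the optional 'method' rows.
INTERACTIONS = (
    "interaction_id\tattribute\tvalue\n"
    "1\tnode_from\tred-tailed_hawk\n"
    "1\tlayer_from\t1\n"
    "1\tnode_to\tvole_(microtus)\n"
    "1\tlayer_to\t1\n"
    "1\tweight\t1\n"
    "1\ttype\tpredation\n"
    "2\tnode_from\tweasel\n"
    "2\tlayer_from\t1\n"
    "2\tnode_to\tvole_(microtus)\n"
    "2\tlayer_to\t1\n"
    "2\tweight\t1\n"
    "2\ttype\tpredation\n"
)


def read_edges(text):
    """Group rows by interaction_id, then build one Edge per interaction."""
    by_id = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        by_id.setdefault(row["interaction_id"], {})[row["attribute"]] = row["value"]
    return [
        Edge(
            a["node_from"], int(a["layer_from"]),
            a["node_to"], int(a["layer_to"]),
            float(a["weight"]), a["type"],
        )
        for a in by_id.values()
    ]


edges = read_edges(INTERACTIONS)

# An edge is intralayer when both endpoints sit in the same layer.
intralayer = [e for e in edges if e.layer_from == e.layer_to]
print(len(edges), len(intralayer))
```

Comparing layer_from with layer_to is enough to separate intralayer edges from interlayer ones, since both endpoints carry their own layer.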

Description.csv

This file contains a general description of the network, which will aid the user in determining if this is the network they want to work with.

This file will always contain the following information:

Data Entry

The developer who collected the data.

Description

A brief description of the source article will be provided if the data was obtained from one.

Source

The source of the raw data:

Interaction Web DataBase
Web of Life
Web: from searching for relevant articles online
Mangal

Note: if the data is collected from Mangal, there will be a row specifying the mangal code used.

Data URL

The URL from which the data was sourced.

Ecological Network Type:

Each network in the package is classified by two attributes. One is the ecological network type (e.g. Pollination network), while the other is the multilayer network type (e.g. Spatial network). For example, a network can be classified as a food web with a temporal multilayer dimension.

Ecological network types:

Pollination: describes the interactions between plants and pollinators. The interactions in a pollination network are typically mutualistic. An example of a pollination interaction is the relationship between bees and flowers.
Seed-Dispersal: describes the movement of seeds from one location to another, typically through the action of an animal. An example of a seed-dispersal interaction is the relationship between a frugivore and a fruit-bearing plant.
Plant-Ant: describes the relationships between certain plant species and ant colonies, where both organisms benefit from the interaction in different ways.
Plant-Herbivore: describes the complex relationships between plants and the animals that consume them.
Host-Parasite: describes the relationships between hosts and parasites, in which the parasite relies on the host for survival and reproduction, while the host may suffer negative effects as a result of the parasite's presence.
Food-Web: describes the feeding relationships among species, and how energy and nutrients move from one organism to another.
Anemone-Fish: describes a symbiotic relationship between certain species of anemones and clownfish, in which both species benefit from the presence of the other.
Multiples: the network encompasses several ecological interactions throughout its layers (e.g. a network that contains host-parasite + food-web interactions).

Multilayer Network Types:

Spatial: the layers of a multilayer network are distinguished from each other based on their spatial coordinates.
Temporal: the layers of a multilayer network represent different time points of the network.
Perturbation: the layers are differentiated based on whether or not they are experiencing a perturbation (e.g. the effect of interventions like invasive species).
Environment: each layer represents a different aspect of the environment.
Multiplex: each layer represents a different type of interaction between the same set of nodes.

State Nodes

The state_nodes attribute indicates whether the network has a State_nodes.csv file, that is, whether any attributes are associated with state nodes. A FALSE value means that no attributes are associated with the state nodes, while a TRUE value means that such attributes exist.

Description.csv example

attribute	value
data_entry	ofir_segev
multilayer_network_type	Environment
description	NA
source	mangal
data_url	https://mangal.io/doc/api/
mangal_code	87
ecological_network_type	Food-Web
state_nodes	FALSE
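Because Description.csv has only attribute and value columns, it maps directly onto a dictionary. The Python sketch below illustrates this; the helper `parse_value`, its treatment of NA, TRUE and FALSE, and the tab delimiter are assumptions made for illustration rather than documented behaviour of the package.

```python
import csv
import io

# Description.csv example from above (delimiter is an assumption).
DESCRIPTION = (
    "attribute\tvalue\n"
    "data_entry\tofir_segev\n"
    "multilayer_network_type\tEnvironment\n"
    "description\tNA\n"
    "source\tmangal\n"
    "data_url\thttps://mangal.io/doc/api/\n"
    "mangal_code\t87\n"
    "ecological_network_type\tFood-Web\n"
    "state_nodes\tFALSE\n"
)


def parse_value(value):
    """Map NA to None and TRUE/FALSE to booleans; leave other strings as-is."""
    if value == "NA":
        return None
    if value in ("TRUE", "FALSE"):
        return value == "TRUE"
    return value


description = {
    row["attribute"]: parse_value(row["value"])
    for row in csv.DictReader(io.StringIO(DESCRIPTION), delimiter="\t")
}

# The flag tells the reader whether to look for a State_nodes.csv file.
if description["state_nodes"]:
    print("state node attributes are available")
else:
    print("no state node attributes")
```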

References.csv

This file contains the information required to find the article in which the data is presented, including the article's DOI, author, and year (for data that was extracted from an article). It may also include paper and data URLs. The file has multiple rows when the data is taken from multiple articles.

doi

The DOI of the article that the data was taken from. If the article doesn't have a DOI, the value is NA.

Author

The name/s of the author/s of the article the data was sourced from.

Year

The year of data publication.

Paper URL

The web address that provides access to the article.

Data URL

The website where the data is hosted.

References.csv example

doi	author	year	paper_url
10.2307/1948658	ralph w. dexter	1947-01-01	https://doi.org/10.2307%2F1948658
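In the example row, paper_url looks like the DOI percent-encoded and appended to the https://doi.org/ resolver. Assuming that is how the value was built (the source does not state it), it can be reproduced in Python as follows; the function name `doi_to_url` is hypothetical.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Build a doi.org link, percent-encoding the DOI (assumed convention)."""
    # safe="" forces the slash inside the DOI to be encoded as %2F.
    return "https://doi.org/" + quote(doi, safe="")


paper_url = doi_to_url("10.2307/1948658")
print(paper_url)
```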
