🧱 Database Schema
Below is the full schema diagram used in the GBC publication analysis project.

Core Tables
resource
- Describes a biodata resource
- Linked to URLs and associated connection statuses/online status
- Linked to publications by inventory, mentions and accessions/data citation
  - inventory links are represented in the resource_publication table
  - mention links are represented in the resource_mention table
  - accession/data citation links are represented in the accession and accession_publication tables
- Versioned by version_id, which is fully described in the version table
  - each workflow run/data source is represented by a different version, so provenance is captured by this too
- is_latest marks the most recent version of the resource
- is_gcbr captures GCBR status
- Represented in API by the Resource object
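As a rough illustration of how version_id and is_latest might be used together, the sketch below builds a toy in-memory SQLite database and selects only the latest rows. Only version_id, is_latest and is_gcbr come from the description above; the id and name columns, the sample rows and the latest_resources helper are assumptions made for the example, not the project's actual schema or API.

```python
import sqlite3

# Toy schema: id/name columns and all sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE version (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    name TEXT,
    version_id INTEGER REFERENCES version(id),
    is_latest INTEGER,
    is_gcbr INTEGER
);
INSERT INTO version VALUES (1, 'run-2023'), (2, 'run-2024');
INSERT INTO resource VALUES
    (1, 'ExampleDB', 1, 0, 0),
    (2, 'ExampleDB', 2, 1, 0),
    (3, 'OtherDB', 2, 1, 1);
""")


def latest_resources(conn):
    """Return (resource name, version name) for rows flagged is_latest."""
    rows = conn.execute(
        "SELECT r.name, v.name FROM resource r "
        "JOIN version v ON v.id = r.version_id "
        "WHERE r.is_latest = 1 ORDER BY r.name"
    )
    return rows.fetchall()


latest = latest_resources(conn)
```

Older rows stay in the table with is_latest set to 0, so the version that produced them remains traceable.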
publication
- Stores metadata for publications (title, journal, year, etc.)
- Affiliation country inferred upon import
- Linked to resources by inventory, mentions and accessions/data citation
  - inventory links are represented in the resource_publication table
  - mention links are represented in the resource_mention table
  - accession/data citation links are represented in the accession and accession_publication tables
- Linked to associated grants and grant agencies
- Represented in API by the Publication object
resource_publication
- Joins resource and publication tables, allowing a many-to-many type of relationship
- This link represents the inventory, i.e. a link here means that the publication describes the resource
- Represented in API as the .publications attribute of a Resource object
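The many-to-many inventory link can be sketched with a small join table. This is a minimal example using SQLite; the column names resource_id and publication_id, the sample rows and both helper functions are assumptions for illustration and may not match the real schema definition.

```python
import sqlite3

# Toy tables: column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE resource_publication (
    resource_id INTEGER REFERENCES resource(id),
    publication_id INTEGER REFERENCES publication(id),
    PRIMARY KEY (resource_id, publication_id)
);
INSERT INTO resource VALUES (1, 'ExampleDB'), (2, 'OtherDB');
INSERT INTO publication VALUES (10, 'Paper A'), (11, 'Paper B');
INSERT INTO resource_publication VALUES (1, 10), (1, 11), (2, 11);
""")


def publications_for(conn, resource_id):
    """Titles of publications that describe the given resource."""
    rows = conn.execute(
        "SELECT p.title FROM publication p "
        "JOIN resource_publication rp ON rp.publication_id = p.id "
        "WHERE rp.resource_id = ? ORDER BY p.title",
        (resource_id,),
    )
    return [title for (title,) in rows]


def resources_for(conn, publication_id):
    """Names of resources described by the given publication."""
    rows = conn.execute(
        "SELECT r.name FROM resource r "
        "JOIN resource_publication rp ON rp.resource_id = r.id "
        "WHERE rp.publication_id = ? ORDER BY r.name",
        (publication_id,),
    )
    return [name for (name,) in rows]
```

Because the link lives in its own table, one resource can have several describing publications and one publication can describe several resources.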
url
- Describes URL of a resource
- Represented in API by the URL object
connection_status
- Describes ping/connection information for a URL
- is_online inferred from ping return status
- Represented in API by the ConnectionStatus object
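The exact rule used to derive is_online is not stated here. One plausible reading, shown purely as an assumption, treats an HTTP-style status in the 2xx or 3xx range as online and anything else (including no response) as offline. The function name infer_is_online is hypothetical.

```python
from typing import Optional


def infer_is_online(status_code: Optional[int]) -> bool:
    """Guess online status from a ping/HTTP return status.

    Assumed rule (not confirmed by the project): 2xx and 3xx mean online;
    errors, timeouts and missing responses mean offline.
    """
    if status_code is None:  # no response at all, e.g. a timeout
        return False
    return 200 <= status_code < 400
```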
grant
- Describes grants associated with resources/publications and their agencies
- Represented in API by the Grant object
grant_agency
- Records for grant agencies (name, estimated country)
- parent_agency_id and representative_agency_id represent relationships between agencies
  - parent_agency_id introduces hierarchical relationships (e.g. agency X funds agency Y)
  - representative_agency_id groups different names/aliases for the same agency (e.g. XYZ and X.Y.Z. are the same agency, and XYZ is the representative/canonical name for the whole group)
- Represented in API by the GrantAgency object
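The two self-referencing columns can be illustrated in plain Python. In this sketch the Agency tuple, the sample agencies and the canonical and ancestors helpers are all hypothetical. It also assumes that a funded agency's row points at its funder via parent_agency_id, and that an alias row points at the canonical row via representative_agency_id.

```python
from typing import Dict, List, NamedTuple, Optional


class Agency(NamedTuple):
    id: int
    name: str
    parent_agency_id: Optional[int] = None
    representative_agency_id: Optional[int] = None


# Sample data: "X.Y.Z." is an alias of "XYZ"; the institute is funded by "XYZ".
agencies: Dict[int, Agency] = {
    a.id: a
    for a in [
        Agency(1, "XYZ"),
        Agency(2, "X.Y.Z.", representative_agency_id=1),
        Agency(3, "XYZ Institute of Examples", parent_agency_id=1),
    ]
}


def canonical(agencies: Dict[int, Agency], agency_id: int) -> Agency:
    """Resolve an alias to its representative agency (or return it unchanged)."""
    agency = agencies[agency_id]
    rep = agency.representative_agency_id
    return agencies[rep] if rep is not None else agency


def ancestors(agencies: Dict[int, Agency], agency_id: int) -> List[str]:
    """Walk parent_agency_id links upward, guarding against cycles."""
    chain: List[str] = []
    seen = {agency_id}
    current = agencies[agency_id].parent_agency_id
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(agencies[current].name)
        current = agencies[current].parent_agency_id
    return chain
```

Resolving aliases before counting grants per agency avoids splitting one funder's totals across several spellings of its name.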
resource_grant
- Joins resource and grant tables
- Allows a many-to-many relationship between the two
- Represented in API as the .grants attribute of a Resource object
publication_grant
- Joins publication and grant tables
- Allows a many-to-many relationship between the two
- Represented in API as the .grants attribute of a Publication object
version
- Describes the pipeline/process that identified the data
- Means of versioning and data provenance
- Represented in API by the Version object
accession
- Holds data citations/accessions plus their associated metadata
- Links to resources via resource_id
- Versioned by version_id
- Represented in API by the Accession object
accession_publication
- Maps accessions to the publications they were found in
- Table to join accession and publication tables, allowing a many-to-many type of relationship
- Represented in API as the .publications attribute of an Accession object
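To show how data citations connect a resource to publications, the sketch below follows the chain accession, accession_publication, publication in a toy SQLite database. Only resource_id and version_id on the accession table come from the description above; the accession and publication_id column names, the sample identifiers and the citing_publications helper are assumptions.

```python
import sqlite3

# Toy tables: key columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE accession (
    accession TEXT PRIMARY KEY,
    resource_id INTEGER,
    version_id INTEGER
);
CREATE TABLE accession_publication (
    accession TEXT REFERENCES accession(accession),
    publication_id INTEGER REFERENCES publication(id),
    PRIMARY KEY (accession, publication_id)
);
INSERT INTO publication VALUES (10, 'Paper A'), (11, 'Paper B');
INSERT INTO accession VALUES
    ('ACC0001', 1, 1), ('ACC0002', 1, 1), ('ACC0003', 2, 1);
INSERT INTO accession_publication VALUES
    ('ACC0001', 10), ('ACC0002', 10), ('ACC0002', 11), ('ACC0003', 11);
""")


def citing_publications(conn, resource_id):
    """Return (title, number of cited accessions) per publication for a resource."""
    rows = conn.execute(
        "SELECT p.title, COUNT(*) AS n_accessions "
        "FROM accession a "
        "JOIN accession_publication ap ON ap.accession = a.accession "
        "JOIN publication p ON p.id = ap.publication_id "
        "WHERE a.resource_id = ? "
        "GROUP BY p.id, p.title ORDER BY p.title",
        (resource_id,),
    )
    return rows.fetchall()
```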
resource_mention
- Links resources to publications that have mentioned the resource's name in the article text
- Includes context, classifier confidence, etc.
- Represented in API by the ResourceMention object
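Since each mention carries a classifier confidence, a downstream analysis might keep only high-confidence mentions. The sketch below is one way that could look. The Mention dataclass, its field names, the 0.9 default threshold and the confident_mentions helper are all illustrative assumptions rather than the project's actual ResourceMention API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    # Hypothetical fields standing in for resource_mention columns.
    resource_id: int
    publication_id: int
    context: str
    confidence: float


def confident_mentions(mentions: List[Mention], threshold: float = 0.9) -> List[Mention]:
    """Keep mentions whose classifier confidence meets the threshold."""
    return [m for m in mentions if m.confidence >= threshold]


sample = [
    Mention(1, 10, "data were retrieved from ExampleDB", 0.97),
    Mention(1, 11, "an example DB schema is shown", 0.42),
]
kept = confident_mentions(sample)
```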
Note: additional tables
Two additional tables are present in the schema definition file; they are largely independent of this analysis work:
- open_letter: tracks the signatures of GBC's open letter
- wildsi: imported data from the WilDSI project, about sequence data usage
These tables do not have API classes.