Index
uniprot_xml_to_postgresql(*, uniprot_xml_path, uri)
(🦀 Rust) Load a UniProt XML file into a PostgreSQL database. This creates a uniprot database and a uniprot_info table.
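The loader itself is implemented in Rust, and its internals are not documented here. As a rough illustration of what loading involves, the Python sketch below streams entry elements from a UniProtKB XML file with xml.etree.ElementTree.iterparse and flattens each one into a dict keyed by some of the uniprot_info column names. The helper names (parse_entry, iter_entries), the element-to-column mapping, and the toy sample entry (trimmed, with a truncated sequence) are assumptions for illustration, not the library's code.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator

UNIPROT_NS = "http://uniprot.org/uniprot"
NS = {"up": UNIPROT_NS}
ENTRY_TAG = f"{{{UNIPROT_NS}}}entry"


def parse_entry(entry: ET.Element) -> dict:
    """Flatten one <entry> into a dict keyed by a subset of uniprot_info columns (assumed mapping)."""

    def texts(path: str) -> list:
        # Non-empty, stripped text of every element matching `path`.
        return [e.text.strip() for e in entry.findall(path, NS) if e.text and e.text.strip()]

    taxon = entry.find("up:organism/up:dbReference[@type='NCBI Taxonomy']", NS)
    raw_sequence = entry.findtext("up:sequence", default="", namespaces=NS)
    return {
        "accessions": texts("up:accession"),
        "names": texts("up:name"),
        "protein_names": texts("up:protein/up:recommendedName/up:fullName"),
        "gene_names": texts("up:gene/up:name"),
        "organism_scientific": entry.findtext(
            "up:organism/up:name[@type='scientific']", namespaces=NS
        ),
        "ncbi_taxonomy_id": int(taxon.get("id")) if taxon is not None else None,
        "keywords": texts("up:keyword"),
        # Drop whitespace and line breaks inside the sequence text.
        "sequence": "".join(raw_sequence.split()) or None,
    }


def iter_entries(source) -> Iterator[dict]:
    """Stream parsed entries from a path or binary file object (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ENTRY_TAG:
            yield parse_entry(elem)
            elem.clear()  # release the processed subtree to bound memory use


# Toy input: one trimmed entry with a truncated sequence.
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P69905</accession>
    <accession>P01922</accession>
    <name>HBA_HUMAN</name>
    <protein><recommendedName><fullName>Hemoglobin subunit alpha</fullName></recommendedName></protein>
    <gene><name type="primary">HBA1</name></gene>
    <organism>
      <name type="scientific">Homo sapiens</name>
      <dbReference type="NCBI Taxonomy" id="9606"/>
    </organism>
    <keyword>Heme</keyword>
    <sequence length="10">
      MVLSPADKTN
    </sequence>
  </entry>
</uniprot>"""

if __name__ == "__main__":
    for row in iter_entries(io.BytesIO(SAMPLE)):
        print(row)
```

Streaming with iterparse and clearing each finished entry keeps memory roughly constant, which matters for full UniProtKB dumps that do not fit in memory.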
create_accession_to_pk_id(uri)
Create tables that map each accession to its uniprot_pk_id values, derived from the uniprot_info table.
It creates the following tables:
- accession_to_pk_id
- accession_to_pk_id_list
Note
The mapping is not unique: a single accession can map to multiple uniprot_pk_id values, and a single uniprot_pk_id can map to multiple accessions.
Source code in src/bio_data_to_db/uniprot/utils.py
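The docstring does not spell out the columns of the two tables. A plausible reading, assumed here, is that accession_to_pk_id holds one row per (accession, uniprot_pk_id) pair and accession_to_pk_id_list collects every uniprot_pk_id per accession. The Python sketch below mirrors that assumed shape in memory and makes the many-to-many relationship from the note concrete; build_accession_maps and the toy rows are hypothetical. Inside PostgreSQL, the same reshaping is typically written with unnest() over the accessions array and array_agg() grouped by accession.

```python
from collections import defaultdict


def build_accession_maps(rows):
    """Hypothetical helper mirroring the two tables.

    rows: iterable of (uniprot_pk_id, accessions) pairs, as stored in uniprot_info.
    Returns:
      pairs   - sorted, de-duplicated (accession, uniprot_pk_id) tuples, one per row
      grouped - accession -> sorted list of every uniprot_pk_id it appears under
    """
    pairs = set()
    grouped = defaultdict(set)
    for pk_id, accessions in rows:
        for accession in accessions or []:  # a NULL accessions array yields nothing
            pairs.add((accession, pk_id))
            grouped[accession].add(pk_id)
    return sorted(pairs), {acc: sorted(ids) for acc, ids in sorted(grouped.items())}


if __name__ == "__main__":
    # Toy data: ACC_A appears under two entries; entry 1 carries two accessions.
    toy_rows = [(1, ["ACC_A", "ACC_B"]), (2, ["ACC_A"]), (3, None)]
    pairs, grouped = build_accession_maps(toy_rows)
    print(pairs)    # [('ACC_A', 1), ('ACC_A', 2), ('ACC_B', 1)]
    print(grouped)  # {'ACC_A': [1, 2], 'ACC_B': [1]}
```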
create_empty_table(uri)
Create the empty uniprot_info table in the database. Run this before inserting data, because it defines the table structure.
Note
It runs the following SQL query:
CREATE TABLE public.uniprot_info (
    uniprot_pk_id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    accessions TEXT[],
    names TEXT[],
    protein_names TEXT[],
    gene_names TEXT[],
    organism_scientific TEXT,
    organism_commons TEXT[],
    organism_synonyms TEXT[],
    ncbi_taxonomy_id INT,
    deargen_ncbi_taxonomy_id INT,
    lineage TEXT[],
    keywords TEXT[],
    geneontology_ids TEXT[],
    geneontology_names TEXT[],
    sequence TEXT,
    deargen_molecular_functions TEXT[]
)
Source code in src/bio_data_to_db/uniprot/utils.py
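Because uniprot_pk_id is declared GENERATED BY DEFAULT AS IDENTITY, an INSERT can omit it and PostgreSQL assigns the next identity value. The hypothetical helper below builds such a parameterized statement for the uniprot_info schema, using psycopg-style %s placeholders (psycopg adapts Python lists to PostgreSQL arrays, which suits the TEXT[] columns). It is a sketch, not part of bio_data_to_db; it deliberately rejects uniprot_pk_id so that the identity column always generates the key.

```python
# Column order follows the CREATE TABLE statement; uniprot_pk_id is omitted on purpose.
UNIPROT_INFO_COLUMNS = (
    "accessions", "names", "protein_names", "gene_names",
    "organism_scientific", "organism_commons", "organism_synonyms",
    "ncbi_taxonomy_id", "deargen_ncbi_taxonomy_id", "lineage", "keywords",
    "geneontology_ids", "geneontology_names", "sequence",
    "deargen_molecular_functions",
)


def build_insert(row: dict, table: str = "public.uniprot_info"):
    """Return (sql, params) for a psycopg-style parameterized INSERT (hypothetical helper)."""
    unknown = set(row) - set(UNIPROT_INFO_COLUMNS)
    if unknown:
        # Also rejects uniprot_pk_id, so the identity column always generates the key.
        raise ValueError(f"not an insertable uniprot_info column: {sorted(unknown)}")
    columns = [c for c in UNIPROT_INFO_COLUMNS if c in row]
    if not columns:
        # Every column falls back to its default; the identity column still gets a value.
        return f"INSERT INTO {table} DEFAULT VALUES RETURNING uniprot_pk_id", []
    # Column names come from the whitelist above and the table name from the caller
    # (trusted input); values travel separately as parameters.
    placeholders = ", ".join(["%s"] * len(columns))
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) RETURNING uniprot_pk_id"
    )
    return sql, [row[c] for c in columns]


if __name__ == "__main__":
    sql, params = build_insert({"sequence": "MVLSPADKTN", "accessions": ["P69905"]})
    print(sql)
    print(params)  # [['P69905'], 'MVLSPADKTN']
```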
keywords_tsv_to_postgresql(keywords_tsv_file, uri, schema_name='public', table_name='keywords')
Load keywords_all_2024_06_26.tsv (or the equivalent file from another release) into the database.
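How the function maps the file's columns is not documented here. A minimal sketch of the parsing step, assuming the file is tab-separated with a single header row and unquoted fields: it normalizes header names into snake_case column names for the target table and turns empty fields into None so they load as NULL. The helper names and the sample header below are hypothetical; the real file's columns may differ.

```python
import csv
import io
import re


def to_column_name(header: str) -> str:
    """Normalize a header into a snake_case SQL column name, e.g. 'Keyword ID' -> 'keyword_id'."""
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def read_keywords_tsv(f):
    """Parse a tab-separated text stream with one header row (assumed layout).

    Returns (columns, rows); empty fields become None so they load as NULL.
    """
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = [to_column_name(h) for h in next(reader)]
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != len(columns):
            raise ValueError(f"line {line_no}: expected {len(columns)} fields, got {len(row)}")
        rows.append([value if value != "" else None for value in row])
    return columns, rows


if __name__ == "__main__":
    # Hypothetical two-row sample; the real file's header may differ.
    sample = "Keyword ID\tName\tCategory\nKW-9001\tExample one\tLigand\nKW-9002\tExample two\t\n"
    columns, rows = read_keywords_tsv(io.StringIO(sample))
    print(columns)  # ['keyword_id', 'name', 'category']
    print(rows)     # [['KW-9001', 'Example one', 'Ligand'], ['KW-9002', 'Example two', None]]
```

Using csv.QUOTE_NONE treats quote characters as literal text, which is the safer choice when descriptions may contain stray quotes and the file does not use CSV-style quoting.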