postgresql
create_db_if_not_exists(uri_wo_db, db_name, comment=None)
¶
Create a database if it doesn't exist.
Source code in src/bio_data_to_db/utils/postgresql.py
create_schema_if_not_exists(uri, schema_name, comment=None)
¶
Create a schema if it doesn't exist. The DB should already exist.
Source code in src/bio_data_to_db/utils/postgresql.py
make_int_column_primary_key_identity(uri, *, schema_name='public', table_name, column_name='index')
¶
Make an existing index column (integer type) as primary key with auto increment (identity).
This is used because pl.DataFrame.write_database() doesn't support writing index column as primary key. Also, it will automatically set the start value of auto increment to the max value in the column.
Example
df = pl.DataFrame({"smiles": ["CCO", "CCN", "CCC"]}) # doctest: +SKIP ... df = df.with_row_index("pk_id") ... df.write_database(...) ... set_column_as_primary_key(uri=uri, table_name="table", column_name="pk_id") ... df2 = pl.DataFrame({"smiles": ["CCC", "CCN", "CCO"]}) # append without pk_id ... df2.write_database(..., if_table_exists="append") # it will auto increment pk_id
Source code in src/bio_data_to_db/utils/postgresql.py
make_columns_primary_key(uri, *, schema_name='public', table_name, column_names)
¶
Make multiple columns as primary key but without auto increment (identity).
This is similar to make_columns_unique() but with primary key constraint.
Source code in src/bio_data_to_db/utils/postgresql.py
make_columns_unique(uri, *, schema_name='public', table_name, column_names)
¶
Set unique constraint on a column or columns in a table.
If multiple columns are provided, the unique constraint will be on the combination of the columns.
Source code in src/bio_data_to_db/utils/postgresql.py
make_large_columns_unique(uri, *, schema_name='public', table_name, column_names)
¶
Use this when the values are large texts, e.g. fasta sequences.
Reference
- https://stackoverflow.com/questions/71379137/how-to-solve-postgresql-index-width-problem
Source code in src/bio_data_to_db/utils/postgresql.py
split_column_str_to_list(uri, *, schema_name='public', table_name, in_column, out_column, separator, pg_element_type='text')
¶
Split a string column into a list column.
Source code in src/bio_data_to_db/utils/postgresql.py
polars_write_database(df, *, schema_name='public', table_name, connection, if_table_exists='fail')
¶
pl.DataFrame.write_database() but address the issue of writing unsigned and list columns to database.
Reference
- https://stackoverflow.com/questions/77098480/polars-psycopg2-write-column-of-lists-to-postgresql