API Reference

FACT_unified_data_IO.check_if_table_name_in_manifestMethod
check_if_table_name_in_manifest(conn::MySQL.Connection, table_name::String) -> DataFrame

Check if a given table name exists in the 'manifest' table of a MySQL database.

This function queries the 'manifest' table to determine if the specified table name exists within it. The query checks for the presence of table_name in the result_table_name column of the 'manifest' table. The function returns the query result as a DataFrame, which will be non-empty if the table name exists.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • table_name::String: The table name to search for in the 'manifest' table.

Returns

  • DataFrame: A DataFrame containing the query result. It will be non-empty if the table name exists in the 'manifest' table.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") tablename = "mytable" resultdf = checkiftablenameinmanifest(conn, table_name)

In this example, check_if_table_name_in_manifest is used to check if 'my_table' exists in the 'manifest' table of the database. The result of the query is returned as a DataFrame.

source
FACT_unified_data_IO.close_connectionMethod

close_connection(conn::MySQL.Connection)

Close the connection to a MySQL database.

Arguments

  • conn::MySQL.Connection: The MySQL database connection to be closed.

This function closes the MySQL database connection specified by conn using the DBInterface.close! function.

source
FACT_unified_data_IO.create_manifest_tableMethod
create_manifest_table(conn::MySQL.Connection)

Create a 'manifest' table in a MySQL database if it does not already exist.

This function creates a table named 'manifest' with columns for id, methodname, argumentsstring, argumentsdict, and resulttable_name. The id column is an auto-incrementing primary key.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") create_table(conn)

In this example, create_manifest_table is called with a MySQL connection conn. This will create the 'manifest' table in the connected database if it doesn't already exist.

source
FACT_unified_data_IO.create_results_databaseMethod

createresultsdatabase(service::OrchestratorRegistry.Service)

Ensures the FACT_results database exists for the specified service.

This function connects to the database server using the provided Service instance, checks for the existence of the FACT_results database, and creates it if it does not already exist.

Arguments:

  • service::OrchestratorRegistry.Service: The Service instance containing the host and port details required to connect to the database server.

Behavior:

  • Connects to the database server using the service's host and port.
  • Executes a CREATE DATABASE IF NOT EXISTS query to ensure the FACT_results database exists.
  • Closes the connection to the database server after executing the query.

Raises:

  • Error: If the connection to the database server fails or the query execution encounters an error.

Example:

# Define a service instance
service = Service("123", "auth-service", "http://localhost", 3306)

# Ensure the database exists
create_database(service)
source
FACT_unified_data_IO.create_table_from_dataframeMethod
create_table_from_dataframe(conn::MySQL.Connection, df::DataFrame, table_name::String)

Creates a new table in a MySQL or MariaDB database that matches the schema of a provided Julia DataFrame. This function dynamically constructs a CREATE TABLE SQL statement, mapping Julia types to SQL types according to the SQLTYPES dictionary, with special handling for columns intended to store geometrical data as LONGTEXT.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL or MariaDB database.
  • df::DataFrame: The DataFrame whose schema is used to define the structure of the new table.
  • table_name::String: The name of the new table to be created. If a table with this name already exists, the operation will fail unless the existing table is dropped beforehand.

Special Handling

Columns named shape_obj or shape (case-insensitive) are treated specially and are mapped to the LONGTEXT SQL type, regardless of their original Julia type, to accommodate potentially large geometrical data.

Usage Example

conn = MySQL.connect("hostname", "username", "password", "databasename") df = DataFrame(region=["North", "South"], geoid=["N001", "S001"], shapeobj=["POLYGON((...))", "POLYGON((...))"], area=[123.4, 567.8]) createtablefromdataframe(conn, df, "geographical_features")

This creates a table named 'geographical_features' in the specified database, with columns mapped from the DataFrame's schema and geometrical data columns set as LONGTEXT.

Error Handling

If the DataFrame contains a column with an unsupported Julia type (not found in SQLTYPES), an ArgumentError is thrown.

source
FACT_unified_data_IO.custom_loadMethod
custom_load(conn::MySQL.Connection, table_name::String, df::DataFrame)

Inserts the contents of a DataFrame into an existing SQL table in a MySQL or MariaDB database. This function constructs and executes a parameterized INSERT INTO statement for each row in the DataFrame, ensuring that data is inserted in a manner consistent with the table's schema.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL or MariaDB database.
  • table_name::String: The name of the table into which data from the DataFrame will be inserted.
  • df::DataFrame: The DataFrame containing the data to be inserted into the table.

Details

This function dynamically constructs the column names and placeholder parameters based on the DataFrame's schema. It then prepares a single INSERT INTO SQL statement and executes it for each row in the DataFrame, passing the row's values as parameters to ensure type safety and protection against SQL injection.

Column names are wrapped in backticks to allow for the use of reserved words or special characters in column names.

Example

conn = MySQL.connect("hostname", "username", "password", "databasename") df = DataFrame(region=["North", "South"], geoid=["N001", "S001"], area=[123.4, 567.8]) customload(conn, "geographicalfeatures", df)

This will insert each row of df into the geographical_features table in the specified database.

Notes

  • The table specified by table_name must already exist in the database, and its schema should be compatible with the structure of the DataFrame df.
  • This function does not handle the creation of the table or manage conflicts with existing data. If the table does not exist or if there are key conflicts, the function will raise an error.
  • To prevent SQL injection and ensure data integrity, this function uses parameterized SQL statements.
source
FACT_unified_data_IO.does_table_exist_for_methodMethod
does_table_exist_for_method(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> Bool

Check if a specific table exists in the database based on a method name, an argument string, and additional arguments. This function queries a manifest (or similar mapping) that links method names and their arguments to database table names.

Arguments

  • conn::MySQL.Connection: The MySQL database connection.
  • method_name::String: The name of the method to check. This is a key or identifier used to search the manifest.
  • arguments_string::String: A string representation of arguments or conditions. Its format and usage can vary depending on the manifest's structure.
  • arguments::Dict: A dictionary of arguments providing additional search criteria or parameters for the manifest query.

Returns

Bool: Returns true if the table exists for the specified method and arguments, based on the manifest's mapping. Returns false if no such table is found.

Example

To check if a table exists for a method 'getEmployeeData' with arguments indicating a permanent type in the engineering department, you would call the function as follows:

  • Initialize your MySQL connection as conn.
  • Set method_name to "getEmployeeData".
  • Define arguments_string as "type=permanent".
  • Create a dictionary arguments with {"department" => "engineering"}.
  • Call does_table_exist_for_method(conn, method_name, arguments_string, arguments) to check for the table's existence.

Notes

This function relies on the implementation of find_table_name_in_manifest, which must query a manifest or similar structure to find the corresponding table name based on the provided method name and arguments. The manifest structure and query logic are assumed to be predefined.

source
FACT_unified_data_IO.establish_connectionMethod
establish_connection(host, user, password, dbname; port=3306) -> MySQL.Connection

Establish a connection to a MySQL/MariaDB database.

Arguments

  • host::String: Database host
  • user::String: Username
  • password::String: Password
  • dbname::String: Database name
  • port::Int=3306: Port number (default is 3306)

Returns

  • MySQL.Connection: A connection object to the database.
source
FACT_unified_data_IO.find_table_name_in_manifestMethod
find_table_name_in_manifest(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> String

Search for a table name in the 'manifest' table of a MySQL database that matches the given method name, arguments string, and arguments dictionary.

This function queries the 'manifest' table for an entry that matches the specified method name, arguments string, and a JSON-serialized version of the arguments dictionary. If a matching entry is found, the function returns the associated table name. If no match is found, an empty string is returned.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • method_name::String: The name of the method to match in the 'manifest' table.
  • arguments_string::String: The string representation of the arguments to match in the 'manifest' table.
  • arguments::Dict: The dictionary of arguments to match in the 'manifest' table.

Returns

  • String: The name of the matching table in the 'manifest' table, or an empty string if no match is found.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "fetchdata" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") tablename = findtablenameinmanifest(conn, methodname, argumentsstring, arguments)

In this example, find_table_name_in_manifest is used to search for a table in the 'manifest' table that matches the given method name and arguments. It returns the name of the found table or an empty string if no match is found.

source
FACT_unified_data_IO.generate_random_table_nameMethod
generate_random_table_name(length::Int) -> String

Generate a random table name of the specified length, adhering to MySQL naming conventions.

The function creates a random table name consisting of letters (both uppercase and lowercase), digits, and underscores. The first character of the table name is ensured to be a letter or an underscore, as per MySQL table naming rules.

Arguments

  • length::Int: The desired length of the generated table name.

Returns

  • String: A randomly generated table name that complies with MySQL naming rules.

Example

tablename = generaterandomtablename(10)

This example generates a random table name with 10 characters. The resulting table name is a valid identifier for a MySQL table.

source
FACT_unified_data_IO.get_results_connectionMethod

getresultsconnection(service_dict::Dict) -> Connection

Establishes a database connection using the provided service configuration.

This function creates a Service object from the given service_dict and uses it to establish a database connection. The connection is configured based on the host and port specified in the dictionary.

Arguments

  • service_dict::Dict: A dictionary containing the following keys:
    • "host": The hostname or IP address of the database service.
    • "port": The port number for the database service.

Returns

  • Connection: A connection object to the database service.

Behavior

  • Constructs a Service object using the provided host and port from service_dict.
  • Calls get_results_connection with the constructed Service object to establish the database connection.

Example

Define the service dictionary and establish a connection:

service_dict = Dict(
    "host" => "127.0.0.1",
    "port" => 3306
)

connection = get_results_connection(service_dict)
println(connection)

In this example, a database connection is established using the host 127.0.0.1 and port 3306.

source
FACT_unified_data_IO.get_results_connectionMethod

getresultsconnection(service::OrchestratorRegistry.Service)

Establishes and returns a database connection for the specified service.

This function ensures that the database associated with the given Service instance is ready for use. It performs the following steps:

  1. Creates the database if it does not already exist.
  2. Establishes a connection to the database using the service's host and port.
  3. Ensures the manifest table is present in the database, creating it if necessary.

Arguments:

  • service::OrchestratorRegistry.Service: The Service instance containing the details (host and port) required to connect to the database.

Returns:

  • conn: The database connection object established for the specified service.

Raises:

  • Error: If the database connection cannot be established or if any database operation fails.

Behavior:

  • Uses the host and port of the service to connect.
  • Assumes the database credentials are root and devops.
  • The database name used is FACT_results.

Example:

# Define a service instance
service = Service("123", "auth-service", "http://localhost", 3306)

# Establish a database connection
conn = get_connection(service)

# Perform operations using the connection
execute_query(conn, "SELECT * FROM manifest")
source
FACT_unified_data_IO.get_tableMethod
get_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> DataFrame

Retrieve data from a specified table in a MySQL database based on the given method name, arguments string, and arguments dictionary.

This function first attempts to find the table name in a 'manifest' table using the provided method name, arguments string, and arguments dictionary. If the table name is found, it then retrieves the data from this table and returns it as a DataFrame. If no corresponding table name is found, an empty DataFrame is returned.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • method_name::String: The name of the method used to identify the table in the 'manifest'.
  • arguments_string::String: The arguments string used to identify the table in the 'manifest'.
  • arguments::Dict: A dictionary of arguments used to identify the table in the 'manifest'.

Returns

  • DataFrame: A DataFrame containing the data from the identified table, or an empty DataFrame if no table is found.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "mymethod" argumentsstring = "arg1=val1,arg2=val2" arguments = Dict("arg1" => "val1", "arg2" => "val2") df = gettable(conn, methodname, argumentsstring, arguments)

In this example, get_table is called with a MySQL connection conn and specific method details. It retrieves data from the identified table in the database and returns it as a DataFrame.

source
FACT_unified_data_IO.overwrite_tableMethod
overwrite_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict, results::DataFrame)

Overwrite a MySQL table with new data from a DataFrame. If the table specified by the method name and arguments exists, it is dropped and recreated with the new data. If the table does not exist, it is created.

This function first checks if a table name corresponding to the given method name and arguments exists in the manifest. If the table exists, it is dropped (along with its associated record in 'manifest'), and then a new table with the same name is created. The data from the provided DataFrame is then written into this new table. If the table does not exist, it is simply created and the data is written to it.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • method_name::String: The name of the method, used as part of the table identification.
  • arguments_string::String: A string representation of the arguments, used as part of the table identification.
  • arguments::Dict: A dictionary of arguments, used as part of the table identification.
  • results::DataFrame: The DataFrame containing data to be written to the database.

Example

conn = MySQL.connect("localhost", "username", "password", db = "FACTresults") methodname = "updatestatistics" argumentsstring = "param1=val1,param2=val2" arguments = Dict("param1" => "val1", "param2" => "val2") results = DataFrame(col1 = [1, 2, 3], col2 = ["A", "B", "C"]) overwritetable(conn, methodname, arguments_string, arguments, results)

In this example, overwrite_table is called with a MySQL connection, method name, arguments, and a DataFrame. If a table matching the method name and arguments exists, it is overwritten with the new data from the DataFrame.

source
FACT_unified_data_IO.preprocess_dataframe!Method
preprocess_dataframe!(df::DataFrame)

Preprocesses a DataFrame df in place to ensure its values conform to specific rules before loading into a SQL database. This function performs the following operations on each column of the DataFrame:

  • Clamps numeric values to specified minimum and maximum values based on the column's data type, using a predefined SQL_RANGES dictionary that maps Julia types to their corresponding SQL range limits.

  • Replaces NaN and Inf values in floating-point columns with nothing, making them compatible with SQL NULL values. This step ensures that the DataFrame can be safely inserted into a SQL database without encountering errors related to unsupported or undefined numeric values.

Arguments

  • df::DataFrame: The DataFrame to be preprocessed. The function modifies df in place.

Examples

Given a DataFrame df with possible out-of-range and NaN values:

df = DataFrame(

A = [1.0, 2.0, NaN, 4.0],
B = [Inf, -Inf, 5.0, 6.0]

)

And a predefined SQL_RANGES dictionary specifying the allowed range for each type:

const SQL_RANGES = Dict{Type, Tuple{Number, Number}}(

Float64 => (-100.0, 100.0)  # Example range for demonstration

)

Call preprocess_dataframe! to preprocess df in place:

preprocess_dataframe!(df)

After preprocessing, df will have its NaN and Inf values replaced with nothing, and numeric values will be clamped to the specified ranges in SQL_RANGES.

Notes

  • The SQL_RANGES dictionary must be defined and accessible in the scope where preprocess_dataframe! is called. It should map Julia types to tuples of (min_value, max_value), specifying the allowed range for each type.

  • The function operates in place, modifying the original DataFrame. Ensure you have made any necessary backups of the original data before preprocessing.

source
FACT_unified_data_IO.query_connectionMethod
query_connection(conn::MySQL.Connection, query::String) -> DataFrame

Execute a SQL query on a given MySQL database connection and return the results in a DataFrame.

This function takes a MySQL connection and a SQL query string, executes the query, and returns the results in a DataFrame. Each row of the query result is converted into a row in the DataFrame. All column types in the resulting DataFrame are set to String.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • query::String: The SQL query string to be executed.

Returns

  • DataFrame: A DataFrame containing the results of the query.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") query = "SELECT * FROM mytable" df = queryconnection(conn, query)

In this example, query_connection is used to execute a SQL query on a MySQL database and the results are stored in a DataFrame df.

source
FACT_unified_data_IO.write_tableMethod
write_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict, results::DataFrame)

Write data from a DataFrame to a MySQL table based on the specified method name and arguments. This function first attempts to find a corresponding table name in the manifest. If the table does not exist, a new table name is generated, and the DataFrame is written to the new table in the MySQL database. If a table with the specified method name and arguments already exists, the function raises an error to prevent overwriting existing data.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • method_name::String: The name of the method associated with the data. Used to identify or generate the table name.
  • arguments_string::String: A string representation of the arguments associated with the data. Used in conjunction with method_name to identify or generate the table name.
  • arguments::Dict: A dictionary of arguments associated with the data. This is used for more detailed tracking or identification but does not directly affect the table name.
  • results::DataFrame: The DataFrame containing the data to be written to the MySQL database.

Returns

  • This function does not return a value. It either successfully writes the data to the database or raises an error if the table already exists.

Errors

  • Raises an error if a table associated with the provided method_name and arguments_string already exists in the manifest, indicating that the data might be overwritten.

Example

conn = MySQL.connect("host", "user", "password", "dbname") methodname = "mydatamethod" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") results = DataFrame(A = 1:3, B = ["x", "y", "z"])

In this example, the write_table function attempts to write the contents of results DataFrame to a MySQL table. If a table corresponding to method_name and arguments_string does not exist, it will be created and populated with the data from results. If such a table already exists, an error will be raised.

source
FACT_unified_data_IO.write_table_name_in_manifestMethod
write_table_name_in_manifest(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> String

Insert a new entry into the 'manifest' table with a unique, randomly generated table name, along with the provided method name, arguments string, and arguments dictionary. The function returns the generated table name.

This function first serializes the provided arguments dictionary into a JSON string. It then generates a random table name and checks if this name already exists in the 'manifest' table. If the name exists, it generates a new name. This process repeats until a unique name is found. Once a unique table name is generated, the function inserts an entry into the 'manifest' table with the method name, arguments string, arguments dictionary, and the generated table name.

Arguments

  • conn::MySQL.Connection: An active connection to a MySQL database.
  • method_name::String: The name of the method to be recorded in the 'manifest' table.
  • arguments_string::String: The string representation of the arguments to be recorded in the 'manifest' table.
  • arguments::Dict: The dictionary of arguments to be serialized and recorded in the 'manifest' table.

Returns

  • String: The randomly generated unique table name.

Example

conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "fetchdata" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") tablename = writetablenameinmanifest(conn, methodname, argumentsstring, arguments)

In this example, write_table_name_in_manifest is called with a MySQL connection, method name, arguments string, and a dictionary of arguments. It inserts a new entry into the 'manifest' table and returns the generated table name.

source
FACT_unified_data_IO.FACT_unified_data_IOModule
Module FACT_unified_data_IO

A Julia module for interfacing with a MySQL database, specifically designed for managing and interacting with FACT consolidated data.

This module provides functionalities to establish connections to a MySQL database, retrieve and write data to tables, and manage a manifest table for tracking database table associations with specific method calls and arguments.

Features

  • Database Connection Management: Functions to establish and close connections to a MySQL database.
  • Data Retrieval and Writing: Functions to read from and write to database tables.
  • Manifest Management: Functions to interact with a manifest table that maps method calls and arguments to specific database tables.

Usage

The module exports various functions from included scripts. Key functionalities include getting connections to a MySQL database, closing connections, managing data tables, and interacting with a manifest table.

Exported Functions

  • get_connection: Establish a connection to a MySQL database.
  • close_connection: Close an established connection to a MySQL database.
  • get_table: Retrieve data from a specific table based on method name and arguments.
  • write_table: Write data to a table and record its details in the manifest.
  • overwrite_table: Overwrite data in a table and update the manifest.

Dependencies

  • MySQL.jl: For MySQL database connectivity and operations.
  • JSON.jl: For JSON parsing and serialization.
  • DataFrames.jl: For handling data in tabular form.
source
FACT_unified_data_IO.SQL_RANGESConstant
SQL_RANGES

A constant dictionary mapping Julia data types to tuples representing the minimum and maximum values allowed for each type, intended to align with the range limits of corresponding SQL data types.

This mapping ensures data being prepared for SQL operations does not exceed the storage capabilities of the target database's data types, helping prevent runtime errors due to type overflows or underflows.

Contents

The dictionary contains mappings for the following Julia types:

  • Int8, Int16, Int32, Int64: Signed integer types with corresponding SQL range limits.
  • UInt8, UInt16, UInt32, UInt64: Unsigned integer types with corresponding SQL range limits.
  • Float32, Float64: Floating-point types with approximate range limits to accommodate the range of floating-point numbers in SQL.

Usage Example

To clamp a value to the allowed range for its type before inserting it into a database:

value = clamp(value, SQLRANGES[typeof(value)][1], SQLRANGES[typeof(value)][2])

Notes

  • The ranges for Float32 and Float64 are approximations and may vary depending on specific database implementations or configurations.
  • This dictionary should be used as a guide for data preprocessing steps, ensuring compatibility with SQL database types and preventing insertion errors.
source
FACT_unified_data_IO.SQLTYPESConstant
SQLTYPES::Dict{Type, String}

A dictionary mapping from Julia data types to SQL data types, used for generating SQL table schemas that correspond to the structure of a Julia DataFrame. This mapping ensures that data stored in a DataFrame can be appropriately represented in a SQL database.

Supported Mappings

  • Int8 to TINYINT
  • Int16 to SMALLINT
  • Int32 to INTEGER
  • Int64 to BIGINT
  • UInt8 to TINYINT UNSIGNED
  • UInt16 to SMALLINT UNSIGNED
  • UInt32 to INTEGER UNSIGNED
  • UInt64 to BIGINT UNSIGNED
  • Float32 to FLOAT
  • Float64 to DOUBLE
  • Bool to BOOL
  • Vector{UInt8} to BLOB (for storing binary data)
  • String to VARCHAR(255) (default maximum length for strings)
  • Date to DATE
  • Time to TIME
  • DateTime to DATETIME

The dictionary is used by the create_table_from_dataframe function to automatically determine the SQL column types based on the DataFrame's column types.

source