API Reference
FACT_unified_data_IO.check_if_table_name_in_manifest
— Methodcheck_if_table_name_in_manifest(conn::MySQL.Connection, table_name::String) -> DataFrame
Check if a given table name exists in the 'manifest' table of a MySQL database.
This function queries the 'manifest' table to determine if the specified table name exists within it. The query checks for the presence of table_name
in the result_table_name
column of the 'manifest' table. The function returns the query result as a DataFrame, which will be non-empty if the table name exists.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.table_name::String
: The table name to search for in the 'manifest' table.
Returns
DataFrame
: A DataFrame containing the query result. It will be non-empty if the table name exists in the 'manifest' table.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") tablename = "mytable" resultdf = checkiftablenameinmanifest(conn, table_name)
In this example, check_if_table_name_in_manifest
is used to check if 'my_table' exists in the 'manifest' table of the database. The result of the query is returned as a DataFrame.
FACT_unified_data_IO.close_connection
— Methodclose_connection(conn::MySQL.Connection)
Close the connection to a MySQL database.
Arguments
conn::MySQL.Connection
: The MySQL database connection to be closed.
This function closes the MySQL database connection specified by conn
using the DBInterface.close!
function.
FACT_unified_data_IO.create_manifest_table
— Methodcreate_manifest_table(conn::MySQL.Connection)
Create a 'manifest' table in a MySQL database if it does not already exist.
This function creates a table named 'manifest' with columns for id, methodname, argumentsstring, argumentsdict, and resulttable_name. The id column is an auto-incrementing primary key.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") create_table(conn)
In this example, create_manifest_table
is called with a MySQL connection conn
. This will create the 'manifest' table in the connected database if it doesn't already exist.
FACT_unified_data_IO.create_results_database
— Methodcreateresultsdatabase(service::OrchestratorRegistry.Service)
Ensures the FACT_results
database exists for the specified service.
This function connects to the database server using the provided Service
instance, checks for the existence of the FACT_results
database, and creates it if it does not already exist.
Arguments:
service::OrchestratorRegistry.Service
: TheService
instance containing the host and port details required to connect to the database server.
Behavior:
- Connects to the database server using the service's host and port.
- Executes a
CREATE DATABASE IF NOT EXISTS
query to ensure theFACT_results
database exists. - Closes the connection to the database server after executing the query.
Raises:
Error
: If the connection to the database server fails or the query execution encounters an error.
Example:
# Define a service instance
service = Service("123", "auth-service", "http://localhost", 3306)
# Ensure the database exists
create_database(service)
FACT_unified_data_IO.create_table_from_dataframe
— Methodcreate_table_from_dataframe(conn::MySQL.Connection, df::DataFrame, table_name::String)
Creates a new table in a MySQL or MariaDB database that matches the schema of a provided Julia DataFrame. This function dynamically constructs a CREATE TABLE
SQL statement, mapping Julia types to SQL types according to the SQLTYPES
dictionary, with special handling for columns intended to store geometrical data as LONGTEXT
.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL or MariaDB database.df::DataFrame
: The DataFrame whose schema is used to define the structure of the new table.table_name::String
: The name of the new table to be created. If a table with this name already exists, the operation will fail unless the existing table is dropped beforehand.
Special Handling
Columns named shape_obj
or shape
(case-insensitive) are treated specially and are mapped to the LONGTEXT
SQL type, regardless of their original Julia type, to accommodate potentially large geometrical data.
Usage Example
conn = MySQL.connect("hostname", "username", "password", "databasename") df = DataFrame(region=["North", "South"], geoid=["N001", "S001"], shapeobj=["POLYGON((...))", "POLYGON((...))"], area=[123.4, 567.8]) createtablefromdataframe(conn, df, "geographical_features")
This creates a table named 'geographical_features' in the specified database, with columns mapped from the DataFrame's schema and geometrical data columns set as LONGTEXT
.
Error Handling
If the DataFrame contains a column with an unsupported Julia type (not found in SQLTYPES
), an ArgumentError
is thrown.
FACT_unified_data_IO.custom_load
— Methodcustom_load(conn::MySQL.Connection, table_name::String, df::DataFrame)
Inserts the contents of a DataFrame into an existing SQL table in a MySQL or MariaDB database. This function constructs and executes a parameterized INSERT INTO
statement for each row in the DataFrame, ensuring that data is inserted in a manner consistent with the table's schema.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL or MariaDB database.table_name::String
: The name of the table into which data from the DataFrame will be inserted.df::DataFrame
: The DataFrame containing the data to be inserted into the table.
Details
This function dynamically constructs the column names and placeholder parameters based on the DataFrame's schema. It then prepares a single INSERT INTO
SQL statement and executes it for each row in the DataFrame, passing the row's values as parameters to ensure type safety and protection against SQL injection.
Column names are wrapped in backticks to allow for the use of reserved words or special characters in column names.
Example
conn = MySQL.connect("hostname", "username", "password", "databasename") df = DataFrame(region=["North", "South"], geoid=["N001", "S001"], area=[123.4, 567.8]) customload(conn, "geographicalfeatures", df)
This will insert each row of df
into the geographical_features
table in the specified database.
Notes
- The table specified by
table_name
must already exist in the database, and its schema should be compatible with the structure of the DataFramedf
. - This function does not handle the creation of the table or manage conflicts with existing data. If the table does not exist or if there are key conflicts, the function will raise an error.
- To prevent SQL injection and ensure data integrity, this function uses parameterized SQL statements.
FACT_unified_data_IO.does_table_exist_for_method
— Methoddoes_table_exist_for_method(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> Bool
Check if a specific table exists in the database based on a method name, an argument string, and additional arguments. This function queries a manifest (or similar mapping) that links method names and their arguments to database table names.
Arguments
conn::MySQL.Connection
: The MySQL database connection.method_name::String
: The name of the method to check. This is a key or identifier used to search the manifest.arguments_string::String
: A string representation of arguments or conditions. Its format and usage can vary depending on the manifest's structure.arguments::Dict
: A dictionary of arguments providing additional search criteria or parameters for the manifest query.
Returns
Bool
: Returns true
if the table exists for the specified method and arguments, based on the manifest's mapping. Returns false
if no such table is found.
Example
To check if a table exists for a method 'getEmployeeData' with arguments indicating a permanent type in the engineering department, you would call the function as follows:
- Initialize your MySQL connection as
conn
. - Set
method_name
to "getEmployeeData". - Define
arguments_string
as "type=permanent". - Create a dictionary
arguments
with{"department" => "engineering"}
. - Call
does_table_exist_for_method(conn, method_name, arguments_string, arguments)
to check for the table's existence.
Notes
This function relies on the implementation of find_table_name_in_manifest
, which must query a manifest or similar structure to find the corresponding table name based on the provided method name and arguments. The manifest structure and query logic are assumed to be predefined.
FACT_unified_data_IO.establish_connection
— Methodestablish_connection(host, user, password, dbname; port=3306) -> MySQL.Connection
Establish a connection to a MySQL/MariaDB database.
Arguments
host::String
: Database hostuser::String
: Usernamepassword::String
: Passworddbname::String
: Database nameport::Int=3306
: Port number (default is 3306)
Returns
MySQL.Connection
: A connection object to the database.
FACT_unified_data_IO.find_table_name_in_manifest
— Methodfind_table_name_in_manifest(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> String
Search for a table name in the 'manifest' table of a MySQL database that matches the given method name, arguments string, and arguments dictionary.
This function queries the 'manifest' table for an entry that matches the specified method name, arguments string, and a JSON-serialized version of the arguments dictionary. If a matching entry is found, the function returns the associated table name. If no match is found, an empty string is returned.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.method_name::String
: The name of the method to match in the 'manifest' table.arguments_string::String
: The string representation of the arguments to match in the 'manifest' table.arguments::Dict
: The dictionary of arguments to match in the 'manifest' table.
Returns
String
: The name of the matching table in the 'manifest' table, or an empty string if no match is found.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "fetchdata" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") tablename = findtablenameinmanifest(conn, methodname, argumentsstring, arguments)
In this example, find_table_name_in_manifest
is used to search for a table in the 'manifest' table that matches the given method name and arguments. It returns the name of the found table or an empty string if no match is found.
FACT_unified_data_IO.generate_random_table_name
— Methodgenerate_random_table_name(length::Int) -> String
Generate a random table name of the specified length, adhering to MySQL naming conventions.
The function creates a random table name consisting of letters (both uppercase and lowercase), digits, and underscores. The first character of the table name is ensured to be a letter or an underscore, as per MySQL table naming rules.
Arguments
length::Int
: The desired length of the generated table name.
Returns
String
: A randomly generated table name that complies with MySQL naming rules.
Example
tablename = generaterandomtablename(10)
This example generates a random table name with 10 characters. The resulting table name is a valid identifier for a MySQL table.
FACT_unified_data_IO.get_results_connection
— Methodgetresultsconnection(service_dict::Dict) -> Connection
Establishes a database connection using the provided service configuration.
This function creates a Service
object from the given service_dict
and uses it to establish a database connection. The connection is configured based on the host
and port
specified in the dictionary.
Arguments
service_dict::Dict
: A dictionary containing the following keys:"host"
: The hostname or IP address of the database service."port"
: The port number for the database service.
Returns
Connection
: A connection object to the database service.
Behavior
- Constructs a
Service
object using the provided host and port fromservice_dict
. - Calls
get_results_connection
with the constructedService
object to establish the database connection.
Example
Define the service dictionary and establish a connection:
service_dict = Dict(
"host" => "127.0.0.1",
"port" => 3306
)
connection = get_results_connection(service_dict)
println(connection)
In this example, a database connection is established using the host 127.0.0.1
and port 3306
.
FACT_unified_data_IO.get_results_connection
— Methodgetresultsconnection(service::OrchestratorRegistry.Service)
Establishes and returns a database connection for the specified service.
This function ensures that the database associated with the given Service
instance is ready for use. It performs the following steps:
- Creates the database if it does not already exist.
- Establishes a connection to the database using the service's host and port.
- Ensures the
manifest
table is present in the database, creating it if necessary.
Arguments:
service::OrchestratorRegistry.Service
: TheService
instance containing the details (host and port) required to connect to the database.
Returns:
conn
: The database connection object established for the specified service.
Raises:
Error
: If the database connection cannot be established or if any database operation fails.
Behavior:
- Uses the host and port of the service to connect.
- Assumes the database credentials are
root
anddevops
. - The database name used is
FACT_results
.
Example:
# Define a service instance
service = Service("123", "auth-service", "http://localhost", 3306)
# Establish a database connection
conn = get_connection(service)
# Perform operations using the connection
execute_query(conn, "SELECT * FROM manifest")
FACT_unified_data_IO.get_table
— Methodget_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> DataFrame
Retrieve data from a specified table in a MySQL database based on the given method name, arguments string, and arguments dictionary.
This function first attempts to find the table name in a 'manifest' table using the provided method name, arguments string, and arguments dictionary. If the table name is found, it then retrieves the data from this table and returns it as a DataFrame. If no corresponding table name is found, an empty DataFrame is returned.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.method_name::String
: The name of the method used to identify the table in the 'manifest'.arguments_string::String
: The arguments string used to identify the table in the 'manifest'.arguments::Dict
: A dictionary of arguments used to identify the table in the 'manifest'.
Returns
DataFrame
: A DataFrame containing the data from the identified table, or an empty DataFrame if no table is found.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "mymethod" argumentsstring = "arg1=val1,arg2=val2" arguments = Dict("arg1" => "val1", "arg2" => "val2") df = gettable(conn, methodname, argumentsstring, arguments)
In this example, get_table
is called with a MySQL connection conn
and specific method details. It retrieves data from the identified table in the database and returns it as a DataFrame.
FACT_unified_data_IO.overwrite_table
— Methodoverwrite_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict, results::DataFrame)
Overwrite a MySQL table with new data from a DataFrame. If the table specified by the method name and arguments exists, it is dropped and recreated with the new data. If the table does not exist, it is created.
This function first checks if a table name corresponding to the given method name and arguments exists in the manifest. If the table exists, it is dropped (along with its associated record in 'manifest'), and then a new table with the same name is created. The data from the provided DataFrame is then written into this new table. If the table does not exist, it is simply created and the data is written to it.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.method_name::String
: The name of the method, used as part of the table identification.arguments_string::String
: A string representation of the arguments, used as part of the table identification.arguments::Dict
: A dictionary of arguments, used as part of the table identification.results::DataFrame
: The DataFrame containing data to be written to the database.
Example
conn = MySQL.connect("localhost", "username", "password", db = "FACTresults") methodname = "updatestatistics" argumentsstring = "param1=val1,param2=val2" arguments = Dict("param1" => "val1", "param2" => "val2") results = DataFrame(col1 = [1, 2, 3], col2 = ["A", "B", "C"]) overwritetable(conn, methodname, arguments_string, arguments, results)
In this example, overwrite_table
is called with a MySQL connection, method name, arguments, and a DataFrame. If a table matching the method name and arguments exists, it is overwritten with the new data from the DataFrame.
FACT_unified_data_IO.preprocess_dataframe!
— Methodpreprocess_dataframe!(df::DataFrame)
Preprocesses a DataFrame df
in place to ensure its values conform to specific rules before loading into a SQL database. This function performs the following operations on each column of the DataFrame:
Clamps numeric values to specified minimum and maximum values based on the column's data type, using a predefined
SQL_RANGES
dictionary that maps Julia types to their corresponding SQL range limits.Replaces
NaN
andInf
values in floating-point columns withnothing
, making them compatible with SQLNULL
values. This step ensures that the DataFrame can be safely inserted into a SQL database without encountering errors related to unsupported or undefined numeric values.
Arguments
df::DataFrame
: The DataFrame to be preprocessed. The function modifiesdf
in place.
Examples
Given a DataFrame df
with possible out-of-range and NaN values:
df = DataFrame(
A = [1.0, 2.0, NaN, 4.0],
B = [Inf, -Inf, 5.0, 6.0]
)
And a predefined SQL_RANGES
dictionary specifying the allowed range for each type:
const SQL_RANGES = Dict{Type, Tuple{Number, Number}}(
Float64 => (-100.0, 100.0) # Example range for demonstration
)
Call preprocess_dataframe!
to preprocess df
in place:
preprocess_dataframe!(df)
After preprocessing, df
will have its NaN
and Inf
values replaced with nothing
, and numeric values will be clamped to the specified ranges in SQL_RANGES
.
Notes
The
SQL_RANGES
dictionary must be defined and accessible in the scope wherepreprocess_dataframe!
is called. It should map Julia types to tuples of(min_value, max_value)
, specifying the allowed range for each type.The function operates in place, modifying the original DataFrame. Ensure you have made any necessary backups of the original data before preprocessing.
FACT_unified_data_IO.query_connection
— Methodquery_connection(conn::MySQL.Connection, query::String) -> DataFrame
Execute a SQL query on a given MySQL database connection and return the results in a DataFrame.
This function takes a MySQL connection and a SQL query string, executes the query, and returns the results in a DataFrame. Each row of the query result is converted into a row in the DataFrame. All column types in the resulting DataFrame are set to String
.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.query::String
: The SQL query string to be executed.
Returns
DataFrame
: A DataFrame containing the results of the query.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") query = "SELECT * FROM mytable" df = queryconnection(conn, query)
In this example, query_connection
is used to execute a SQL query on a MySQL database and the results are stored in a DataFrame df
.
FACT_unified_data_IO.write_table
— Methodwrite_table(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict, results::DataFrame)
Write data from a DataFrame to a MySQL table based on the specified method name and arguments. This function first attempts to find a corresponding table name in the manifest. If the table does not exist, a new table name is generated, and the DataFrame is written to the new table in the MySQL database. If a table with the specified method name and arguments already exists, the function raises an error to prevent overwriting existing data.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.method_name::String
: The name of the method associated with the data. Used to identify or generate the table name.arguments_string::String
: A string representation of the arguments associated with the data. Used in conjunction withmethod_name
to identify or generate the table name.arguments::Dict
: A dictionary of arguments associated with the data. This is used for more detailed tracking or identification but does not directly affect the table name.results::DataFrame
: The DataFrame containing the data to be written to the MySQL database.
Returns
- This function does not return a value. It either successfully writes the data to the database or raises an error if the table already exists.
Errors
- Raises an error if a table associated with the provided
method_name
andarguments_string
already exists in the manifest, indicating that the data might be overwritten.
Example
conn = MySQL.connect("host", "user", "password", "dbname") methodname = "mydatamethod" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") results = DataFrame(A = 1:3, B = ["x", "y", "z"])
In this example, the write_table
function attempts to write the contents of results
DataFrame to a MySQL table. If a table corresponding to method_name
and arguments_string
does not exist, it will be created and populated with the data from results
. If such a table already exists, an error will be raised.
FACT_unified_data_IO.write_table_name_in_manifest
— Methodwrite_table_name_in_manifest(conn::MySQL.Connection, method_name::String, arguments_string::String, arguments::Dict) -> String
Insert a new entry into the 'manifest' table with a unique, randomly generated table name, along with the provided method name, arguments string, and arguments dictionary. The function returns the generated table name.
This function first serializes the provided arguments dictionary into a JSON string. It then generates a random table name and checks if this name already exists in the 'manifest' table. If the name exists, it generates a new name. This process repeats until a unique name is found. Once a unique table name is generated, the function inserts an entry into the 'manifest' table with the method name, arguments string, arguments dictionary, and the generated table name.
Arguments
conn::MySQL.Connection
: An active connection to a MySQL database.method_name::String
: The name of the method to be recorded in the 'manifest' table.arguments_string::String
: The string representation of the arguments to be recorded in the 'manifest' table.arguments::Dict
: The dictionary of arguments to be serialized and recorded in the 'manifest' table.
Returns
String
: The randomly generated unique table name.
Example
conn = MySQL.connect("localhost", "username", "password", db = "dbname") methodname = "fetchdata" argumentsstring = "param1=value1,param2=value2" arguments = Dict("param1" => "value1", "param2" => "value2") tablename = writetablenameinmanifest(conn, methodname, argumentsstring, arguments)
In this example, write_table_name_in_manifest
is called with a MySQL connection, method name, arguments string, and a dictionary of arguments. It inserts a new entry into the 'manifest' table and returns the generated table name.
FACT_unified_data_IO.FACT_unified_data_IO
— ModuleModule FACT_unified_data_IO
A Julia module for interfacing with a MySQL database, specifically designed for managing and interacting with FACT consolidated data.
This module provides functionalities to establish connections to a MySQL database, retrieve and write data to tables, and manage a manifest table for tracking database table associations with specific method calls and arguments.
Features
- Database Connection Management: Functions to establish and close connections to a MySQL database.
- Data Retrieval and Writing: Functions to read from and write to database tables.
- Manifest Management: Functions to interact with a manifest table that maps method calls and arguments to specific database tables.
Usage
The module exports various functions from included scripts. Key functionalities include getting connections to a MySQL database, closing connections, managing data tables, and interacting with a manifest table.
Exported Functions
get_connection
: Establish a connection to a MySQL database.close_connection
: Close an established connection to a MySQL database.get_table
: Retrieve data from a specific table based on method name and arguments.write_table
: Write data to a table and record its details in the manifest.overwrite_table
: Overwrite data in a table and update the manifest.
Dependencies
MySQL.jl
: For MySQL database connectivity and operations.JSON.jl
: For JSON parsing and serialization.DataFrames.jl
: For handling data in tabular form.
FACT_unified_data_IO.SQL_RANGES
— ConstantSQL_RANGES
A constant dictionary mapping Julia data types to tuples representing the minimum and maximum values allowed for each type, intended to align with the range limits of corresponding SQL data types.
This mapping ensures data being prepared for SQL operations does not exceed the storage capabilities of the target database's data types, helping prevent runtime errors due to type overflows or underflows.
Contents
The dictionary contains mappings for the following Julia types:
Int8
,Int16
,Int32
,Int64
: Signed integer types with corresponding SQL range limits.UInt8
,UInt16
,UInt32
,UInt64
: Unsigned integer types with corresponding SQL range limits.Float32
,Float64
: Floating-point types with approximate range limits to accommodate the range of floating-point numbers in SQL.
Usage Example
To clamp a value to the allowed range for its type before inserting it into a database:
value = clamp(value, SQLRANGES[typeof(value)][1], SQLRANGES[typeof(value)][2])
Notes
- The ranges for
Float32
andFloat64
are approximations and may vary depending on specific database implementations or configurations. - This dictionary should be used as a guide for data preprocessing steps, ensuring compatibility with SQL database types and preventing insertion errors.
FACT_unified_data_IO.SQLTYPES
— ConstantSQLTYPES::Dict{Type, String}
A dictionary mapping from Julia data types to SQL data types, used for generating SQL table schemas that correspond to the structure of a Julia DataFrame. This mapping ensures that data stored in a DataFrame can be appropriately represented in a SQL database.
Supported Mappings
Int8
toTINYINT
Int16
toSMALLINT
Int32
toINTEGER
Int64
toBIGINT
UInt8
toTINYINT UNSIGNED
UInt16
toSMALLINT UNSIGNED
UInt32
toINTEGER UNSIGNED
UInt64
toBIGINT UNSIGNED
Float32
toFLOAT
Float64
toDOUBLE
Bool
toBOOL
Vector{UInt8}
toBLOB
(for storing binary data)String
toVARCHAR(255)
(default maximum length for strings)Date
toDATE
Time
toTIME
DateTime
toDATETIME
The dictionary is used by the create_table_from_dataframe
function to automatically determine the SQL column types based on the DataFrame's column types.