View on GitHub

Netezza

# Matrix correlation computation using Netezza UDXs: study of computability and scalability.

Download this project as a .zip file Download this project as a tar.gz file

This is an implementation of a simple matrix correlation computation function in Netezza that takes as input a matrix with raw data (e.g. Variables as columns and Observations as rows). There are many ways to compute correlation. In this project Pearson's coefficient of correlation is used.

The code contains :

  1. C++ files that conatain UDxs to achieve the functionality.

  2. Scripts to compile and register these UDx on the Netezza database.

  3. Scripts to register stored procedures to automate the tests.

  4. Scripts to drop the registered UDxs and stored procedures.

Please follow the following steps to test the code.

Step 1: Compilation and Registration of UDxs

  1. Please find the bash script file named "script", which contains the commands for compilation and registration of UDxs.
  2. Run that script.
  3. Make sure the source files are in the same directory in which the bash script is being executed and the executable bit of the script is set.

Step 2: Creating Stored procedures

  1. File named "create_procedure" creates a stored procedure using NZ PL/SQL
  2. This procedures contains the whole process of creating a data matrix, generating correlation matrix for the same, extracting the top ten pairs and also prints the times for intermediate steps using raise notice
  3. This just registers the procedure, we have to call that procedure in the next step. run as : nzsql -f create_procedure;
  4. There is another file "test" this file contains a procedure with verification of the output from UDx and output from netezza's inbuilt correlation matrix generation function.
  5. If the output at the end of the test is 0 rows that implies the output of UDx is correct.
  6. It works by using "minus" operator that outputs values different in two tables. -> run as : nzsql -f test

Step 3: Running tests

  1. File named "run_test" contains a simple call to the above store procedure.
  2. Please change the input values in this file.This has to be done manually.
  3. Execution of the below command does all the mentioned steps of creating input matrix, correlation matrix and extracting top 10 pairs. -> run as: nzsql -f run_test;

Step 4: Purging

  1. Two files are provided for this purpose 1) purge_tables 2) purge_procedures.
  2. "purge_tables" will delete the tables created when the test is run.
  3. "purge_procedures" will delete/drop all the UDxs and procedures used for the purpose of testing.