Derby DB

At times it is more convenient to use a simple embedded database then to setup and configure a DBMS. For instance, one can use an embedded DB to store parameters or test results for quick and easy analysis. In this post, I will provide a simple example of using Apache Derby/JavaDB (JavaDB is Oracle’s version of Apache Derby).

Apache Derby

Apache Derby is a Java implementation of a DBMS. While it is not nearly as comprehensive as mainstream DBMS’s (e.g., MySQL, Oracle, MS SQL Server, or PostgreSQL), it has a fair amount of features that more than covers the vast majority of general purpose database use.

An Apache Software Foundation project, Derby is a series of jar files that can be downloaded here. Select the highest numerical entry from “Latest Official Releases” – e.g., 10.12.1.1 as of this posting. If you simply plan on implementing the database, then download the db-derby-<version>-lib.zip distribution. You will need the following jars (found in the “lib” directory of the download) for your project: derby.jar, derbyclient.jar, and derbynet.jar.

Note: if you are using NetBeans, you can add them directly to your project: expand the project, right-click “Libraries”, select “Add Library…”, and add “Java DB Driver” – thus, no download required. Shortcuts like this might exist for other IDE’s, but I’m only really familiar with NetBeans.

Derby Class

The following is a simple class designed to interface with Derby at the embedded level. As with all JDBC-based connections, you are required to have the following elements: driver path, JDBC connection protocol, username, and password. If the database does not exist and you want to create it, then append create=true to the connection string. This is done automatically by the connect method on lines 77-99. It first tries to connect as if the database existed (lines 79-83), if it does not (line 86), then it will try to create it (lines 87-91).

Other basic features include executing queries that do (executeQuery lines 136-145) and do not (execute lines 121-129) return ResultSets expanded aggregate functions (lines 147-192 – more details to follow), and formatted CLI printing (lines 194-321).

Derby.java

/*
 * Copyright (c) 2015 Ray Hylock
 * Department of Health Services and Information Management
 * College of Allied Health Sciences
 * East Carolina University
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import static java.lang.System.err;
import static java.lang.System.out;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.text.DecimalFormat;
import java.util.HashSet;
 
/**
 * Basic Apache Derby/JavaDB implementation. Allows for the creation of a 
 * database, basic query execution, formatted CLI output, and extended
 * aggregate functions (i.e., population standard deviation and variance, and
 * sample standard deviation and variance).
 * @author Ray Hylock
 */
public class Derby {
    // derby connection variables
    private static final String DRIVER = "org.apache.derby.jdbc.EmbeddedDriver";
    private static final String JDBC = "jdbc:derby:";
    private static final String CREATE = ";create=true";
    private static final String CONN = ";";
     
    // database variables
    private final String databaseName;
    private Connection conn;
     
    // extended aggregate functions
    private static final String F_VAR_POP = "(CAST(SUM(?*?) AS DOUBLE)/"
            + "CAST(COUNT(?) AS DOUBLE))-(CAST(AVG(?) AS DOUBLE)*"
            + "CAST(AVG(?) AS DOUBLE))";
    private static final String F_VAR_SAMP = "(CAST(COUNT(?) AS DOUBLE)/"
            + "CAST(COUNT(?)-1 AS DOUBLE))*("+F_VAR_POP+")";
    private static final String F_STDEV_POP = "SQRT("+F_VAR_POP+")";
    private static final String F_STDEV_SAMP = "SQRT("+F_VAR_SAMP+")";
     
    /**************************************************************************
     *                          CONSTRUCTOR METHODS                           *
     **************************************************************************/
    /**
     * Instantiates a new {@link Derby} instance using the database name 
     * {@code databaseName}.
     * @param databaseName  database name
     */
    public Derby(String databaseName) {
        this.databaseName = databaseName;
    }
     
    /**************************************************************************
     *                         CONNECT/CLOSE METHODS                          *
     **************************************************************************/
    /**
     * Connect to the database - create if necessary.
     * @param u  username
     * @param p  password
     * @return   {@code true} if database already exists
     */
    public boolean connect(String u, String p){
        // try to connect
        try {
            Class.forName(DRIVER);
            conn = DriverManager.getConnection(JDBC+databaseName+CONN, u, p);
            out.println("Connected to database: " + databaseName);
            return true;
             
        // if it fails, then try to create
        } catch (ClassNotFoundException | SQLException ex) {
            try {
                Class.forName(DRIVER);
                conn = DriverManager.getConnection(JDBC+databaseName+CREATE, u, p);
                out.println("Created database: " + databaseName);
                return false;
            } catch (ClassNotFoundException | SQLException exx) {
                err.println(String.format("Cannot connect to database: %s\n%s", 
                        databaseName, exx.getMessage()));
                System.exit(1);
            }
        }
        return false;
    }
     
    /**
     * Close the database.
     */
    public void close() {
        try {
            conn.close();
        } catch (SQLException ex) {
            err.println(String.format("Failed to close the connection.\n%s",
                    ex.getMessage()));
            System.exit(1);
        }
    }
     
    /**************************************************************************
     *                            EXECUTE METHODS                             *
     **************************************************************************/
    /**
     * Execute a query.
     * @param query query to execute
     */
    public void execute(String query){
        try {
            conn.createStatement().execute(query);
        } catch (SQLException ex) {
            err.println(String.format("Error executing the query: \n%s\n%s", 
                    query, ex.getMessage()));
            System.exit(1);
        }
    }
     
    /**
     * Executes the query and returns the {@link ResultSet} or {@code null}.
     * @param query query to execute
     * @return      {@link ResultSet} or {@code null}
     */
    public ResultSet executeQuery(String query){
        try {
            return conn.createStatement().executeQuery(query);
        } catch (SQLException ex) {
            err.println(String.format("Error executing the query: \n%s\n%s", 
                    query, ex.getMessage()));
            System.exit(1);
        }
        return null;
    }
     
    /**************************************************************************
     *                       EXTENDED AGGREGATE METHODS                       *
     **************************************************************************/
    /**
     * Get the SQL statement to produce a population standard deviation.
     * @param parameter population standard deviation parameter
     * @return formatted SQL statement to produce the population standard
     *         deviation for the parameter
     */
    public String STDEV_POP(String parameter){
        String temp = F_STDEV_POP;
        return temp.replaceAll("\\?", parameter);
    }
     
    /**
     * Get the SQL statement to produce a sample standard deviation.
     * @param parameter sample standard deviation parameter
     * @return formatted SQL statement to produce the sample standard
     *         deviation for the parameter
     */
    public String STDEV_SAMP(String parameter){
        String temp = F_STDEV_SAMP;
        return temp.replaceAll("\\?", parameter);
    }
     
    /**
     * Get the SQL statement to produce a population variance.
     * @param parameter population variance parameter
     * @return formatted SQL statement to produce the population variance for
     *         the parameter
     */
    public String VAR_POP(String parameter){
        String temp = F_VAR_POP;
        return temp.replaceAll("\\?", parameter);
    }
     
    /**
     * Get the SQL statement to produce a sample variance.
     * @param parameter sample variance parameter
     * @return formatted SQL statement to produce the sample variance for
     *         the parameter
     */
    public String VAR_SAMP(String parameter){
        String temp = F_VAR_SAMP;
        return temp.replaceAll("\\?", parameter);
    }
     
    /**************************************************************************
     *                             PRINT METHODS                              *
     **************************************************************************/
    /**
     * Prints the contents of a {@link ResultSet}. All data will be left aligned.
     * @param rs the {@link ResultSet} to print
     * @throws java.sql.SQLException
     */
    public void printResults(ResultSet rs) throws SQLException{
        printResults(rs, false);
    }
 
    /**
     * Prints the contents of a {@link ResultSet}. If {@code autoAlign} is 
     * {@code true}, then it will try to align the text based on its data type
     * (note: the header will always be left aligned).
     * @param rs        the {@link ResultSet} to print
     * @param autoAlign if {@code true} it will try to auto align the 
     *                  output by data type
     * @throws java.sql.SQLException
     */
    public void printResults(ResultSet rs, boolean autoAlign) throws SQLException {
        printResults(rs, autoAlign, 255);
    }
     
    /**
     * Prints the contents of a {@link ResultSet}. If {@code autoAlign} is
     * {@code true}, then it will try to align the text based on its data type
     * (note: the header will always be left aligned). {@code widthReplace}
     * reduces empty white-space for variable-length character fields (e.g.,
     * {@code varchar} and {@code text}).
     * @param rs            the {@link ResultSet} to print
     * @param autoAlign     if {@code true} it will try to auto align the output 
     *                      by data type
     * @param widthReplace  replace variable-length character field widths
     * @throws java.sql.SQLException
     */
    public void printResults(ResultSet rs, boolean autoAlign, 
            int widthReplace) throws SQLException {
        printResults(rs, autoAlign, widthReplace, -1);
    }
     
    /**
     * Prints the contents of a {@link ResultSet}. If {@code autoAlign} is
     * {@code true}, then it will try to align the text based on its data type
     * (note: the header will always be left aligned). {@code widthReplace}
     * reduces empty white-space for variable-length character fields (e.g.,
     * {@code varchar} and {@code text}). {@code scale} sets the number
     * of decimal places to display using {@link DecimalFormat}.
     * @param rs            the {@link ResultSet} to print
     * @param autoAlign     if {@code true} it will try to auto align the output 
     *                      by data type
     * @param widthReplace  replace variable-length character field widths
     * @param scale         number of decimal places
     * @throws java.sql.SQLException
     */
    public void printResults(ResultSet rs, boolean autoAlign, 
            int widthReplace, int scale) throws SQLException {
        // get metadata
        ResultSetMetaData rsmd = rs.getMetaData();
         
        // set scale formatter
        DecimalFormat df = null;
        if(scale > -1){
            String pattern = new String(new char[scale]).replace("\u0000", "0");
            df = new DecimalFormat("#."+pattern);
        }
         
        // right align and floating-point data types
        HashSet<String> right = new HashSet<>();
        HashSet<String> fp = new HashSet<>();
        right.add("double"); right.add("double scale");
        fp.add("double"); fp.add("double scale");
        right.add("float"); right.add("float4");    
        fp.add("float"); fp.add("float4");    
        right.add("int"); right.add("int2"); right.add("int4"); right.add("int8"); 
        right.add("money");     // auto-limited to scale 2 
        right.add("numeric");
        fp.add("numeric");
        right.add("serial"); right.add("serial2"); right.add("serial4");
         
        // column padding
        int padding = 2;
         
        // determine column widths and alignment by data type
        int columns = rsmd.getColumnCount();
        String names[] = new String[columns];
        int widths[] = new int[columns];
        int alignments[] = new int[columns];
        boolean fpB[] = new boolean[columns];
        int totalWidth = 0;
         
        // iterate over metadata: JDBC is 1-indexed
        for (int c = 1; c <= columns; c++) {
            names[c-1] = rsmd.getColumnName(c);
            String dt = rsmd.getColumnTypeName(c).toLowerCase();
            if(dt.equals("varchar")){
                widths[c-1] = widthReplace;
                alignments[c-1] = -1;
            } else {
                int nameLen = names[c-1].length();
                int len = rsmd.getPrecision(c) + rsmd.getScale(c);
                widths[c-1] = ((nameLen > len) ? nameLen : len) + padding;
                alignments[c-1] = (autoAlign && right.contains(dt)) ? 1 : -1;
            }
            totalWidth += widths[c-1];
            fpB[c-1] = fp.contains(dt);
        }
         
        // print header 
        for(int i=0; i<names.length; i++) 
            out.format("%-"+widths[i]+"s", names[i]);        
         
        // print carriage return and line separator
        out.println("\n"+(new String(new char[totalWidth])).replace("\u0000", "-"));
         
        // print data
        while(rs.next()){
            // print data: JDBC is 1-indexed
            for (int r = 1; r <= rsmd.getColumnCount(); r++) 
                out.format("%"+(alignments[r-1]*widths[r-1])+"s", 
                    (scale > -1 && (fpB[r-1]) ? 
                        String.valueOf(df.format(Double.parseDouble(rs.getString(r)))) :
                        rs.getString(r)));
            out.println();   // print carriage return
        }
    }
}
Extended Aggregates

Derby is limited to the basic five aggregate functions: average, count, max, min, and sum. However, one can use these to compute basic measures of spread. Here, we briefly discuss the addition of population/sample variance/standard deviation.

As shown in the post on distributed statistics, variance – and thus standard deviation – can be computed from simple inline and aggregate measures. Converted to simple SELECT statement elements, the definitions can be found on lines 48-54 of Derby.java. Using these functions is a simple as calling their respective methods on lines 147-192, which replace the “?” with the parameter.

DerbyTest Class

The following is a simple test class to illustrate the features of Derby.java.

DerbyTest.java

import java.sql.SQLException;
 
/**
 * Test class for {@link Derby}.
 * @author Ray Hylock
 */
public class DerbyTest {
    // result set print formatting parameters
    private static final boolean AUTO_ALIGN = false;
    private static final int WIDTH_REPLACE = 15;
    private static final int SCALE= 6;
     
    /**
     * Main method for this test
     * @param args does not take arguments
     * @throws SQLException 
     */
    public static void main(String[] args) throws SQLException {
        // instantiate database and connect
        Derby derby = new Derby("derby_db");
        boolean exists = derby.connect("user", "pass");
         
        // create adatabase if it doesn't already exist
        if(!exists) create(derby);
         
        // print all data 
        printAll(derby);
         
        // print aggregates using built-in functions
        aggregateBuiltIn(derby);
         
        // print extended aggregates
        aggregateExtension(derby);
         
        // close
        derby.close();
    }
     
    /**
     * Creates the database and populates it with sample data.
     * @param derby the {@link Derby} object
     * @throws SQLException 
     */
    private static void create(Derby derby) throws SQLException{
        // create a simple table with serial id
        String create = "CREATE TABLE test ("
                + " serial_id   int primary key generated always as identity "
                +                            "(start with 1, increment by 1),"
                + " measure     double"
                + ")";
        derby.execute(create);
         
        // add some data
        double[] v = {96.2, 54.4, 45.5, 91.2};
        for(double d : v) 
            derby.execute("INSERT INTO test (measure) VALUES("+d+")");
    }
     
    /**
     * Prints the contents of the sample table.
     * @param derby the {@link Derby} object
     * @throws SQLException 
     */
    private static void printAll(Derby derby) throws SQLException{
        String query = "SELECT * FROM test";
        System.out.println("\nQuery: "+query);
        derby.printResults(derby.executeQuery(query), 
            AUTO_ALIGN, WIDTH_REPLACE, SCALE);
    }
     
    /**
     * Prints the five built-in aggregate values for the sample attribute.
     * @param derby the {@link Derby} object
     * @throws SQLException 
     */
    private static void aggregateBuiltIn(Derby derby) throws SQLException{
        String query = 
            "SELECT AVG(measure) AS \"AVG\", COUNT(measure) AS \"CNT\", "
                + "MAX(measure) AS \"MAX\", MIN(measure) AS \"MIN\", "
                + "SUM(measure) AS \"SUM\" "
            + "FROM test";
        System.out.println("\nQuery: "+query);
        derby.printResults(derby.executeQuery(query), 
                AUTO_ALIGN, WIDTH_REPLACE, SCALE);
    }
     
    /**
     * Prints the four extended aggregate values for the sample attribute.
     * @param derby the {@link Derby} object
     * @throws SQLException 
     */
    private static void aggregateExtension(Derby derby) throws SQLException{
        String query = 
            "SELECT "+derby.VAR_POP("measure")+" AS \"VARP\", "
                +derby.VAR_SAMP("measure")+" AS \"VARS\", "
                +derby.STDEV_POP("measure")+" AS \"STDEVP\", "
                +derby.STDEV_SAMP("measure")+" AS \"STDEVS\" "
           + "FROM test";
        System.out.println("\nQuery: "+query);
        derby.printResults(derby.executeQuery(query), 
                AUTO_ALIGN, WIDTH_REPLACE, SCALE);
    }
}

The output for DerbyTest.java can be found below. Note: for subsequent runs (i.e., after the database is created), Created database: derby_db becomes Connected to database: derby_db. I’ve highlighted the output using “SQL” so the SQL statements are easier to read.

Created database: derby_db
 
Query: SELECT * FROM test
SERIAL_ID   MEASURE          
-----------------------------
1           96.200000        
2           54.400000        
3           45.500000        
4           91.200000        
 
Query: SELECT AVG(measure) AS "AVG", COUNT(measure) AS "CNT", MAX(measure) AS "MAX", MIN(measure) AS "MIN", SUM(measure) AS "SUM" FROM test
AVG              CNT         MAX              MIN              SUM             
--------------------------------------------------------------------------------
71.825000        4           96.200000        45.500000        287.300000       
 
Query: SELECT (CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE)) AS "VARP", (CAST(COUNT(measure) AS DOUBLE)/CAST(COUNT(measure)-1 AS DOUBLE))*((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE))) AS "VARS", SQRT((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE))) AS "STDEVP", SQRT((CAST(COUNT(measure) AS DOUBLE)/CAST(COUNT(measure)-1 AS DOUBLE))*((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE)))) AS "STDEVS" FROM test
VARP             VARS             STDEVP           STDEVS           
--------------------------------------------------------------------
491.541875       655.389167       22.170744        25.600570