At times it is more convenient to use a simple embedded database then to setup and configure a DBMS. For instance, one can use an embedded DB to store parameters or test results for quick and easy analysis. In this post, I will provide a simple example of using Apache Derby/JavaDB (JavaDB is Oracle’s version of Apache Derby).
Apache Derby
Apache Derby is a Java implementation of a DBMS. While it is not nearly as comprehensive as mainstream DBMS’s (e.g., MySQL, Oracle, MS SQL Server, or PostgreSQL), it has a fair amount of features that more than covers the vast majority of general purpose database use.
An Apache Software Foundation project, Derby is a series of jar files that can be downloaded here. Select the highest numerical entry from “Latest Official Releases” – e.g., 10.12.1.1 as of this posting. If you simply plan on implementing the database, then download the db-derby-<version>-lib.zip
distribution. You will need the following jars (found in the “lib” directory of the download) for your project: derby.jar
, derbyclient.jar
, and derbynet.jar
.
Note: if you are using NetBeans, you can add them directly to your project: expand the project, right-click “Libraries”, select “Add Library…”, and add “Java DB Driver” – thus, no download required. Shortcuts like this might exist for other IDE’s, but I’m only really familiar with NetBeans.
Derby Class
The following is a simple class designed to interface with Derby at the embedded level. As with all JDBC-based connections, you are required to have the following elements: driver path, JDBC connection protocol, username, and password. If the database does not exist and you want to create it, then append create=true
to the connection string. This is done automatically by the connect
method on lines 77-99. It first tries to connect as if the database existed (lines 79-83), if it does not (line 86), then it will try to create it (lines 87-91).
Other basic features include executing queries that do (executeQuery
lines 136-145) and do not (execute
lines 121-129) return ResultSet
s expanded aggregate functions (lines 147-192 – more details to follow), and formatted CLI printing (lines 194-321).
Derby.java
/*
* Copyright (c) 2015 Ray Hylock
* Department of Health Services and Information Management
* College of Allied Health Sciences
* East Carolina University
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import static java.lang.System.err;
import static java.lang.System.out;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.text.DecimalFormat;
import java.util.HashSet;
/**
* Basic Apache Derby/JavaDB implementation. Allows for the creation of a
* database, basic query execution, formatted CLI output, and extended
* aggregate functions (i.e., population standard deviation and variance, and
* sample standard deviation and variance).
* @author Ray Hylock
*/
public class Derby {
// derby connection variables
private static final String DRIVER = "org.apache.derby.jdbc.EmbeddedDriver";
private static final String JDBC = "jdbc:derby:";
private static final String CREATE = ";create=true";
private static final String CONN = ";";
// database variables
private final String databaseName;
private Connection conn;
// extended aggregate functions
private static final String F_VAR_POP = "(CAST(SUM(?*?) AS DOUBLE)/"
+ "CAST(COUNT(?) AS DOUBLE))-(CAST(AVG(?) AS DOUBLE)*"
+ "CAST(AVG(?) AS DOUBLE))";
private static final String F_VAR_SAMP = "(CAST(COUNT(?) AS DOUBLE)/"
+ "CAST(COUNT(?)-1 AS DOUBLE))*("+F_VAR_POP+")";
private static final String F_STDEV_POP = "SQRT("+F_VAR_POP+")";
private static final String F_STDEV_SAMP = "SQRT("+F_VAR_SAMP+")";
/**************************************************************************
* CONSTRUCTOR METHODS *
**************************************************************************/
/**
* Instantiates a new {@link Derby} instance using the database name
* {@code databaseName}.
* @param databaseName database name
*/
public Derby(String databaseName) {
this.databaseName = databaseName;
}
/**************************************************************************
* CONNECT/CLOSE METHODS *
**************************************************************************/
/**
* Connect to the database - create if necessary.
* @param u username
* @param p password
* @return {@code true} if database already exists
*/
public boolean connect(String u, String p){
// try to connect
try {
Class.forName(DRIVER);
conn = DriverManager.getConnection(JDBC+databaseName+CONN, u, p);
out.println("Connected to database: " + databaseName);
return true;
// if it fails, then try to create
} catch (ClassNotFoundException | SQLException ex) {
try {
Class.forName(DRIVER);
conn = DriverManager.getConnection(JDBC+databaseName+CREATE, u, p);
out.println("Created database: " + databaseName);
return false;
} catch (ClassNotFoundException | SQLException exx) {
err.println(String.format("Cannot connect to database: %s\n%s",
databaseName, exx.getMessage()));
System.exit(1);
}
}
return false;
}
/**
* Close the database.
*/
public void close() {
try {
conn.close();
} catch (SQLException ex) {
err.println(String.format("Failed to close the connection.\n%s",
ex.getMessage()));
System.exit(1);
}
}
/**************************************************************************
* EXECUTE METHODS *
**************************************************************************/
/**
* Execute a query.
* @param query query to execute
*/
public void execute(String query){
try {
conn.createStatement().execute(query);
} catch (SQLException ex) {
err.println(String.format("Error executing the query: \n%s\n%s",
query, ex.getMessage()));
System.exit(1);
}
}
/**
* Executes the query and returns the {@link ResultSet} or {@code null}.
* @param query query to execute
* @return {@link ResultSet} or {@code null}
*/
public ResultSet executeQuery(String query){
try {
return conn.createStatement().executeQuery(query);
} catch (SQLException ex) {
err.println(String.format("Error executing the query: \n%s\n%s",
query, ex.getMessage()));
System.exit(1);
}
return null;
}
/**************************************************************************
* EXTENDED AGGREGATE METHODS *
**************************************************************************/
/**
* Get the SQL statement to produce a population standard deviation.
* @param parameter population standard deviation parameter
* @return formatted SQL statement to produce the population standard
* deviation for the parameter
*/
public String STDEV_POP(String parameter){
String temp = F_STDEV_POP;
return temp.replaceAll("\\?", parameter);
}
/**
* Get the SQL statement to produce a sample standard deviation.
* @param parameter sample standard deviation parameter
* @return formatted SQL statement to produce the sample standard
* deviation for the parameter
*/
public String STDEV_SAMP(String parameter){
String temp = F_STDEV_SAMP;
return temp.replaceAll("\\?", parameter);
}
/**
* Get the SQL statement to produce a population variance.
* @param parameter population variance parameter
* @return formatted SQL statement to produce the population variance for
* the parameter
*/
public String VAR_POP(String parameter){
String temp = F_VAR_POP;
return temp.replaceAll("\\?", parameter);
}
/**
* Get the SQL statement to produce a sample variance.
* @param parameter sample variance parameter
* @return formatted SQL statement to produce the sample variance for
* the parameter
*/
public String VAR_SAMP(String parameter){
String temp = F_VAR_SAMP;
return temp.replaceAll("\\?", parameter);
}
/**************************************************************************
* PRINT METHODS *
**************************************************************************/
/**
* Prints the contents of a {@link ResultSet}. All data will be left aligned.
* @param rs the {@link ResultSet} to print
* @throws java.sql.SQLException
*/
public void printResults(ResultSet rs) throws SQLException{
printResults(rs, false);
}
/**
* Prints the contents of a {@link ResultSet}. If {@code autoAlign} is
* {@code true}, then it will try to align the text based on its data type
* (note: the header will always be left aligned).
* @param rs the {@link ResultSet} to print
* @param autoAlign if {@code true} it will try to auto align the
* output by data type
* @throws java.sql.SQLException
*/
public void printResults(ResultSet rs, boolean autoAlign) throws SQLException {
printResults(rs, autoAlign, 255);
}
/**
* Prints the contents of a {@link ResultSet}. If {@code autoAlign} is
* {@code true}, then it will try to align the text based on its data type
* (note: the header will always be left aligned). {@code widthReplace}
* reduces empty white-space for variable-length character fields (e.g.,
* {@code varchar} and {@code text}).
* @param rs the {@link ResultSet} to print
* @param autoAlign if {@code true} it will try to auto align the output
* by data type
* @param widthReplace replace variable-length character field widths
* @throws java.sql.SQLException
*/
public void printResults(ResultSet rs, boolean autoAlign,
int widthReplace) throws SQLException {
printResults(rs, autoAlign, widthReplace, -1);
}
/**
* Prints the contents of a {@link ResultSet}. If {@code autoAlign} is
* {@code true}, then it will try to align the text based on its data type
* (note: the header will always be left aligned). {@code widthReplace}
* reduces empty white-space for variable-length character fields (e.g.,
* {@code varchar} and {@code text}). {@code scale} sets the number
* of decimal places to display using {@link DecimalFormat}.
* @param rs the {@link ResultSet} to print
* @param autoAlign if {@code true} it will try to auto align the output
* by data type
* @param widthReplace replace variable-length character field widths
* @param scale number of decimal places
* @throws java.sql.SQLException
*/
public void printResults(ResultSet rs, boolean autoAlign,
int widthReplace, int scale) throws SQLException {
// get metadata
ResultSetMetaData rsmd = rs.getMetaData();
// set scale formatter
DecimalFormat df = null;
if(scale > -1){
String pattern = new String(new char[scale]).replace("\u0000", "0");
df = new DecimalFormat("#."+pattern);
}
// right align and floating-point data types
HashSet<String> right = new HashSet<>();
HashSet<String> fp = new HashSet<>();
right.add("double"); right.add("double scale");
fp.add("double"); fp.add("double scale");
right.add("float"); right.add("float4");
fp.add("float"); fp.add("float4");
right.add("int"); right.add("int2"); right.add("int4"); right.add("int8");
right.add("money"); // auto-limited to scale 2
right.add("numeric");
fp.add("numeric");
right.add("serial"); right.add("serial2"); right.add("serial4");
// column padding
int padding = 2;
// determine column widths and alignment by data type
int columns = rsmd.getColumnCount();
String names[] = new String[columns];
int widths[] = new int[columns];
int alignments[] = new int[columns];
boolean fpB[] = new boolean[columns];
int totalWidth = 0;
// iterate over metadata: JDBC is 1-indexed
for (int c = 1; c <= columns; c++) {
names[c-1] = rsmd.getColumnName(c);
String dt = rsmd.getColumnTypeName(c).toLowerCase();
if(dt.equals("varchar")){
widths[c-1] = widthReplace;
alignments[c-1] = -1;
} else {
int nameLen = names[c-1].length();
int len = rsmd.getPrecision(c) + rsmd.getScale(c);
widths[c-1] = ((nameLen > len) ? nameLen : len) + padding;
alignments[c-1] = (autoAlign && right.contains(dt)) ? 1 : -1;
}
totalWidth += widths[c-1];
fpB[c-1] = fp.contains(dt);
}
// print header
for(int i=0; i<names.length; i++)
out.format("%-"+widths[i]+"s", names[i]);
// print carriage return and line separator
out.println("\n"+(new String(new char[totalWidth])).replace("\u0000", "-"));
// print data
while(rs.next()){
// print data: JDBC is 1-indexed
for (int r = 1; r <= rsmd.getColumnCount(); r++)
out.format("%"+(alignments[r-1]*widths[r-1])+"s",
(scale > -1 && (fpB[r-1]) ?
String.valueOf(df.format(Double.parseDouble(rs.getString(r)))) :
rs.getString(r)));
out.println(); // print carriage return
}
}
}
Extended Aggregates
Derby is limited to the basic five aggregate functions: average, count, max, min, and sum. However, one can use these to compute basic measures of spread. Here, we briefly discuss the addition of population/sample variance/standard deviation.
As shown in the post on distributed statistics, variance – and thus standard deviation – can be computed from simple inline and aggregate measures. Converted to simple SELECT
statement elements, the definitions can be found on lines 48-54 of Derby.java
. Using these functions is a simple as calling their respective methods on lines 147-192, which replace the “?” with the parameter.
DerbyTest Class
The following is a simple test class to illustrate the features of Derby.java
.
DerbyTest.java
import java.sql.SQLException;
/**
* Test class for {@link Derby}.
* @author Ray Hylock
*/
public class DerbyTest {
// result set print formatting parameters
private static final boolean AUTO_ALIGN = false;
private static final int WIDTH_REPLACE = 15;
private static final int SCALE= 6;
/**
* Main method for this test
* @param args does not take arguments
* @throws SQLException
*/
public static void main(String[] args) throws SQLException {
// instantiate database and connect
Derby derby = new Derby("derby_db");
boolean exists = derby.connect("user", "pass");
// create adatabase if it doesn't already exist
if(!exists) create(derby);
// print all data
printAll(derby);
// print aggregates using built-in functions
aggregateBuiltIn(derby);
// print extended aggregates
aggregateExtension(derby);
// close
derby.close();
}
/**
* Creates the database and populates it with sample data.
* @param derby the {@link Derby} object
* @throws SQLException
*/
private static void create(Derby derby) throws SQLException{
// create a simple table with serial id
String create = "CREATE TABLE test ("
+ " serial_id int primary key generated always as identity "
+ "(start with 1, increment by 1),"
+ " measure double"
+ ")";
derby.execute(create);
// add some data
double[] v = {96.2, 54.4, 45.5, 91.2};
for(double d : v)
derby.execute("INSERT INTO test (measure) VALUES("+d+")");
}
/**
* Prints the contents of the sample table.
* @param derby the {@link Derby} object
* @throws SQLException
*/
private static void printAll(Derby derby) throws SQLException{
String query = "SELECT * FROM test";
System.out.println("\nQuery: "+query);
derby.printResults(derby.executeQuery(query),
AUTO_ALIGN, WIDTH_REPLACE, SCALE);
}
/**
* Prints the five built-in aggregate values for the sample attribute.
* @param derby the {@link Derby} object
* @throws SQLException
*/
private static void aggregateBuiltIn(Derby derby) throws SQLException{
String query =
"SELECT AVG(measure) AS \"AVG\", COUNT(measure) AS \"CNT\", "
+ "MAX(measure) AS \"MAX\", MIN(measure) AS \"MIN\", "
+ "SUM(measure) AS \"SUM\" "
+ "FROM test";
System.out.println("\nQuery: "+query);
derby.printResults(derby.executeQuery(query),
AUTO_ALIGN, WIDTH_REPLACE, SCALE);
}
/**
* Prints the four extended aggregate values for the sample attribute.
* @param derby the {@link Derby} object
* @throws SQLException
*/
private static void aggregateExtension(Derby derby) throws SQLException{
String query =
"SELECT "+derby.VAR_POP("measure")+" AS \"VARP\", "
+derby.VAR_SAMP("measure")+" AS \"VARS\", "
+derby.STDEV_POP("measure")+" AS \"STDEVP\", "
+derby.STDEV_SAMP("measure")+" AS \"STDEVS\" "
+ "FROM test";
System.out.println("\nQuery: "+query);
derby.printResults(derby.executeQuery(query),
AUTO_ALIGN, WIDTH_REPLACE, SCALE);
}
}
The output for DerbyTest.java
can be found below. Note: for subsequent runs (i.e., after the database is created), Created database: derby_db
becomes Connected to database: derby_db
. I’ve highlighted the output using “SQL” so the SQL statements are easier to read.
Created database: derby_db
Query: SELECT * FROM test
SERIAL_ID MEASURE
-----------------------------
1 96.200000
2 54.400000
3 45.500000
4 91.200000
Query: SELECT AVG(measure) AS "AVG", COUNT(measure) AS "CNT", MAX(measure) AS "MAX", MIN(measure) AS "MIN", SUM(measure) AS "SUM" FROM test
AVG CNT MAX MIN SUM
--------------------------------------------------------------------------------
71.825000 4 96.200000 45.500000 287.300000
Query: SELECT (CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE)) AS "VARP", (CAST(COUNT(measure) AS DOUBLE)/CAST(COUNT(measure)-1 AS DOUBLE))*((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE))) AS "VARS", SQRT((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE))) AS "STDEVP", SQRT((CAST(COUNT(measure) AS DOUBLE)/CAST(COUNT(measure)-1 AS DOUBLE))*((CAST(SUM(measure*measure) AS DOUBLE)/CAST(COUNT(measure) AS DOUBLE))-(CAST(AVG(measure) AS DOUBLE)*CAST(AVG(measure) AS DOUBLE)))) AS "STDEVS" FROM test
VARP VARS STDEVP STDEVS
--------------------------------------------------------------------
491.541875 655.389167 22.170744 25.600570