Glossary

Glossary of Java-related Business Intelligence Terminology

Analytics: The discovery of patterns in data by combining statistics, computer programming, and operations research, and the communication of insights through data visualization. Key players include SAS (particularly statistics for medical research), Oracle Hyperion, and IBM Cognos.
Business Intelligence (BI): The practical transformation of raw data into actionable information. BI typically involves reporting, online analytical processing (OLAP), data and text mining, complex event processing, business performance management, and analytics (both predictive and prescriptive). For example, a BI suite might combine ETL from many data stores with dashboards, reports, and analytics of a business's data, such as sales by region over time. Players include Jaspersoft, SAP BusinessObjects, IBM Cognos, Oracle's BI Platform, MicroStrategy, and, for business planning, Anaplan.
Cassandra: Apache Cassandra is a high-performance NoSQL database that stores denormalized data and is often deployed on distributed clusters.
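CDN: Content Delivery Network. A geographically distributed network of servers that efficiently delivers content, such as video, across the internet. Examples include the Windows Azure CDN and Amazon CloudFront; CDNs are also often run by telcos, such as AT&T.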
cluster computing: Typically a set of inexpensive computers operating as if they were a single super-computer through LAN-based distributed computing orchestrated by a middleware layer with a "master".
column-oriented database: Column-oriented storage layouts are well-suited for OLAP / data warehouses, which typically involve a small number of highly complex queries over terabytes of data. The alternative is row-oriented storage, which is well-suited for OLTP interactions.
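The difference between the two layouts can be sketched in plain Java. This is only an illustration under assumed names (StorageLayoutSketch, OrderRow, and OrderColumns are hypothetical, not part of any database product): a row layout keeps every field of an order together, while a column layout keeps one array per field, so an aggregate over a single column reads only that array.

```java
import java.util.List;

final class StorageLayoutSketch {
    /** Row-oriented: all fields of one order are stored together (suits OLTP writes). */
    static final class OrderRow {
        final int orderId;
        final String region;
        final double amount;

        OrderRow(int orderId, String region, double amount) {
            this.orderId = orderId;
            this.region = region;
            this.amount = amount;
        }
    }

    /** Column-oriented: one array per field, aligned by position (suits OLAP scans). */
    static final class OrderColumns {
        final int[] orderId;
        final String[] region;
        final double[] amount;

        OrderColumns(List<OrderRow> rows) {
            orderId = new int[rows.size()];
            region = new String[rows.size()];
            amount = new double[rows.size()];
            for (int i = 0; i < rows.size(); i++) {
                OrderRow r = rows.get(i);
                orderId[i] = r.orderId;
                region[i] = r.region;
                amount[i] = r.amount;
            }
        }
    }

    /** Summing over rows visits every row object, including fields it does not need. */
    static double totalFromRows(List<OrderRow> rows) {
        double total = 0;
        for (OrderRow r : rows) {
            total += r.amount;
        }
        return total;
    }

    /** Summing over the column layout reads a single contiguous array. */
    static double totalFromColumns(OrderColumns cols) {
        double total = 0;
        for (double a : cols.amount) {
            total += a;
        }
        return total;
    }
}
```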
CrossTab Report: A report that aids in the analysis of data by combining the summaries of multiple variables. For example, it might reveal that most of the purchasers of a certain product live in a particular region or belong to a certain age group. See also pivot table.
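As a rough sketch of the idea, the following Java counts purchases by region and age group. The class and field names (CrossTabSketch, Purchase) are assumptions made for illustration only; a reporting tool would normally build the crosstab for you.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CrossTabSketch {
    /** Hypothetical purchase record with the two variables being cross-tabulated. */
    static final class Purchase {
        final String region;
        final String ageGroup;

        Purchase(String region, String ageGroup) {
            this.region = region;
            this.ageGroup = ageGroup;
        }
    }

    /** Returns region -> (age group -> number of purchases). */
    static Map<String, Map<String, Long>> crossTab(List<Purchase> purchases) {
        Map<String, Map<String, Long>> table = new TreeMap<>();
        for (Purchase p : purchases) {
            // One row per region, one cell per age group; each purchase adds 1 to its cell.
            table.computeIfAbsent(p.region, r -> new TreeMap<>())
                 .merge(p.ageGroup, 1L, Long::sum);
        }
        return table;
    }
}
```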
CRM: Customer Relationship Management. A database for CRM is typically read-optimized (rather than write-optimized).
CRUD: create, read, update, delete
Derby: Apache Derby is an open-source relational database implemented in Java, suited to OLTP workloads.
Domain: A metadata layer that can be used for an ad hoc report. For example, a domain might represent a join between two tables.
ETL: Extract, transform, load. ETL is typically a step in creating a unified data warehouse from multiple operational data stores, such as Marketing, Sales, and Enterprise Resource Planning (ERP).
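The three ETL steps can be sketched in a few lines of Java. Everything here is an assumption for illustration: the EtlSketch and SaleFact names, the "region,amount" line format, and the in-memory list standing in for a warehouse.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EtlSketch {
    /** Hypothetical warehouse record: a cleaned region name and an amount in cents. */
    static final class SaleFact {
        final String region;
        final long amountCents;

        SaleFact(String region, long amountCents) {
            this.region = region;
            this.amountCents = amountCents;
        }
    }

    /** Extract: gather raw "region,amount" lines from several operational stores. */
    static List<String> extract(List<List<String>> sources) {
        List<String> lines = new ArrayList<>();
        for (List<String> source : sources) {
            lines.addAll(source);
        }
        return lines;
    }

    /** Transform: normalize the region name and convert the amount to whole cents. */
    static SaleFact transform(String line) {
        String[] parts = line.split(",");
        String region = parts[0].trim().toUpperCase(Locale.ROOT);
        long cents = new BigDecimal(parts[1].trim()).movePointRight(2).longValueExact();
        return new SaleFact(region, cents);
    }

    /** Load: append the transformed facts to the (in-memory) warehouse. */
    static void load(List<SaleFact> warehouse, List<String> rawLines) {
        for (String line : rawLines) {
            warehouse.add(transform(line));
        }
    }
}
```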
Flume: Apache Flume is a distributed system for collecting, aggregating, and moving large amounts of log data from multiple sources to a central data store. Flume can be used with Apache Avro, a framework for remote procedure calls (RPCs) and serialization that originated in the Apache Hadoop project and uses JSON for defining data types and protocols.
grid computing: loosely coupled, distributed computers that perform a task. Less centralized than cluster computing or peer-to-peer computing.
Hadoop: Apache Hadoop is fundamental to Big Data because it provides a way for commodity hardware in fault-tolerant, distributed clusters to store and process vast amounts of data. Hadoop combines HDFS with MapReduce. To facilitate MapReduce projects, Hadoop users can control query execution using the "Pig Latin" language of the Apache Pig project. Apache HBase is a non-relational database for sparse data that MapReduce processes.
Hadoop-Hive data source: A big data source for fast writes. Queries typically run overnight. The query language is HiveQL.
HDFS: Hadoop Distributed File System, which holds the data that MapReduce processes.
Hibernate: An object-relational mapping (ORM) framework that handles the impedance mismatch between Java classes and RDBMS tables.
IMDB: In-Memory Database, which is faster than a database that has to seek data on a disk.
Jaspersoft Studio: Eclipse-based tool for designing reports.
JDBC: Java Database Connectivity. JDBC enables a Java application to connect to a data source. The API consists of two packages: java.sql and javax.sql. The data source can be a relational database management system (RDBMS) or a non-relational ODBC-aware data source. The database driver is in the form of a .jar file.
For connecting to the database, Oracle recommends using the javax.sql package, which provides the javax.sql.DataSource interface. See http://docs.oracle.com/javase/8/docs/api/javax/sql/DataSource.html.
For running queries, use the java.sql package, which enables queries with the Statement.executeQuery method and processing of the "hits" with an implementation of the ResultSet interface. The executeQuery method returns a ResultSet object.
JDBC supports four types of drivers, with Type 4 being a platform-independent, pure-Java driver that uses the network protocol of the database itself.
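A minimal sketch of the JDBC calls described in this entry, assuming a hypothetical customer table with name and region columns and a DataSource that the caller has already obtained (for example, from a connection pool). CustomerQuerySketch and customerNames are illustrative names only; running this against a real database also requires that database's driver .jar on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

final class CustomerQuerySketch {
    /** Returns the names of all customers in the given region (hypothetical schema). */
    static List<String> customerNames(DataSource dataSource, String region) throws SQLException {
        String sql = "SELECT name FROM customer WHERE region = ?";
        List<String> names = new ArrayList<>();
        // try-with-resources closes the connection and statement even if the query fails.
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, region);
            // executeQuery returns a ResultSet; next() advances to each "hit" in turn.
            try (ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    names.add(results.getString("name"));
                }
            }
        }
        return names;
    }
}
```

A PreparedStatement is used here rather than a plain Statement so that the region value is bound as a parameter instead of being concatenated into the SQL string.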
Jersey: An open-source framework for developing web services that uses Jackson to serialize/deserialize POJOs to JSON.
Jackson: A Java library for serializing POJOs to JSON and deserializing JSON to POJOs. It is a commonly used serialization/deserialization provider for Jersey.

JNDI: Java Naming and Directory Interface. For production scenarios, it is best to use JDBC with JNDI, an API for naming and directory services through which a web server and servlet container, such as Apache Tomcat, can expose a DataSource that manages a "pool" of already-created connections for improved performance. A DataSource obtained through JNDI can also support distributed transactions involving multiple data sources. See http://docs.oracle.com/javase/8/docs/api/javax/sql/DataSource.html.
JSON: JavaScript Object Notation. Represents serialized objects as attribute-value pairs. It is the most popular format for responses to a RESTful API call because it describes serialized objects, such as plain old Java objects (POJOs), with less overhead than does the XML typically used with SOAP APIs.

For example, in a JSON object such as {"firstName": "John", "phoneNumbers": [...]}, the colon is the delimiter that indicates that the value of firstName is John. The square brackets enclose the array named phoneNumbers.
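To make the attribute-value structure concrete, here is a hand-rolled sketch that writes a firstName and a phoneNumbers array as JSON text. The JsonSketch class is hypothetical and the phone numbers in any call are placeholder values; in practice a library such as Jackson would do the serialization, and this sketch escapes only quotes and backslashes.

```java
import java.util.List;
import java.util.stream.Collectors;

final class JsonSketch {
    /** Wraps a string in double quotes, escaping only backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"firstName": ..., "phoneNumbers": [...]} as JSON text. */
    static String personToJson(String firstName, List<String> phoneNumbers) {
        // The square brackets delimit the array; commas separate its elements.
        String numbers = phoneNumbers.stream()
                .map(JsonSketch::quote)
                .collect(Collectors.joining(", ", "[", "]"));
        // The colon separates each attribute name from its value.
        return "{" + quote("firstName") + ": " + quote(firstName) + ", "
                + quote("phoneNumbers") + ": " + numbers + "}";
    }
}
```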
KPI: Key Performance Indicator. A metric such as customer loyalty, net sales, mean time between failures, graduation rate, or national unemployment.
MapReduce: A scalable, fault-tolerant framework for processing terabytes of data (Big Data). The Map function performs a partitioned task. The Reduce function summarizes the results of the partitioned tasks. The fault tolerance is useful if a large cluster of computers needs a long time to process the work. One use case: Google's indexing of the entire World Wide Web.
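The Map and Reduce roles can be sketched as a single-process word count in Java. This is an illustration of the programming model only: MapReduceSketch is a hypothetical name, the partitions are plain strings, and nothing here is distributed or fault-tolerant as a real Hadoop job would be.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class MapReduceSketch {
    /** Map: for one partition of text, emit a (word, 1) pair for every word. */
    static List<Map.Entry<String, Integer>> map(String partition) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : partition.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    /** Reduce: summarize the emitted pairs by summing the counts for each word. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    /** Runs the map step on each partition, then reduces all emitted pairs. */
    static Map<String, Integer> wordCount(List<String> partitions) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String partition : partitions) {
            emitted.addAll(map(partition));
        }
        return reduce(emitted);
    }
}
```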
Mashboard: Dashboard with external content, such as a news feed.
Maven: A build automation tool for building Java projects, running JUnit tests, generating documentation, and packaging the build as a .jar file. The pom.xml file configures the Project Object Model (POM), which specifies things like the version of Java to use and whether to run a web server, and can declare dependencies like Hibernate. Apache Maven has overtaken Apache Ant in popularity because Maven allows the organization to use plug-ins with minimal configuration, tends to impose a shared convention for build tasks, and can be run from within an IDE.
MDX: Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. The XML wrapper is called mdXML. It is also a calculation language, with syntax similar to spreadsheet formulas. For example:
    SELECT
    { [Measures].[Store Sales] } ON COLUMNS,
    { [Date].[2002], [Date].[2003] } ON ROWS
    FROM Sales
    WHERE ( [Store].[USA].[CA] )
Measure: In reports, something similar to a field, except that it represents an expression, such as the average freight.
MongoDB: One of the most popular NoSQL database systems. It manages JSON-like documents and uses sharding to scale out across servers and support concurrent workloads.
OAuth: An open standard for authorization that allows web users to grant third-party web sites access to their account with a provider such as Google, without exposing their password to the third party. It is commonly used to log in to third-party sites with a Google account.
OLAP: Online Analytical Processing. One wide table with many columns: customer, product, year, order. This is optimal for reading (fast access) of summarized information (year or quarter). Efficient for big reads. Non-normalized. The customer name repeats for each order. See http://db.lcs.mit.edu/projects/cstore/vldb.pdf
OLTP: Online Transaction Processing. Typically a system with tables like Customer, Order, Product, supporting Insert, Update, Delete. Getting business intelligence from it requires table JOINs. Efficient for small, frequent write operations because a single operation writes all the fields of a row (tuple). Compare to OLAP.
OpenStack: An open-source platform for building and managing private and public clouds.
OSGi: Open Services Gateway initiative. A Java-based framework that enables "bundles" of functionality, delivered in .jar files, to work together as components. Used in the Business Intelligence and Reporting Tools (BIRT) open source reporting engine and in the plug-in architecture of the Eclipse IDE, the Confluence wiki, and the Jira bug tracker.
pivot table: A table of summaries. Given a spreadsheet, a pivot table summarizes a two-dimensional spreadsheet (columns and rows) into a desired third dimension, such as which Person sold which Product in which Region. Insofar as Excel supports pivot tables, Excel is a tool for analytics.
POJO: Plain Old Java Object
POM: Project Object Model represented by the pom.xml of Maven
RAML: RESTful API Modeling Language - http://raml.org/projects.html
RESTful API: Works with the Web’s HTTP protocol "methods" to enable lightweight CRUD: POST (create), GET (read), HEAD (read metadata), PUT or PATCH (update), DELETE (delete). HTTP "methods" are invoked through URIs. Supports the representation of object state through JSON (as well as XML).
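The mapping from HTTP methods to CRUD operations can be sketched without any web framework. The RestCrudSketch class below is a hypothetical in-memory stand-in for a /customers resource; the URI shape, the string-based responses, and the choice to treat PATCH as a full replacement of the stored body are all simplifying assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class RestCrudSketch {
    private final Map<Integer, String> customers = new HashMap<>();
    private int nextId = 1;

    /** Dispatches one request and returns "status" or "status body" as plain text. */
    String handle(String method, Integer id, String body) {
        switch (method) {
            case "POST": { // create: store the body under a new id
                int newId = nextId++;
                customers.put(newId, body);
                return "201 /customers/" + newId;
            }
            case "GET": // read: return the stored representation
                return customers.containsKey(id) ? "200 " + customers.get(id) : "404";
            case "HEAD": // read metadata: status only, no body
                return customers.containsKey(id) ? "200" : "404";
            case "PATCH": // update: simplified here to replacing the stored body
                if (!customers.containsKey(id)) {
                    return "404";
                }
                customers.put(id, body);
                return "200 " + body;
            case "DELETE": // delete: remove the resource
                return customers.remove(id) != null ? "204" : "404";
            default: // any other method is not supported by this sketch
                return "405";
        }
    }
}
```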
shard: A shard is a horizontal partition of a database table. For example, in the Customer table, the rows can be sharded by geographic region. Performance improves if each shard has a relatively small index.
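A minimal sketch of region-based sharding in Java, assuming a hypothetical ShardedCustomerTable in which each region has its own small index from customer ID to name. A lookup that knows the region consults only that shard's index rather than one large index over all customers.

```java
import java.util.HashMap;
import java.util.Map;

final class ShardedCustomerTable {
    // One index (customer ID -> name) per geographic region; each map is one shard.
    private final Map<String, Map<Integer, String>> shards = new HashMap<>();

    /** Routes the row to the shard for its region, creating the shard if needed. */
    void insert(String region, int customerId, String name) {
        shards.computeIfAbsent(region, r -> new HashMap<>()).put(customerId, name);
    }

    /** Looks up a customer in a single shard; returns null if absent. */
    String find(String region, int customerId) {
        Map<Integer, String> shard = shards.get(region);
        return shard == null ? null : shard.get(customerId);
    }

    /** Number of rows held by the shard for the given region. */
    int shardSize(String region) {
        Map<Integer, String> shard = shards.get(region);
        return shard == null ? 0 : shard.size();
    }
}
```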
Spotfire: Tibco Spotfire is a data visualization tool and also the name of its analytics platform.
Spring: A framework that supports JDBC and can be an alternative to EJB.
Thrift: Apache Thrift is an interface definition language that is used to define and create services for a wide variety of programming and scripting languages.
Virtual datasource: Combines tables from different data sources.