No SQL

by Administrator 30. August 2013 11:40
Some thoughts on NoSQL

History

- Term coined in 1998; mostly originated out of a Web 2.0 need
- Mostly re-packaged \ re-branded existing technology
- Used in one way or another by Facebook, Twitter, Digg, Amazon, LinkedIn, Google, Microsoft

No SQL

Is (generally)

- For interconnected data (links, pingbacks, tags etc.)
- For hierarchical nested data
- Designed for horizontal scale (scale out, distributed)
- For leveraging commodity hardware
- Open Source
- Designed to complement traditional SQL

Is Not (generally)

- SQL
- Relational
- A replacement for relational
- Turn key

No SQL Design Goals

Structured storage and retrieval of data
- Un-structured data storage, e.g. text files, is generally not scalable
Looser consistency model than relational

- Depending on implementation, data corruption may occur that a 2-phase commit would have prevented
- May not be appropriate where consistency is required, e.g. financial

Motivations (not necessarily justifications)

- Design simplicity
- Horizontal scaling
- Availability control
- Optimized for read
Worldwide (WW) compressed data currently at 1,000 exabytes

- One thousand billion gigabytes
- Scalability and consistency now equal in priority

Common Related Architecture Patterns

Map / Reduce
- Mapping data across a distributed cluster
- Reduce data to a summary for output
- Very hard to avoid contention at consolidation points
- Scale out not well supported in traditional SQL databases
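The map and reduce steps above can be sketched in a few lines of Python. This is a single-process illustration only: the `chunks` list is a hypothetical stand-in for data held on separate nodes, and the function names are made up for this example, not taken from any product.

```python
from collections import defaultdict
from functools import reduce

def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key; this is the consolidation point."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single summary count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

# Hypothetical input: each string stands in for data on a separate node.
chunks = ["sql no sql", "no sql scale out", "scale out"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the `shuffle` step moves data between machines, which is where the contention mentioned above tends to show up.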
REST (Representational State Transfer)
- Design pattern which maps a standard set of verbs to resources \ methods
- Typically HTTP : GET, POST, PUT, DELETE
- Simpler than SOAP, which requires custom implementation on both client and server
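As a rough illustration of mapping a standard set of verbs to methods on a resource, here is an in-memory sketch. `ResourceStore` and `handle` are hypothetical names for this example; no real HTTP server or framework is involved.

```python
class ResourceStore:
    """In-memory stand-in for a collection of resources."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def get(self, item_id):
        return self._items.get(item_id)

    def post(self, body):
        # POST creates a new resource and returns its id.
        item_id = self._next_id
        self._items[item_id] = body
        self._next_id += 1
        return item_id

    def put(self, item_id, body):
        # PUT replaces the resource at a known id.
        self._items[item_id] = body
        return item_id

    def delete(self, item_id):
        return self._items.pop(item_id, None)

def handle(store, verb, *args):
    """Dispatch a verb to the matching method on the store."""
    handlers = {"GET": store.get, "POST": store.post,
                "PUT": store.put, "DELETE": store.delete}
    return handlers[verb](*args)

store = ResourceStore()
new_id = handle(store, "POST", {"name": "smith"})
handle(store, "PUT", new_id, {"name": "jones"})
fetched = handle(store, "GET", new_id)
handle(store, "DELETE", new_id)
after_delete = handle(store, "GET", new_id)
```

The client only needs to know the four verbs and the resource id, which is the main source of the simplicity compared with SOAP.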
Thrift (Apache)
- Cross language service development
- Software stack + code generation engine
- Used by many No SQL vendors to provide cross language support and rapid application development (RAD)
XPath \ XQuery \ XSLT
- XPath - Language for selecting nodes from an XML document
- XQuery \ XSLT - Languages for transforming XML to other formats, e.g. HTML
- XML is the general default data format
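Node selection with XPath can be tried with Python's standard library, which supports a limited subset of XPath. The XML below is made-up sample data for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
xml_text = """
<employees>
  <employee dept="sales"><name>smith</name></employee>
  <employee dept="it"><name>jones</name></employee>
  <employee dept="sales"><name>brown</name></employee>
</employees>
"""

root = ET.fromstring(xml_text)

# Select the <name> child of every <employee> whose dept attribute is 'sales'.
sales_names = [node.text
               for node in root.findall("./employee[@dept='sales']/name")]
print(sales_names)
```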
No SQL Database Types

Column Store
- Entire column indexed, serialized and compressed
Document Store
- BLOB storage of document as object
Key Value
- Schema-less name value pair storage
Graph Databases
- Index-free adjacency of directly related nodes
Multimodel Databases
- Designed to support multiple data models and use-cases
Object Databases
- Data stored and used as objects
Cloud Databases
- Deployment model in cloud, can be SQL or No SQL
XML Databases
- Data stored as XML
Multidimensional Databases
- Relational data aggregated across dimensions
MultiValue Databases
- Relational with a focus on list type parameters

Column Store

All the values of the column are serialized together
Leverages locality of reference \ proximity
Columns are sparse, no data = 0 bytes
When combined with dynamic compression this can result in very fast data retrieval, e.g. find all employees with name = ‘smith’
Efficiency vs. row store depends on the workload
Can be less efficient than traditional row store when all columns are returned
Compression leverages repeating bytes, and common column data type
Compare SQL Server Column Store index type
Examples: HBase (Hadoop), BigTable, Cassandra, Hypertable
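A minimal sketch of the idea, assuming a tiny made-up employee table: the rows are pivoted into one list per column, the name column is run-length encoded to show how repeated values compress, and a lookup scans only the columns it needs. Real column stores use far more elaborate encodings.

```python
from itertools import groupby

# Hypothetical row-oriented data.
rows = [
    {"id": 1, "name": "smith", "dept": "sales"},
    {"id": 2, "name": "smith", "dept": "sales"},
    {"id": 3, "name": "jones", "dept": "it"},
    {"id": 4, "name": "smith", "dept": "it"},
]

# Column layout: all values of one column are stored together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

name_runs = run_length_encode(columns["name"])

# Find employees named 'smith' by scanning only the name and id columns.
smith_ids = [columns["id"][i]
             for i, name in enumerate(columns["name"]) if name == "smith"]
```

Returning every column of a matching row means stitching values back together from each list, which is why a row store can win on that workload.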

Document Store

Designed for document-oriented information
Aka semi-structured data
Main object is the ‘document’
Readable encodings
- XML (hierarchical tags)
- YAML (indentation based)
- JSON (key \ value pairs)
Binary encodings
- BSON (binary JSON)
- PDF
- MS Office
Compare SQL Server FileTable table type
The next great data warehouse (DW) challenge
Examples: MongoDB, Elasticsearch, CouchDB
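To make the "document as main object" idea concrete, here is a small sketch that stores a nested JSON document under an id and reads it back. The `store` dict and the field names are assumptions for illustration; this is not the API of any of the products listed.

```python
import json

# A semi-structured document: nested and list-valued fields, no fixed schema.
document = {
    "_id": "post-1",
    "title": "Some thoughts on NoSQL",
    "tags": ["nosql", "databases"],
    "comments": [
        {"author": "reader1", "text": "Nice summary"},
        {"author": "reader2", "text": "What about graphs?"},
    ],
}

# Hypothetical store: the whole document is serialized and saved as one unit.
store = {}
store[document["_id"]] = json.dumps(document)

# Retrieval returns the complete document, nested data included.
loaded = json.loads(store["post-1"])
comment_authors = [comment["author"] for comment in loaded["comments"]]
```

In a relational design the tags and comments would typically live in separate tables joined by key; here they travel with the document.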

Key Value

Aka Associative Array
Schema-less storage, reduced code impact of changes
Operations: pair add, pair delete, pair update, key lookup
Traditional implementations
- Hash table (using hash function to map keys to values, and handle collisions)
- Binary search trees (size, depth, root id, leaf ids)
- B-Trees (balanced tree, keeps data sorted), e.g. SQL Server index
Compare to SQL Server XML data type column (sort of)
Examples: DynamoDB, Azure Table Storage, LevelDB
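The hash table variant and the four operations listed above can be sketched as follows. `HashTable` is a hypothetical teaching class that handles collisions by chaining; it is not how any listed product is implemented.

```python
class HashTable:
    """Key-value store using a hash function and per-bucket chaining."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function maps a key to one bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        """Pair add, or pair update if the key already exists."""
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # colliding keys share the bucket

    def get(self, key, default=None):
        """Key lookup."""
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key):
        """Pair delete; returns True if the key was present."""
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False

table = HashTable(bucket_count=2)  # two buckets, so three keys must collide
table.put("a", 1)
table.put("b", 2)
table.put("c", 3)
table.put("a", 10)   # update
table.delete("b")
```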

Graph Databases

Use graph structure with nodes, edges, and properties to represent and store data
Any storage system that provides index-free adjacency
Each node contains a direct pointer to its adjacent nodes
Edges are the connections between nodes
Most important information is contained in the edges
Examination of edges, nodes, and properties yields meaningful patterns
Examples: Neo4j, InfiniteGraph, InfoGrid, Trinity
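Index-free adjacency can be illustrated with nodes that hold direct references to their neighbours, so a traversal follows pointers instead of consulting a global index. The `Node` class, the "follows" label, and the sample names are assumptions made for this sketch.

```python
class Node:
    """A graph node with properties and direct pointers to adjacent nodes."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (label, target Node) pairs

    def connect(self, label, other):
        self.edges.append((label, other))

def neighbours(node, label):
    """Names of nodes one hop away along edges with the given label."""
    return [target.name for edge_label, target in node.edges
            if edge_label == label]

def two_hops(node, label):
    """Names of nodes exactly two hops away, found by pointer chasing."""
    found = set()
    for first_label, middle in node.edges:
        if first_label != label:
            continue
        for second_label, far in middle.edges:
            if second_label == label:
                found.add(far.name)
    return sorted(found)

alice, bob = Node("alice", city="Berlin"), Node("bob")
carol, dave = Node("carol"), Node("dave")
alice.connect("follows", bob)
alice.connect("follows", carol)
bob.connect("follows", carol)
carol.connect("follows", dave)
```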

Multi-model Databases

Support characteristics of multiple databases
Presents a variety of logical models \ views to the user
Uses industry-standard interfaces, e.g. XML
Schema agnostic
Types
- True multi-model databases designed for multiple models and use-cases, e.g. FoundationDB, Aerospike, OrientDB, ArangoDB
- General-purpose databases with multi-model options, e.g. MySQL, DB2, Akiban, PostgreSQL

Object Databases

Data stored as objects
Processed using object-oriented programs
Database is integrated with the programming language
Competing design directions
- Inheritance like C++
- Encapsulation like XML
Example languages: Delphi, Ruby, Python, Perl, Java, C#, C++
Examples: Versant, db4o, Objectivity, Starcounter, Perst

Cloud Databases

A distributed deployment model rather than a database type
Can be SQL or No SQL
Could be a VM, or hosted storage
May focus on throughput rather than storage
Example vendors: Amazon EC2, GoGrid, Rackspace, MS SQL Azure
SQL cloud service examples: Amazon (MySQL), MS Azure, Heroku
No SQL cloud service examples: Amazon DynamoDB, Google App Engine, MongoDB

XML Databases

Data stored as XML
Can be verbose
Supports select, export, serialization
Usually associated with Document databases
Types
- XML Enabled : converts to and from XML, e.g. SQL Server XML data type
- Native XML : uses XML documents as the unit of storage, which may be text files

Multidimensional Databases

Aka OLAP (on-line analytical processing)
Variants of relational databases
Data represented as cubes with N dimensions
Each cell in a cube is an intersection \ aggregate of data along its dimensions
Uses relational data as the data source
Types
- MOLAP : (M = multi-dimensional), filters and aggregates pre-calculated
- ROLAP : (R = relational), filters and aggregates created at run-time
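The difference between the two types can be sketched with a tiny made-up fact table. `build_cube` pre-calculates every aggregate cell in the MOLAP style, while `rolap_total` filters and sums the source rows at query time. Both function names and the data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical relational source: (year, region, product, amount).
facts = [
    ("2013", "EU", "widgets", 100),
    ("2013", "EU", "gadgets", 50),
    ("2013", "US", "widgets", 70),
    ("2012", "US", "widgets", 30),
]
ALL = "*"  # marker meaning "aggregate over this dimension"

def build_cube(facts):
    """MOLAP style: pre-calculate every cell, including roll-ups."""
    cube = defaultdict(int)
    for year, region, product_name, amount in facts:
        # Each fact contributes to all 8 combinations of value or ALL.
        for cell in product((year, ALL), (region, ALL), (product_name, ALL)):
            cube[cell] += amount
    return cube

def rolap_total(facts, year=ALL, region=ALL, product_name=ALL):
    """ROLAP style: filter and aggregate the source rows at run-time."""
    wanted = (year, region, product_name)
    return sum(row[3] for row in facts
               if all(w == ALL or w == v for w, v in zip(wanted, row)))

cube = build_cube(facts)
total_2013 = cube[("2013", ALL, ALL)]          # a single lookup
same_total = rolap_total(facts, year="2013")   # a scan over the rows
```

The pre-calculated cube answers with one lookup but grows with every dimension added; the run-time version stores nothing extra and pays for it on each query.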

MultiValue Databases

Support and encourage the use of attributes which can take a list of values
Data model pre-dates the relational model
Developed for the PICK OS in the 1960s, still in use
Fits in No SQL because SQL is not required
Examples: Rocket Software (U2), TigerLogic, jBase, Revelation

Trending

Polyglot persistence can’t persist
Companies must consolidate
No SQL vendors must add SQL and transactions
SQL vendors must add document and key-value support
