Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@ relx
*.html
*.o
*.plt
*.png
*.log
*.swp
erl_crash.dump
Expand Down
38 changes: 35 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,40 @@ elysium [![Build Status](https://travis-ci.org/jaynel/elysium.svg)](https://trav

Elysium is the heavenly afterlife location where Cassandra was rewarded for the suffering she endured on earth. This project will hopefully make life easier for erlang projects which need multiple connections to a Cassandra server. The limited resource of socket connections is maintained in a FIFO queue, so that a connection may only be used by one process at a time. When the task is complete, the resource is added back to the end of the queue of available resources.

The FIFO queue is maintained by epocxy in an ets_buffer (an ets table with numerically indexed keys). The functionality is ensured by a full suite of PropEr tests running under common_test control.
Library Logic
-------------
Elysium provides a supervisor hierarchy to manage itself as an included application. Just plug the `elysium_sup` into your hierarchy.

Internally the supervisor hierarchy is a rest_for_one organization as shown in the [Supervisor Hierarchy Diagram](doc/supervision_tree.pdf). The leftmost children manage data that must always be present, the middle children are the actual implementation and the last supervisor manages dynamic connections to Cassandra. Although this library was written for use with one or more Cassandra clusters using [seestar](https://github.com/iamaleksey/seestar) as the Cassandra connection library, the only Cassandra dependencies are in `elysium_connection` (this module instantiates the actual DB socket connections, explicitly naming seestar) and `elysium_peer_handler` (which uses Cassandra CQL syntax to query the seed node for new Cassandra nodes participating in the cluster). Generalizing the approach to other databases would not be difficult.

The fundamental logic involves maintaining a queue of idle DB connections and a queue of pending requests. A DB request becomes pending only when the queue of idle connections is empty. Normally a call to `elyisum_buffering_strategy:with_connection/4,5` is sufficient to query the DB. This call determines the installed behaviour for the queues, then checks out an idle connection, performs the DB request and checks the connection back in. No other assumptions about the implementation of the queue, the means of checkin/checkout or other characteristics are made, besides that a checked out connection is not idle and can only be used for a single request.

When there are no idle connections, the request is checked in to the pending requests queue. During connection checkin, the pending requests queue is consulted for immediate reuse prior to handing a connection to the idle connection queue. Pending requests leave the clock running for a timeout while they are waiting for an available connection. If the pending request is checked out of its queue, but it has either crashed or the time has run out for the query, the request is discarded and the connection attempts to get another pending request. If max_retries are used and no valid pending_request is found, the connection is checked in as idle without attempting to fetch any more pending requests from the queue.

The two queues are implemented as behaviour instances of `elysium_buffering_strategy`. Both `elysium_bs_parallel` and `elysium_bs_serial` are available as examples, but the parallel implementation relies on a FIFO `ets_buffer` from [epocxy](https://github.com/duomark/epocxy/) which has concurrency bugs. Right now you should stick with the use of `elysium_bs_serial` for both the connection and pending requests queues.

To safely reconfigure the number or location of connections, use `elysium_queue:deactivate/0` followed by `elysium_queue:activate/0` after making changes. These two operations close all active connections and then reopen the connections respectively. The termination is abrupt as the code fetches all `elysium_connection_sup` children and terminates them, so a bit of quiescence is recommended before the restart sequence.

Top-Level Supervisors
---------------------
The following children are maintained as the top-level rest_for_one children of `elysium_sup`:

1. **elysium_buffer_sup**: Purely an ets table owner
1. **elysium_queue**: An API for querying the status and activating/deactivating elysium
1. **elysium_lb_queue**: A round-robin queue of DB hostname/port pairs used for creating new connections
1. **elysium_peer_handler**: A Cassandra seed query mechanism to discover Cassandra server nodes added and removed from the cluster
1. **elysium_session_queue**: A `elysium_serial_queue` of all idle connections to the database
1. **elysium_pending_queue**: An `elysium_serial_queue` of all pending requests when no idle connections exist
1. **elysium_connection_sup**: The only supervisor which manages live connections to the DB cluster

Whenever new connections are needed, they are obtained from the `elysium_lb_queue` by connecting to the next available host in the round-robin queue. After connecting, the host is added to the end of the queue for continuous ring-like action. If the connection to the host times out or has other errors, it is put back at the end of the queue and the next host is tried. This distributes new connections across the DB cluster while skipping those nodes which are not responsive.

The session queue is used to checkin/checkout idle connections. On checkin, there is a random probability (default of 100,000 per 1 Billion attempts) that the connection will "decay". A decayed connection is closed and replaced with a new connection. The purpose of decay is to allow connections to dynamically migrate around the DB cluster so that any staleness, memory fragmentation or other possible latent bugs are expunged before they cause serious damage. If there are problems on the DB cluster, the built-in stochastic jitter should make communications more resilient and balanced across the DB cluster.

Testing
-------
All testing is currently done within the Tigertext application environment. Tests for this library will be added later.

In order to run tests, clone and run make:

$ git clone https://github.com/duomark/elysium
Expand All @@ -19,5 +49,7 @@ To check that the types are properly specified, make the plt and run dialyzer:
$ make dialyze

Travis CI
-----------
Builds are also enabled through travis-ci.org
=========
[Travis-CI](http://about.travis-ci.org/) provides Continuous Integration. Travis automatically builds and runs the unit tests whenever the code is modified. The status of the current build is shown in the image badge at the top of this page.

Integration with Travis is provided by the [.travis.yml file](https://raw.github.com/duomark/erlangsp/master/.travis.yml). The automated build is run on R16B03.
31 changes: 31 additions & 0 deletions RELEASES.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@ Releases

Vsn | Date | Desc
------|------------|----------
[0.2.3](#0.2.3) | In Progress | [Add documentation and diagrams](#0.2.3)
[0.2.2](#0.2.2) | 2015/04/16 | [Add app.config configuration option](#0.2.2)
[0.2.1](#0.2.1) | 2015/01/21 | [Dialyzer fixes; lb and serial queue mirrored](#0.2.1)
[0.2.0](#0.2.0) | 2014/12/31 | [Add node discovery and load balance updates](#0.2.0)
[0.1.6](#0.1.6) | 2014/10/17 | [Cleanup queue ownership and status interface](#0.1.6)
[0.1.5](#0.1.5) | 2014/10/15 | [Serial buffering of pending requests](#0.1.5)
[0.1.4](#0.1.4) | 2014/10/06 | [Allow enabled/disabled elysium_queue on startup](#0.1.4)
Expand All @@ -11,6 +15,33 @@ Releases
[0.1.1](#0.1.1) | 2014/09/25 | [Session decay](#0.1.1)
[0.1.0](#0.1.0) | 2014/09/22 | [Initial release](#0.1.0)

### <a name="0.2.3"></a>0.2.3 Add documentation and diagrams

* Add doc directory with text and diagrams
* Eliminate elysium_session_enqueuer module
* Remove unused elysium_queue:node_change/1
* Add Cassandra node reporting on connection errors
* Move pending requests earlier than connection queue in supervisor

### <a name="0.2.2"></a>0.2.2 Add app.config configuration option

* {config_app_config, atom()} defines the app.config parameter
* Using app.config means the atom() has to be a loaded application
* Add some status reporting for metrics

### <a name="0.2.1"></a>0.2.1 Dialyzer fixs; lb serial queue mirrored

* Fixed dialyzer complaints
* Mirrored serial queue functionality in load balance queue
* Updated copyright notices to 2015
* Improved connection error reporting

### <a name="0.2.0"></a>0.2.0 Add node discovery and load balance updates

* Add elysium_peer_handler to query Cassandra for cluster nodes
* Update load balance ring buffer with newly discovered nodes
* Add some status reporting for metrics

### <a name="0.1.6"></a>0.1.6 Cleanup queue ownership and status interface

* Queued sessions are now {{Ip, Port}, Session_Pid} for debugging
Expand Down
96 changes: 96 additions & 0 deletions doc/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
elysium
=======

### Elysium provides dynamic access to a Cassandra cluster

- Uses round-robin connection semantics
- Manages live connections via checkin/checkout
- Addresses overload using a pending request queue
- Periodically checks for new Cassandra nodes

-------
### Resilience is provided via rest_for_one supervision

- ETS data management: elysium_buffer_sup
- Application status/control: elysium_queue
- Load balancing: elysium_lb_queue
- Cassandra node discovery: elysium_peer_handler
- Connection queues: serial queue of connections and requests
- Connection management: elysium_connection_sup

-------
### Supervision tree

![Elysium Top-Level Supervision Tree](supervision_tree.png)

-------
### Queues are more persistent than connections

- Loss of all connections does not affect queues
- unless elysium_sup itself is taken out
- ets tables, status/control are last to die
- ensures monitoring works when things are down
- activate/deactivate adds/destroys connections
- currently too abrupt
- in future will nicely drain in flight queries
- load balancing survives when requests/connection queues down
- cluster node discovery runs even if connections gone
- activate creates new connections using round-robin

-------
### Basic operation is a checkin/checkout

- Load balancing via a ring buffer
- checkin/checkout Cassandra {Host, Port}
- connections are round-robin
- slow to respond nodes are skipped
- ring buffer updated by elysium_peer_handler
- Connections are in a checkin/checkout queue
- implemented as a behaviour
- currently relies on elysium_serial_queue
- elysium_parallel_queue trips on concurrency issues
- Requests checkout a connection and then check it back in
- if none available, request is put in pending queue
- connection is long-lived seestar Cassandra socket gen_server
- Also supports a one-shot request
- spawns a new seestar connection
- discards the connection after the request completes

-------
### Stochastic connection migration

- Connections decay periodically
- randomly in N chances per 1 Billion requests
- decayed connections are immediately replaced
- the new connection is placed at the end of the queue
- unless it is used immediately on a pending request
- Reconnect avoids congestion
- round-robin skips slow to respond nodes
- redistributes all connections over time
- restarts seestar sessions to avoid memory/staleness issues

-------
### Data flow logic

![Elysium Data Flow](data_flow.png)

-------
### Three methods of configuration

- {config_mod, Module}
- defines a functional interface for all config values
- allows adaptive functionally changing parameters
- {vbisect, Binary}
- uses a binary dictionary for all config values
- shared, lock-free, read-only configuration
- any changes have to be distributed to running processes
- {config_app_config, elysium}
- refers to a top-level config block in the app.config file
- has to be the name of a loaded application
- simplest is to use elysium as an included_application
- Performance is not equal
- config_mod is 50% slower
- vbisect and app.config seem to have similar performance
- not measured with 16-core or more servers


51 changes: 51 additions & 0 deletions doc/data_flow.dot
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
digraph G {
rankdir = LR;
label = "Elysium Data Flows";
fontcolor = red;

create [ shape=Mrecord label = " <cc> Create Connections\n(Round Robin DB Cluster) | <lb> Load Balance Queue | lb3 | lb2 | lb1" ];
Query [ shape=box label = "DB Query" ];
pq [ shape=Mrecord label = " Pending\nRequest\nQueue | <r5> req5 | <r4> req4 | <r3> req3 | <r2> req2 | <r1> req1" ];

subgraph pair2 {
rank = same;

checkin [ shape=box color=blue ];
decay [ shape=Mdiamond label = "Decay Connection?"];
pending [ shape=Mdiamond label = "Pending Requests?"];
cq [ shape=Mrecord label = "Idle\nConnection\nQueue | <c5> cxn5 | <c4> cxn4 | <c3> cxn3 | <c2> cxn2 | <c1> cxn1" ];
socket [ shape=Mdiamond label = "Connections Available?" ];
checkout [ shape=box color=blue ];
}

activate -> create:cc [ label="init config:max_sessions" fontcolor=blue ];
create:cc -> pending:w;

cq -> checkout;
with_cxn:n -> checkout:w;

pending:e -> pq [ label="Yes" fontcolor=blue ];
pq -> Query;

checkin:s -> pending;
pending:s -> decay:n;
decay:w -> create:lb [ label="Yes\n(not on first create)" fontcolor=blue ];
decay -> cq [ label="No" fontcolor=blue ];

subgraph {
rank=min;

activate [ shape=box fontcolor=red label = "elysium_queue:activate" ];
one_shot [ shape=box color=blue fontcolor=red label="elysium_connection:one_shot_query" ];
destroy [ shape=box color=blue label="Close Connection" ];
one_shot_query [ shape=box label="DB Query" ];
with_cxn [ shape=box color=blue fontcolor=red label="elysium_connection:with_connection" ];
}


checkout:e -> socket:w;
socket:s -> Query:s [ fontcolor=blue label="Yes" ];
socket:e -> pq:s [ fontcolor=blue label="No" ];
Query:n -> checkin:e;
one_shot -> create:lb -> one_shot_query -> destroy;
}
Binary file added doc/data_flow.pdf
Binary file not shown.
Binary file added doc/data_flow.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
41 changes: 41 additions & 0 deletions doc/queue_state.dot
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
digraph g {

graph [fontsize=30 labelloc="t" label="Elysium Queue FSM States" splines=true overlap=false rankdir = "TB"];
ratio = auto;

wait [ style = "filled, bold" penwidth = 5 fillcolor = "white" fontname = "Courier New" shape = "Mrecord"
label =<<table border="0" cellborder="0" cellpadding="3" bgcolor="white">
<tr><td bgcolor="black" align="center" colspan="2"><font color="white">'WAIT_REGISTER'</font></td></tr>
<tr><td align="left">&#40;0&#41; config_type </td></tr>
<tr><td align="left">&#40;1&#41; load balance queue name </td></tr>
<tr><td align="left">&#40;2&#41; sessions queue name</td></tr>
<tr><td align="left">&#40;3&#41; requests queue name </td></tr>
<tr><td align="left">&#40;4&#41; connection_sup undefined </td></tr></table>> ];

disabled [ style = "filled, bold" penwidth = 5 fillcolor = "white" fontname = "Courier New" shape = "Mrecord"
label =<<table border="0" cellborder="0" cellpadding="3" bgcolor="white">
<tr><td bgcolor="black" align="center" colspan="2"><font color="white">'DISABLED'</font></td></tr>
<tr><td align="left">&#40;4&#41; connection_sup Pid </td></tr></table>> ];

subgraph running_states {
rank = same

inactive [ style = "filled, bold" penwidth = 5 fillcolor = "white" fontname = "Courier New" shape = "Mrecord"
label =<<table border="0" cellborder="0" cellpadding="3" bgcolor="white">
<tr><td bgcolor="black" align="center" colspan="2"><font color="white">'INACTIVE'</font></td></tr>
<tr><td align="left" port="r1">&#40;4&#41; connection_sup Pid </td></tr></table>> ];

active [ style = "filled, bold" penwidth = 5 fillcolor = "white" fontname = "Courier New" shape = "Mrecord"
label =<<table border="0" cellborder="0" cellpadding="3" bgcolor="white">
<tr><td bgcolor="black" align="center" colspan="2"><font color="white">'ACTIVE'</font></td></tr>
<tr><td align="left" port="r2">&#40;4&#41; connection_sup Pid </td></tr></table>> ];
}

wait -> disabled [ penwidth = 5 fontsize = 28 fontcolor = "blue" label = "register connection"];
wait -> inactive [ penwidth = 5 fontsize = 28 fontcolor = "blue" label = "deactivate" ];
disabled -> inactive [ penwidth = 5 fontsize = 28 fontcolor = "blue" label = "enable" ];

inactive -> active [ penwidth = 5 fontsize = 28 fontcolor = "blue" label = "activate" ];
active -> inactive [ penwidth = 5 fontsize = 28 fontcolor = "blue" label = "deactivate" ];

}
Loading