Speak ALL the languages! RethinkDB and HBase Top

www.jaxenter.com
Issue April 2015 | #44
The digital magazine for enterprise developers
Polyglots do it better
Speak ALL the languages! Shutterstock’s polyglot enterprise and Komodo’s polyglot IDE
RethinkDB and HBase: A closer look at two database alternatives
Top performance fails: Five challenges in performance management
Editorial
E pluribus unum
It’s an old and rather out-of-fashion motto of the United
States. E pluribus unum, one out of many. Early twentieth-century ideals of the cultural melting pot may have failed in
western society. But they may still work in IT. Developers will
forever dream of one Holy Grail language that will rule them
all. As nice as it sounds to be a full-stack developer, knowing your entire enterprise’s technology inside out, from the
ListView failure modes to the various components’ linguistic
syntaxes – that’s a near superhuman talent. As much as all developers would love to be fluently multilingual, in practice it’s
difficult to keep up. Instead, it’s the unity of the enterprise
itself that can create one whole out of many languages, not
the individual developer.
In this issue, Chris Becker explains how Shutterstock’s
gradual evolution from one to many languages was central
to the success of the company’s technology, and how
one language became many specialist development areas, in
turn all unified by the enterprise. Meanwhile for web developers looking for one tool to rule all their front-end languages,
we’ve got a helpful guide to the polyglot IDE Komodo, which
just unveiled its ninth release.
We’ve also got some useful introductions to HBase and
RethinkDB (both big salary-earners according to the recent
Dice.com survey), as well as Vaadin web applications. And
finally for anyone concerned with the speed of their website
during high traffic, we have a couple of valuable lessons about
how to avoid performance failures.
Coman Hamilton,
Editor
Index
Polyglot enterprises do it better: Shutterstock’s multilingual stack (Chris Becker)
An introduction to polyglot IDE Komodo: One IDE to rule all languages (Nathan Rijksen)
HBase, the “Hadoop database”: A look under the hood (Ghislain Mazars)
An introduction to building realtime apps with RethinkDB: First steps (Ryan Paul)
Performance fails and how to avoid them: The five biggest challenges in application performance management (Klaus Enzenhofer)
Separating UI structure and logic: Architectural differences in Vaadin web applications (Matti Tahvonen)
Model View ViewModel with JavaFX: A look at mvvmFX (Alexander Casall)
www.JAXenter.com | April 2015
Hot or Not
HTML6
If you’re a web developer who doesn’t check the news that often, make sure you’re
sitting before reading this. The web community is furiously debating a radical new
proposal for HTML6. The general idea is that the next version of HTML will be
developed in a way that allows it to dynamically run single-page apps without
JavaScript. Yes, an HTML that wants to compete with JavaScript. The community is torn,
but proposal author Bobby Mozumder makes an interesting case, claiming that
HTML needs to follow the “standard design pattern emerging via all the front-end
JavaScript frameworks where content is loaded dynamically via JSON APIs.” He’s
won over quite a few members of the community, but there’s no telling if the W3C
will ever lower its eyebrows to this request.
Cassandra salaries
It’s often considered impolite to ask another programmer how much they earn. But
that doesn’t mean colleagues, recruiters and employers aren’t picturing an estimated
annual salary hovering over your head while they pretend to listen to you. It turns out
that right now, Cassandra, PaaS and MapReduce pros are the ones with the biggest
dollar signs above their heads (according to the latest Dice.com survey). Anyone lucky
enough to be an expert in this area (and living in the US) will be making an average
of at least $127k per year. America’s Java developers will be lucky to scrape anything
above $100k, while poor JavaScript experts earn as little as $94k a year. We’d like to
invite you to join us in performing the world’s smallest violin concerto for the JS community – because seriously that’s still a mighty shedload of cash to the kinds of people
that write the texts you’re reading [long sigh].
How banks treat technology
In a recent series of interviews on JAXenter, developers in finance have told
us about the pros and cons of developing for banks. And while the salary
is very definitely a major pro (banks
pay 33 percent to 50 percent more,
says HFT developer Peter Lawrey), the
management-level attitude to technology is less than favourable. “Many
financial companies see their development units as a necessary evil,” says
former NYSE programmer Dr. Jamie
Allsop. Not only does technology come
across as being second-class, but it’s
often impossible to drive innovation
when working on old-school banking
systems. “If it’s new then it must be
risky,” dictates the typical financial ops
attitude according to finance technology
consultant Mashooq Badar.
Publishing a book in IT
At some stage in their career, most
developers will ponder the idea of
publishing a book. And the appeal is
understandable. It’s a great milestone
on your CV, developers in the
community may look up to you and
you’ll be sure to make your parents
proud (even if they don’t have a clue
what you’re writing about). But
don’t expect to make big money (like
those Cassandra developers, above)
when you self-publish on Leanpub.
Meanwhile you’ll need to be dedicating
much of your spare time to promoting
the book. Before your eyes go lighting
up with dollar signs at the thought
of becoming a wealthy international
superstar IT author, give yourself a
quick reality check with a couple of
Google searches on IT publishing.
Languages
Shutterstock’s multilingual stack
Polyglot enterprises do it better
Being a multilingual, multicultural company doesn’t just bring benefits on a level
of corporate culture. Shutterstock search engineer Chris Becker explains why
enterprises need to stop speaking just one language.
by Chris Becker
A technology company must consider the technology stack
and programming language that it’s built on. Everything else
flows from those decisions. It will define the kinds of engineers who are hired and how they will fare.
During Shutterstock’s early days, a team of Perl developers built the framework for the site. They chose Perl for the
benefits of CPAN and flexibility that language would bring.
However, it opened up a new problem – we could only hire
those familiar with Perl. We hired some excellent and skilled
engineers, but many others were left behind because of their
inexperience with and lack of exposure to Perl. In a way, it
limited our ability to grow as a company.
But in recent years, Shutterstock has grown more “multilingual.” Today, we have services written in Node.js, Ruby, and
Java; data processing tools written in Python; a few of our
sites written in PHP; and apps written in Objective-C.
Developers specialize in each language, but we communicate across different languages on a regular basis to debug,
write new features, or build new apps and services. Getting
from there to here was no easy task. Here are a few strategic decisions and technology choices that have facilitated our
evolution:
Service-oriented Architectures
First, we built out all our core functionality into services.
Each service could be written in any language while providing a language-agnostic interface through REST frameworks.
It has allowed us to write separate pieces of functionality in
the language most suited to it. For example, search makes use
of Lucene and Solr, so Java made sense there. For our translation services, Unicode support is highly important, so Perl
was the strongest option.
Common Frameworks
Between languages there are numerous frameworks and
standards that have been inspired or replicated by one another. When possible, we try to use one of those common technologies in our services. All of our services provide RESTful
interfaces, and internally we use Sinatra-inspired frameworks
for implementing them (Dancer for Perl, Slim for PHP, Express for Node, etc.). For templating we use Django-inspired
frameworks such as Template::Swig for Perl, Twig for PHP,
and Liquid for Ruby. By using these frameworks we can
flatten the learning curve when a developer jumps between
languages.
Runtime Management
Often, the biggest obstacle blocking new developers is the technical bureaucracy needed to manage each runtime – managing library dependencies, environment paths, and all the
command line settings and flags needed to do common tasks.
Shutterstock simplifies all of that with Rockstack. Rockstack
provides a standardized interface for building, running, and
testing code in any of its supported runtimes (currently: Perl,
PHP, Python, Ruby, and Java).
Not only does Rockstack give our developers a standard
interface for building, testing, and running code, but it also
supplies our build and deployment system with one standard set of commands for running those operations in
any language. Rockstack is used by our Jenkins cluster for
running builds and tests, and our home-grown deployment
system makes use of it for launching applications in dev, QA,
and production.
Developer Meetups
We also want our developers to learn and evolve their skillsets. We host internal meetups for Shutterstock’s Node developers, PHP Developers, and Ruby developers where they
can get fresh looks and feedback on code in progress. These
meetups are a great way for them to continue professional
development and also to meet others who are tackling similar
projects. No technology can replace face-to-face communication and the great ideas and methods that can come
from it.
Openness
We post all the code for every Shutterstock site and service
on our internal GitHub. Everyone can find what we’ve been
working on. If you have an idea for a feature, you can fork
off a branch and submit a pull request to the shepherd of
that service. This openness goes beyond transparency; it encourages people to try new things and to see what others are
implementing.
Our strategies and tools echo our mission. We want engineers to think with a language-agnostic approach to better
work across multiple languages. It helps us to dream bigger
and not get sidelined by limitations. As the world of programming languages becomes much more fragmented, it’s becoming more important than ever from a business perspective to
develop multilingual-friendly approaches.
We’ve come a long way since our early days, but there’s still
a lot more we can do. We’re always reassessing our process
and culture to make both the code and system more familiar
to those who want to come work with us.
This has been adapted from a post that originally ran on
the Shutterstock Tech blog. You can read the original here:
bits.shutterstock.com/2014/07/21/stop-using-one-language/.
Testing Frameworks
In order to create a standardized method for testing all the
services we have running, we developed (and open sourced!)
NTF (Network Testing Framework). NTF lets us write tests
that hit special resources on our services’ APIs to provide status information showing that the service is running
properly. NTF supplements our collection of unit and integration
tests by constantly running in production and telling us if any
functionality has been impaired in any of our services.
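To make the idea concrete, here is a minimal Python sketch of the kind of check such a test could run against the response of a status resource. It is not NTF’s API: the function name, the JSON layout and the check names are all assumptions made up for illustration.

```python
import json


def failing_checks(status_body: str, required: tuple[str, ...]) -> list[str]:
    """Return the names of required checks that are missing or not "ok".

    The layout {"checks": {name: state}} is a hypothetical format chosen
    for this sketch, not the one NTF or Shutterstock's services use.
    """
    try:
        document = json.loads(status_body)
    except json.JSONDecodeError:
        # A body that is not valid JSON counts as every check failing.
        return list(required)

    checks = document.get("checks") if isinstance(document, dict) else None
    if not isinstance(checks, dict):
        # Valid JSON, but not in the expected shape: treat as a full failure.
        return list(required)

    # Keep the caller's ordering so reports stay stable between runs.
    return [name for name in required if checks.get(name) != "ok"]
```

A production monitor would fetch the body over HTTP on a schedule and alert whenever the returned list is non-empty; the sketch leaves the transport out so that the pass/fail logic stays easy to test.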
Chris Becker is the Principal Engineer of Search at Shutterstock where
he’s worked on numerous areas of the search stack including the search
platform, Solr, relevance algorithms, data processing, analytics, internationalization, and customer experience.
One IDE to rule all languages
An introduction to polyglot IDE Komodo
The developer world is becoming “decidedly more polyglot”, the Komodo CTO recently told JAXenter.
To cater to this changing community, the multi-lingual IDE Komodo is steadily increasing its number
of supported languages. Let’s take a closer look at what it does best and how to get started.
by Nathan Rijksen
A few weeks ago, Komodo IDE 9 was released, featuring a
host of improvements and features. Many of you have since
downloaded the 21-day trial and have likely spent some time
learning more about what this nimble beast can do. I thought
now would be a good time to run you through some simple workflows and give you a general idea of how Komodo
works. This is just a short introduction, and much more information can be found on the Komodo website under screencasts, forums and of course the documentation.
User Interface Quick Start
Figure 1 shows what you’ll see when you first launch Komodo.
The first few icons on the toolbar are pretty self-explanatory,
and the ones that aren’t immediately self-explanatory are
easily discovered by hovering your mouse over them. You’ll
find quick access to a lot of great Komodo features here: debugging, regex testing, source code control, macro recording/
playback, etc. Of course these aren’t all displayed by default;
you would need a big screen to show all that. You can easily
toggle individual buttons or button groups by right-clicking
the toolbar and selecting “Customize”. Three buttons you’ll
likely be using a lot are the pane toggles (Figure 2).
Clicking these allows you to toggle a bottom, left and right
pane, each of which holds a variety of “widgets” that allow
you to do anything from managing your project files to unit
testing your project. You can customize the layout of these
widgets by right-clicking their icon in the panel tab bar.
At the far right of the toolbar you’ll find a search field,
dubbed Commando, which lets you easily search through
your project files, tools, bookmarks, etc.
Type in your search query, and Commando will show you
results in real-time. Select an entry and press “Enter” to open/
activate it, or press tab/right arrow to “expand” the selection,
allowing you to perform contextual actions on it. For example, you could rename a file right from Commando (Figure 3)
using nothing but your keyboard. Commando doesn’t need
to be accessed from the toolbar – you can use a shortcut to
launch it so you can also be 100 percent keyboard driven
(CTRL + Shift + O on Windows and Linux, or CMD + Shift
+ O on Mac).
Figure 1: Komodo start screen
Figure 2: Pane buttons
Figure 4: The “burger menu”
Figure 3: Commando
Starting a Project
The first thing you’ll want to do is start a new project. You
don’t need to use projects, but it’s highly encouraged as it
gives Komodo some context: “This is what I’m working with,
and this is how I like working with this particular project.”
To start a new project, simply click the “Project” menu
and then select “New Project”. On Windows and Linux the
menus are at the far right of the toolbar under the “burger
menu” (Figure 4), as people tend to lovingly call it.
You can also start the project from the “Quick Launch”
page which is visible if no other files are opened, or you can
use Commando by searching for “New Project” and selecting
the relevant result (it should show up first).
When creating a project, Komodo will ask you where to
save your project; this will be the project source, and Komodo will create a single project file in this folder and one
project folder that can hold your project specific tools. You
can separate your project from your source code by creating
your project in the preferred location and afterwards modifying the project source in the “Project Preferences”.
Once your project is created you might want to do just that:
adjust some of the project specific preferences. To do this,
open your “Project” menu again and select “Project Preferences” (Figure 5), or use Commando to search for “Project
Properties”. There you can customize your project to your
liking: changing its source, excluding files, using custom indentation, and more. It’s all there.
If you’re working with a lot of different projects, you’ll
definitely want to check out the Projects widget at the bottom
of the Places widget. Open the left side pane and select the
“Places” widget, then look at the bottom of the opened pane
for “Projects”.
Figure 5: Project properties
Figure 6: Scope
Figure 7: Open files
Figure 8: Snippet
Opening (and Managing) Files
You now have your project and are ready to start opening
and editing some files. The most basic way of doing that is
by using the “Places” widget in the left pane, but where’s the
fun in that? Again, you could use Commando to open your
files… Simply launch Commando from your toolbar or via
the shortcut and type in your filename. Commando by default searches across several “search scopes”, so you may get
results for tools, macros, etc. If you want to focus down on
just your files you can hit the icon to the left of the search field
to select a specific “search scope”. In this case you’ll want the
Files scope (Figure 6).
You can define custom shortcuts to instantly launch a specific Commando scope, so you don't need to be using this
scope menu with your mouse each time if that’s not your preferred method.
Now that you’ve opened a couple of files, you may be starting to notice that your tab bar is getting a bit unwieldy. This
is nothing new to editors, and a lot of programmers either
deal with it or constantly close files that aren’t immediately
relevant anymore. Some editors get around this by giving you
another way of managing opened files; luckily Komodo is one
of these editors, and even goes a step further. We’ve already
spoken about Commando a lot so I’ll skip past the details and
just say “There’s an Opened Files search scope.” A more UI-driven method is available through the “Open Files” widget
(Figure 7), accessible under the left pane right next to the
“Places” widget (you’ll have to click the relevant tab icon at
the top of the pane).
The Open Files widget allows you to, well, manage your
open files. But more than that it allows you to manage how
you manage open files – talk about meta! When you press the
“Cog” icon you will be presented with a variety of grouping
and sorting options, allowing you to group and sort your files
the way you want. If you’re comfortable getting a bit dirty
with a custom JavaScript macro you can even create your
own groups and sorting options.
Using Snippets
Many programmers use snippets (also referred to as Abbreviations in Komodo), and many editors facilitate this. Komodo
again goes a step further by allowing very fine-tuned control
of snippets. First let’s just use a simple snippet though.
Open a new file for your preferred language. We’ll use PHP
in our example (because so many people do, I’ll refrain from
naming my personal favourite). Komodo comes with some
pre-defined snippets you can use, but let’s define our own.
Open your right pane and look for the “Samples” folder;
these are the samples Komodo provides for you to get started
with. I would suggest you cut and paste the “Abbreviations”
folder residing in this folder to be at the root of your toolbox,
as it provides a good starting structure. Once that’s done,
head into Abbreviations | PHP; these are abbreviations that
only get triggered when you are using a PHP file. Right-click
the PHP folder and select Add | New Snippet.
Here you can write your snippet code. Let’s write a “private function” snippet. Choose a name for your snippet: this
will be the abbreviation that will trigger the snippet, and I’ll
use “prfunc” because I like to keep it short and simple. Then
write in your code. You can use the little arrow menu to the
right of the editor to inject certain dynamic values. Most relevant right now is the “tabstop” value, which tells Komodo
you want your cursor to stop and enter something there. Figure 8 shows what my final snippet looks like.
You’ll note I checked “Auto-Abbreviation”. This will allow
me to trigger the snippet simply by writing its abbreviation.
The actual trigger for this is configurable, so you can instead
have it trigger when you press Tab. Now we have our snippet
and are ready to start using it. Simply write “prfunc”. Next,
let’s talk about version control, wait ... What? You were
expecting more? No, that’s it: you type “prfunc” and it will
auto-trigger your snippet. It’s that easy.
There are many other ways of triggering and customizing
snippets (ahem, Commando again); you can even create snippets that use EJS so you can add your own black magic to the
mix. And for use cases where even that isn’t enough, you can
create your own macros, which – aside from extending your
editor – can be used to do anything from changing UI font size
to writing your own syntax checker, or overhauling the entire
UI. You have direct access to the full Komodo API that is used
to develop Komodo. That’s a whole different topic though,
one far too big to get into now, but be sure to have a look at
all that the “Toolbox” offers you, because snippets are just
the tip of the iceberg.
Previewing Changes
So, you’ve created some files, edited them (using snippets I
hope – that took a while to write down, you know!) and now
want to see the results. Rather than leaving Komodo for your
terminal or your browser, why not just do it from inside Komodo?
Since it was a PHP file we were working on, you can simply
launch your code by pressing the “Start or continue debugging” button in your toolbar. This will of course run your code
through a debugger but it’s also useful just to run your code
and see the output. You could skip the debugger altogether
by using the Debug | Run Without Debugging menu. Assuming your PHP file actually outputs anything, Komodo will
open the bottom pane and show you your code output. Output not what you expected it to be? Set a few breakpoints and
jump right into the middle of your code (Figure 9).
Figure 9: Debug
You could take it even further and start editing variables
while debugging, starting a REPL from the current breakpoint, etc., provided the language you are using supports this
functionality.
I did say browser back there though, so what about previewing HTML files? Simply open your HTML file and hit
the “Browser Preview” button in your toolbar. Select “In a
Komodo Tab” for the “Preview Using” setting and customize
the rest to your liking, then hit Preview. The preview will be
rendered using Gecko. Komodo is built on Mozilla so basically it’s rendering using Firefox, right from Komodo.
With Komodo 9, you can now even preview Markdown
files in real-time. Just open a Markdown file and hit the “Preview Markdown” button in your toolbar to start previewing
(Figure 10).
Figure 10: Preview markdown
Using Version Control
Figure 11: VCS
If you’re a programmer, you probably use version control of
some kind (and if not – you really should!) whether it be Git,
Mercurial, SVN, Perforce or whatever else strikes your fancy.
Komodo doesn’t leave you hanging there; you can easily access
a variety of VCS tasks right from inside Komodo (Figure 11).
Let’s assume you already have a repository checked out/
cloned. We’ll go straight to the most basic and most frequently used VCS task – committing. Make your file edits, and simply hit the VCS toolbar button (you may need to enable this
first by right-clicking the toolbar and hitting “Customize”).
Then select “Commit”, enter your commit message and fire
it off. Komodo will show you the results (unless you disabled
this) in the bottom pane.
Komodo IDE 9 even shows you the changes to files as you
are editing them, meaning it will show what was added, edited and deleted since your last commit of that file (Figure 12).
The left margin (with your line numbers) will become
colored when edits are made. Clicking on one of these colored
blocks will open a small pop-in showing what was changed
and allowing you to revert the change or share the changeset
via kopy.io. Kopy.io is a new tool created by the Komodo
team, which serves as a modernized pastebin, implementing
nifty new features you probably haven’t seen elsewhere such
as client-side encrypted “kopies” and auto-sizing text to fit to
your browser window.
Figure 12: Track changes
Customizing
You’ve now gone through a basic workflow and are starting to really warm up to Komodo (hopefully), but you wish
your color scheme was light instead of dark, or the icon set
is too monotone and grey for your taste, or ... or ... Again
Komodo’s got you covered. Head into the Preferences dialog
via Edit | Preferences or Komodo | Preferences on OS X.
Here you can customize Komodo to your heart’s content: the
“Appearance” and “Color Scheme” sections are probably of
particular interest to you. Note that by default Komodo hides
all the advanced preferences that you likely wouldn’t want
to see unless you are a bit more familiar with Komodo. You
can toggle these advanced preferences by checking the “Show
Advanced” checkbox at the bottom left of your Preferences
dialog (Figure 13).
Figure 13: Preferences
Inevitably some of you will find something that simply isn’t
in Komodo though, because there isn’t a single IDE out there
that has ALL the things for EVERYONE. When this happens,
the community and customizability of Komodo are there for
you, so be sure to check out the variety of addons, macros,
color schemes etc. at the Komodo Resources website, and if
you want to get creative, share ideas or request features from
the community, then head on over to the Komodo forums.
Just a Glimpse
Hopefully I’ve given you a glimpse of what Komodo has to
offer, or at least given you an idea of how to get started with
it. Komodo is a great IDE, even if you aren’t in the market
for a huge IDE, and the best thing about it is you don’t need
to buy/launch another IDE each time you want to work on
another language. Komodo has you covered for a variety of
languages and frameworks, including (but not limited to!)
Python, PHP, Go, Ruby, Perl, Tcl, Node.js, Django, HTML,
CSS and JavaScript. Enjoy!
Nathan Rijksen, Komodo developer, has web dev expertise and experience as a backend architect, application developer and database engineer, and has also worked with third-party authentication and payment
modules. Nathan is a long time Komodo user and wrote multiple macros
and extensions before joining the Komodo team.
Databases
A look under the hood
HBase, the “Hadoop database”
Even MongoDB has its limits. And where the favourite of the database world reaches its limits in
scalability, that’s just where HBase enters. Tech entrepreneur Ghislain Mazars shows us the strong
points of this community-driven open-source database.
by Ghislain Mazars
HBase and the NoSQL Market: Among the myriad NoSQL
databases available on the market today, HBase is far from
having a share comparable to that of market leader MongoDB. Easy
to learn, MongoDB is the NoSQL darling of most application developers. The document-oriented database interfaces
well with lightweight data exchange formats, typically JSON,
and has become the natural NoSQL database choice for many
web and mobile apps (Figure 1).
Where MongoDB (and more generally JSON databases)
reaches its limits is in highly scalable applications requiring
complex data analysis (often called “data-intensive”
applications). That segment is the sweet spot of column-oriented databases such as HBase. But even in that particular
category, HBase has lately often been overlooked in favour of
Cassandra. This is quite a surprising turn of events, as Facebook, the “creator” of Cassandra, ditched its own creation in
2011 and selected HBase as the database for its Messages application. We will come back to the technical differences between the two databases, but the main reason for Cassandra’s
remarkable comeback is to be found elsewhere.
Figure 1: Relative adoption of NoSQL skills
Cassandra, the comeback kid of NoSQL databases
With Cassandra, we find a pattern common to most major
NoSQL databases, i.e. the presence of a dedicated corporate
sponsor. Just as with MongoDB (MongoDB Inc., formerly
called 10gen) and Couchbase (Couchbase Inc.), the technical and market development of Cassandra is spearheaded
by DataStax Inc. From a continued effort on documentation
(Planet Cassandra) to the stewardship of the user community
with meetups and summits, DataStax has been doing a remarkable job of flying the Cassandra flag high. These efforts
have paid off, and Cassandra now holds pole position
among wide-column databases.
It is worth noting, however, that in the process Cassandra
has lost a lot of its open-source nature. 80 percent of the
committers on the Apache project are from DataStax, and the
management features beloved by enterprise customers are
proprietary and part of DSE (“DataStax Enterprise”). Going
one step further, the integration with Apache Spark, the new
whizz-kid of Big Data, is currently only available as part of
DSE …
HBase, a community-driven open-source project
Unlike Cassandra, HBase very much remains a community-driven open-source project. No fewer than 12 companies are
represented in the Apache project committee and the three
Hadoop distributors, Cloudera, Hortonworks and MapR,
share the responsibility of marketing the database and supporting its corporate users.
As a result, HBase sometimes lacks the marketing firepower of one company betting its life on the product. Had that
been the case, no doubt HBase would be in a 1.x release
by now: while Hadoop made a big jump from 0.2x to 1.0 in
2011, HBase continued to move steadily in the 0.9x range!
And the three companies including the database in their portfolio show a tendency to favour other (more proprietary) offerings of theirs and thus provide a restrictive image of HBase.
In this context, it is quite an achievement that HBase occupies such an enviable place among NoSQL databases. It
owes this position to its open-source community, strong installed base within web properties (Facebook, Yahoo, Groupon, eBay, Pinterest) and distinctive Hadoop connection. So
in spite of, or maybe thanks to, its unusual model, HBase could
still very much win… As Cassandra has shown in the last two to three
years, things can move fast in this market. But we will come
back to that later on; for now, let us take a more technical
look at HBase.
Figure 2: HBase schema design; source: Introduction to HBase Schema Design (Amandeep Khurana)
Under the Hood
Hadoop implementation of Google’s BigTable: HBase is
an open-source implementation of BigTable as described
in the 2006 paper from Google (http://research.google.com/archive/bigtable.html). Initially developed to store
crawling data, BigTable remains the distributed database
technology underpinning some of Google’s most famous
services, Google Docs & Gmail. Of course, as should be
expected from a creation of Google, the wide-column database is super scalable and works on commodity servers.
It also features extremely high read performance, ensuring
for example that a Gmail user instantaneously retrieves all
their latest emails.
Just like BigTable, HBase is designed to handle massive
amounts of data and is optimized for read-intensive applications. The database is implemented on top of the Hadoop
Distributed File System (HDFS) and takes advantage of its
linear scalability and fault tolerance. But the integration with
Hadoop does not stop at using HDFS as the storage layer:
HBase shares the same developer community as Hadoop and
offers native integration with Hadoop MapReduce. HBase can serve as both the source and the destination of MapReduce jobs. The benefit here is clear: there is no need for any data movement between batch MapReduce ETL jobs and the host operational and analytics database.
HBase schema design
HBase offers advanced features to map business problems to
the data model, which makes it way more sophisticated than
a plain key-value store such as Redis. Data in HBase is placed
in tables, and the tables themselves are composed of rows,
each of which has a rowkey.
The rowkey is the main entry point to the data: it can be
seen as the equivalent of the primary key for a traditional
RDBMS database. An interesting capability of HBase is that
its rowkeys are byte arrays, so pretty much anything can serve
as the rowkey. As an example, compound rowkeys can be created to mix different criteria into one single key, and optimize
data access speed.
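As a rough illustration of a compound rowkey, the sketch below packs a user id and a timestamp into one byte array using a Node.js Buffer. The field layout, the `buildRowkey` name and the fixed 8-character id width are assumptions made for this example; HBase itself only sees the resulting bytes.

```javascript
// Hypothetical layout: <userId padded to 8 ASCII chars><8-byte big-endian timestamp>
// Assumes ids are ASCII and at most 8 characters long.
function buildRowkey(userId, timestampMs) {
  const idPart = Buffer.from(userId.padEnd(8, "\0"), "utf8");
  const tsPart = Buffer.alloc(8);
  tsPart.writeBigUInt64BE(BigInt(timestampMs));
  return Buffer.concat([idPart, tsPart]);
}

const older = buildRowkey("alice", 1427846400000);
const newer = buildRowkey("alice", 1427846460000);
const other = buildRowkey("bob", 1427846400000);

console.log(older.length);                      // 16
console.log(Buffer.compare(older, newer) < 0);  // true: same user, earlier time sorts first
console.log(Buffer.compare(newer, other) < 0);  // true: all of alice's rows precede bob's
```

Because the timestamp is stored big-endian with a fixed width, byte-wise ordering matches chronological ordering within one user.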
In pure key-value mode, a query on the rowkey will give
back all the content of the row (or to take a columnar view,
all of its columns). But the query can also be much more precise, and specifically address (Figure 2):
• A family of columns
• A specific column, and as a result a cell, which is the intersection of a row and a column
• Or even a specific version of a cell, based on a timestamp
Combined, these different features greatly improve on the base key-value model. There is one constraint: the rowkey cannot be changed, and should thus be carefully selected at the design stage to optimize rowkey access or scans on a range of rowkeys. But beyond that, HBase offers a lot of flexibility: new columns can be added on the fly, rows do not all need to contain the same columns (which makes it easy to add new attributes to an existing entity) and nested entities provide a way to define relationships within what otherwise remains a very flat model.
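The addressing levels described above can be pictured as nested maps: row, then column family, then column, then timestamped versions. The plain JavaScript model below is only a mental picture with made-up data and a hypothetical `readCell` helper; it is not the HBase client API.

```javascript
// rowkey -> column family -> column -> { timestamp: value }
const table = {
  "user#42": {
    info: {
      name: { 1000: "Ann", 2000: "Anne" },
      city: { 1000: "Lyon" }
    },
    stats: { logins: { 1500: "7" } }
  },
  // Rows do not need to share the same columns
  "user#43": { info: { name: { 1200: "Bob" } } }
};

// Return the cell value written at or before `asOf` (latest version by default).
function readCell(rowkey, family, column, asOf = Infinity) {
  const versions = ((table[rowkey] || {})[family] || {})[column] || {};
  const stamps = Object.keys(versions).map(Number).filter(t => t <= asOf);
  if (stamps.length === 0) return undefined;
  return versions[Math.max(...stamps)];
}

console.log(Object.keys(table["user#42"].info));        // [ 'name', 'city' ]: one column family
console.log(readCell("user#42", "info", "name"));       // Anne: latest version of a cell
console.log(readCell("user#42", "info", "name", 1500)); // Ann: version as of t=1500
console.log(readCell("user#43", "info", "city"));       // undefined: column absent in this row
```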
Cool Features of HBase
Sorted rowkeys: Manipulation of HBase data is based on
three primary methods: Get, Put, and Scan. For all of them,
access to data is done by row and more specifically according to the rowkey. Hence the importance of selecting an appropriate rowkey to ensure efficient access to data. Usually,
the focus will be on ensuring smooth retrieval of data: HBase
is designed for applications requiring fast read performance,
and the rowkey typically closely aligns with the application’s
access patterns.
As scans are done over a range of rows, HBase lexicographically orders rows according to their rowkeys. Using these
“sorted rowkeys”, a scan can be defined simply from its start
and stop rowkeys. This is extremely powerful to get all relevant
data in one single database call: if we are only interested in the
most recent entries for an application, we can concatenate a
timestamp with the main entity id to easily build an optimized
request. Another classical example relates to the use of geohashed compound rowkeys to immediately get a list of all the
nearby places for a request on a geographic point of interest.
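The idea of a scan bounded by start and stop rowkeys can be sketched over an in-memory sorted array. The key layout below is an assumption for illustration: it appends a reversed timestamp to the entity id so that the newest entries sort first, a common trick that the text above does not spell out. The `scan` helper is hypothetical and treats the start key as inclusive and the stop key as exclusive.

```javascript
const MAX_TS = 9999999999999; // assumed upper bound for millisecond timestamps (13 digits)

// Hypothetical key layout: "<entityId>#<MAX_TS - timestamp>", zero-padded to 13 digits
function rowkey(entityId, ts) {
  return entityId + "#" + String(MAX_TS - ts).padStart(13, "0");
}

// Rows kept in lexicographic rowkey order
const rows = [
  { key: rowkey("sensor-a", 1000), value: "a@1000" },
  { key: rowkey("sensor-b", 1500), value: "b@1500" },
  { key: rowkey("sensor-a", 3000), value: "a@3000" },
  { key: rowkey("sensor-a", 2000), value: "a@2000" }
].sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));

// Scan the range [startKey, stopKey)
function scan(startKey, stopKey, limit = Infinity) {
  return rows
    .filter(r => r.key >= startKey && r.key < stopKey)
    .slice(0, limit);
}

// "$" follows "#" in ASCII, so this range covers every sensor-a key and nothing else
const latestTwo = scan("sensor-a#", "sensor-a$", 2).map(r => r.value);
console.log(latestTwo); // [ 'a@3000', 'a@2000' ]
```

A real scan reads only the matching slice of the sorted data; the filter over the whole array here is purely for brevity.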
Control on data sharding
In selecting the rowkey, it is important to keep in mind that the rowkey strongly influences the data sharding. Unlike traditional RDBMSs, HBase gives the application developer control over the physical distribution of data across the cluster. Column families also have an influence (all column members for a family share the same prefix), but the primary criterion is the rowkey, which ensures data is evenly distributed across the Hadoop cluster (data is sorted in ascending order by rowkey, column family and finally column key). As rowkeys determine the sort order of a table’s rows, each region in the table ends up being responsible for the physical storage of a part of the rowkey space.
Such an ability to perform physical-level tuning is a bit unusual in the database world nowadays, but immensely powerful if the application has a well-defined access pattern. In such cases, the application developer will be able to guide how the data is spread across the cluster and avoid any hotspotting by skillfully selecting the rowkey. And, at the end of the day, disk access speed matters from an application usability perspective, so it is really good to have some control over it!
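One widely used way to avoid hotspots with monotonically increasing keys, not described in the text above, is to prefix each rowkey with a small hash-derived "salt" so that consecutive writes land in different parts of the key space. The sketch below is an assumption-laden illustration: the bucket count, the toy hash function and the `saltedRowkey` name are all hypothetical.

```javascript
const BUCKETS = 4; // assumed number of salt buckets

// Tiny deterministic string hash (djb2 variant); any stable hash would do
function hash(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    h = ((h * 33) ^ str.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The bucket depends only on the entity id, so all rows of one entity stay together
function saltedRowkey(entityId, ts) {
  const bucket = hash(entityId) % BUCKETS;
  return bucket + "|" + entityId + "|" + ts;
}

// Count how 1000 sequentially created devices spread over the buckets
const counts = new Array(BUCKETS).fill(0);
for (let i = 0; i < 1000; i++) {
  const key = saltedRowkey("device-" + i, 1427846400000 + i);
  counts[Number(key[0])] += 1;
}
console.log(counts);
```

The trade-off is that a range scan across entities now has to be issued once per bucket, so salting suits write-heavy workloads better than scan-heavy ones.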
Strong consistency
In its overall design, HBase tends to favour consistency over availability. It even supports ACID-level semantics on a per-row basis. This of course has an impact on write performance, which will tend to be slower than in comparable eventually consistent databases. But again, typical use cases for HBase are focused on high read performance.
Overall, the trade-off plays in favour of the application
developer, who will have the guarantee that the datastore
always (vs eventually...) delivers the right value of the data.
In effect, the choice of delivering strong consistency frees
the application developer from having to implement cumbersome mechanics at the application level to mimic such a
guarantee. And it is always best when the application developer can focus on the business logic and user experience vs
the plumbing ...
What’s next for HBase?
In the first section, we had a look at HBase's position in the
wider NoSQL ecosystem, and vis-à-vis its most direct competitor, Cassandra. In our second and third sections, we
reviewed the key technical characteristics of HBase, and highlighted some key features of HBase that make it stand out
from other NoSQL databases. In this final section, we will
discuss recent initiatives building out on these capabilities and
the chances of HBase becoming a mainstream operational database in a Hadoop-dominated environment.
Support for SQL with Apache Phoenix
Until recently, HBase did not offer any kind of SQL-like interaction language. That limitation is now over with Apache
Phoenix, an open-source initiative for ad hoc querying of
HBase.
Phoenix is an SQL skin for HBase, and provides a bridge
between HBase and a relational model and approach to manipulate data. In practice, Phoenix compiles SQL queries to
native HBase calls using another recent novelty of HBase, coprocessors. Unlike standard Hadoop SQL tools such as Hive,
Phoenix can both read and write data, making it a more generic and complete HBase access tool.
Further integration with Hadoop and Spark
Over time, Hadoop has evolved from being mainly an HDFS + MapReduce batch environment to a complete data platform. An essential part of that transformation has been the advent of YARN, which provides a shared orchestration and resource management service for the different Hadoop components. With the delivery of project Slider at the end of 2014, HBase cluster resource utilisation can now be “controlled” from YARN, making it easier to run data processing jobs and HBase on the same Hadoop cluster.
Figure 3: Spark Hadoop integration
With a different spin, the ongoing integration work between HBase and Spark also contributes to the unification of database operations and analytic jobs on Hadoop. Just as for MapReduce, Spark can now utilize HBase as both a data source and a target. With nearly two-thirds of users loading data into Spark via HDFS, HBase is the natural database to host low-latency, interactive applications from within a Hadoop cluster. Advanced analytics provided by Spark can be fed back directly into HBase, delivering a closed-loop system, fully integrated with the Hadoop platform (Figure 3).
Final thoughts
With Hadoop moving from exploratory analytics to operational intelligence, HBase is set to further benefit from its
position as the “Hadoop database”.
The imperative of limiting data movements will play strongly in its favour as
enterprises start building complete data
pipelines on top of their Hadoop “data
lake”.
In parallel, HBase is a strong contender for emerging use cases such as the management of IoT-related time series data. Incidentally, the recent launch by Microsoft of an HBase-as-a-Service offering on Azure should be read in that context.
For these reasons, there is no doubt that HBase will continue to grow steadily over the next few years. Still, the opportunity is there for more, and for HBase to have a much bigger impact on the enterprise market. In this respect, MapR has recently made a promising move by incorporating its HBase-derived MapR-DB in its free community edition. For their part, Hortonworks and Cloudera have been active on the essential integrations with Slider and Spark. Now is the time for the HBase community and vendors to move to the next stage and drive a rich enterprise roadmap for the “Hadoop database”, to make HBase sexy and attractive for mainstream enterprise customers!
Ghislain Mazars is a tech entrepreneur and founder of Ubeeko, the company behind HFactory, delivering the application stack for Hadoop and HBase. He is fascinated by the wave of disruption brought by data-driven businesses, and the big data technologies underpinning this shift.
First steps
An introduction to building realtime apps with
RethinkDB
Built for scalability across multiple machines, the JSON document store RethinkDB is a distributed database that uses an easy query language. Here’s how to get started.
by Ryan Paul
RethinkDB is an open source database for building realtime
web applications. Instead of polling for changes, the developer can turn a query into a live feed that continuously pushes
updates to the application in realtime. RethinkDB’s streaming
updates simplify realtime backend architecture, eliminating
superfluous plumbing by making change propagation a native
part of your application’s persistence layer.
In addition to offering unique features for realtime application development, RethinkDB also benefits from some useful
characteristics that contribute to a pleasant developer experience. RethinkDB is a schemaless JSON document store that
is designed for scalability and ease of use, with easy sharding, support for distributed joins, and an expressive query
language.
This tutorial will demonstrate how to build a realtime web application with RethinkDB and Node.js. It will use Socket.io to convey live updates to the frontend. If you would like to follow along, you can install RethinkDB or run it in the cloud.
First steps with ReQL
The RethinkDB Query Language (ReQL) embeds itself in the
programming language that you use to build your application. ReQL is designed as a fluent API, a set of functions that
you can chain together to compose queries.
Before we start building an application, let’s take a few
minutes to explore the query language. The easiest way to
experiment with queries is to use RethinkDB’s administrative
console, which typically runs on port 8080. You can type RethinkDB queries into the text field on the Data Explorer tab
and run them to see the output. The Data Explorer provides
auto-completion and syntax highlighting, which can be helpful while learning ReQL.
By default, RethinkDB creates a database named test. Let’s start by adding a table to the test database:
r.db("test").tableCreate("fellowship")
Now, let’s add a set of nine JSON documents to the table (Listing 1).
Listing 1
r.table("fellowship").insert([
  { name: "Frodo", species: "hobbit" },
  { name: "Sam", species: "hobbit" },
  { name: "Merry", species: "hobbit" },
  { name: "Pippin", species: "hobbit" },
  { name: "Gandalf", species: "istar" },
  { name: "Legolas", species: "elf" },
  { name: "Gimli", species: "dwarf" },
  { name: "Aragorn", species: "human" },
  { name: "Boromir", species: "human" }
])
When you run the command above, the database will output an array with the primary keys that it generated for all of the new documents. It will also tell you how many new records it successfully inserted. Now that we have some records in the database, let’s try using ReQL’s filter command to fetch the fellowship’s hobbits:
r.table("fellowship").filter({species:"hobbit"})
The filter command retrieves the documents that match the provided boolean expression. In this case, we specifically want documents in which the species property is equal to hobbit. You can chain additional commands to the query if you want to perform more operations. For example, you can use the following query to change the value of the species property for all hobbits:
r.table("fellowship").filter({species: "hobbit"})
.update({species: "halfling"})
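To make the semantics concrete, here is roughly what that filter-then-update chain does, expressed over a plain JavaScript array. This is only a sketch of the behaviour, not how RethinkDB executes the query; the `matches` and `updateWhere` helpers are made up for the example.

```javascript
const fellowship = [
  { name: "Frodo", species: "hobbit" },
  { name: "Gandalf", species: "istar" },
  { name: "Sam", species: "hobbit" }
];

// A document matches when every field in the pattern has the same value
function matches(doc, pattern) {
  return Object.keys(pattern).every(key => doc[key] === pattern[key]);
}

// Merge `changes` into each matching document and report how many were touched
function updateWhere(docs, pattern, changes) {
  let replaced = 0;
  docs.forEach(doc => {
    if (matches(doc, pattern)) {
      Object.assign(doc, changes);
      replaced += 1;
    }
  });
  return { replaced };
}

const result = updateWhere(fellowship, { species: "hobbit" }, { species: "halfling" });
console.log(result);                          // { replaced: 2 }
console.log(fellowship.map(d => d.species));  // [ 'halfling', 'istar', 'halfling' ]
```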
ReQL even has a built-in HTTP command that you can use
to fetch data from public web APIs. In the following example,
we use the HTTP command to fetch the current posts from a
popular subreddit. The full query retrieves the posts, orders
them by score, and then displays several properties from the
top five entries:
r.http("http://www.reddit.com/r/aww.json")("data")("children")("data")
.orderBy(r.desc("score")).limit(5).pluck("score", "title", "url")
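The last three steps of that chain (order descending, limit, pluck) have close equivalents in plain JavaScript, which may help when reading ReQL for the first time. The sample data and the `pluck` helper below are invented for illustration, and the limit is 2 rather than 5 to keep the sample short.

```javascript
// Made-up sample shaped like the post objects the query works on
const posts = [
  { title: "Kitten", score: 120, url: "http://example.com/1", author: "a" },
  { title: "Puppy", score: 340, url: "http://example.com/2", author: "b" },
  { title: "Duckling", score: 75, url: "http://example.com/3", author: "c" }
];

// Keep only the named fields of a document
function pluck(doc, ...fields) {
  const out = {};
  fields.forEach(f => { if (f in doc) out[f] = doc[f]; });
  return out;
}

const top = posts
  .slice()                                 // copy, so the input is not reordered
  .sort((a, b) => b.score - a.score)       // like orderBy(r.desc("score"))
  .slice(0, 2)                             // like limit(2)
  .map(p => pluck(p, "score", "title", "url"));

console.log(top);
```

The difference in practice is that ReQL runs these steps on the database server, so only the final, trimmed result travels to the client.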
As you can see, ReQL is very useful for many kinds of ad hoc
data analysis. You can use it to slice and dice complex JSON
data structures in a number of interesting ways. If you’d like
to learn more about ReQL, you can refer to the API reference
documentation, the ReQL introduction on the RethinkDB
website, or the RethinkDB cookbook.
Use RethinkDB in Node.js and Express
Now that you’re armed with a basic working knowledge of
ReQL, it’s time to start building an application. We’re going
to start by looking at how you can use Node.js and Express to
make an API backend that serves the output of a ReQL query
to your end user.
The rethinkdb module in npm provides RethinkDB’s official JavaScript client driver. You can use it in a Node.js application to compose and send queries. The following example shows how to perform a simple query and display the output (Listing 2).
Listing 2
var r = require("rethinkdb");
r.connect().then(function(conn) {
  return r.db("test").table("fellowship")
    .filter({species: "halfling"}).run(conn)
    // Read all results before the connection is closed
    .then(function(cursor) { return cursor.toArray(); })
    .finally(function() { conn.close(); });
})
.then(function(output) {
  console.log("Query output:", output);
})
.catch(function(err) {
  console.log("Failed:", err);
});
The connect method establishes a connection to RethinkDB. It returns a connection handle, which you provide to the run command when you want to execute a query. The example above finds all of the halflings in the fellowship table and then displays their respective JSON documents in your console. It uses promises to handle the asynchronous flow of execution and to ensure that the connection is properly closed when the operation completes.
Let’s expand on the example above, adding an Express server with an API endpoint that lets the user fetch all of the fellowship members of the desired species (Listing 3).
Listing 3
var app = require("express")();
var r = require("rethinkdb");
app.listen(8090);
console.log("App listening on port 8090");
app.get("/fellowship/species/:species", function(req, res) {
  r.connect().then(function(conn) {
    return r.db("test").table("fellowship")
      .filter({species: req.params.species}).run(conn)
      // Read all results before the connection is closed
      .then(function(cursor) { return cursor.toArray(); })
      .finally(function() { conn.close(); });
  })
  .then(function(output) { res.json(output); })
  .catch(function(err) { res.status(500).json({err: err}); });
});
If you have previously worked with Express, the code above should look fairly intuitive. The final path segment in the URL route represents a variable, which we pass to the filter command in the ReQL query in order to obtain just the desired documents. After the query completes, the application relays the JSON output to the user. If the query fails to complete, then the application will return status code 500 and provide the error.
Realtime updates with changefeeds
RethinkDB is designed for building realtime applications. You can get a live stream of continuous query updates by appending the changes command to the end of a ReQL query. The changes command creates a changefeed, which will give you a cursor that receives new records when the results of the query change. The following code demonstrates how to use a changefeed to display table updates (Listing 4).
Listing 4
var r = require("rethinkdb");
r.connect().then(function(c) {
  return r.db("test").table("fellowship").changes().run(c);
})
.then(function(cursor) {
  // Called once for every change to the table
  cursor.each(function(err, item) {
    if (err) { console.log("Changefeed error:", err); return; }
    console.log(item);
  });
});
The cursor.each callback executes every time the data within the fellowship table changes. You can test it for yourself
by making an arbitrary change. For example, we can remove
Boromir from the fellowship after he is slain by orcs:
r.table("fellowship").filter({name:"Boromir"}).delete()
When the query removes Boromir from the fellowship, the
demo application will display the following JSON data in stdout (Listing 5).
When changefeeds provide update notifications, they tell
you the previous value of the record and the new value of the
record. You can compare the two in order to see what has
changed. When existing records are deleted, the new value
is null. Similarly, the old value is null when the table receives
new records.
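Those rules are easy to turn into a small helper. The sketch below classifies a change notification and lists the fields that differ; the function names and the sample documents (apart from the Boromir record) are hypothetical, and the comparison is shallow, so nested objects would need a deeper check.

```javascript
// Classify a changefeed notification from its old_val / new_val pair
function describeChange(change) {
  if (change.old_val === null && change.new_val !== null) return "insert";
  if (change.old_val !== null && change.new_val === null) return "delete";
  return "update";
}

// List the top-level fields whose values differ between old_val and new_val
function changedFields(change) {
  const before = change.old_val || {};
  const after = change.new_val || {};
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...keys].filter(k => before[k] !== after[k]);
}

const deletion = {
  new_val: null,
  old_val: { id: "362ae837-2e29-4695-adef-4fa415138f90", name: "Boromir", species: "human" }
};
const update = {
  old_val: { id: "a1", name: "Frodo", species: "hobbit" },
  new_val: { id: "a1", name: "Frodo", species: "halfling" }
};
const insertion = { old_val: null, new_val: { id: "b2", name: "Arwen", species: "elf" } };

console.log(describeChange(deletion));   // delete
console.log(describeChange(update));     // update
console.log(describeChange(insertion));  // insert
console.log(changedFields(update));      // [ 'species' ]
```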
The changes command currently works with the following kinds of queries: get, between, filter, map, orderBy, min, and max. Support for additional kinds of queries, such as group operations, is planned for the future.
Listing 5
{
new_val: null,
old_val: {
id: '362ae837-2e29-4695-adef-4fa415138f90',
name: 'Boromir',
species: 'human'
}
}
Listing 6
r.db("test").tableCreate("scores")
r.table("scores").indexCreate("score")
r.table("scores").insert([
{name: "Bill", score: 33},
{name: "Janet", score: 42},
{name: "Steve", score: 68}
...
])
Listing 7
var sockio = require("socket.io");
var app = require("express")();
var r = require("rethinkdb");
var io = sockio.listen(app.listen(8090), {log: false});
console.log("App listening on port 8090");
r.connect().then(function(conn) {
return r.table("scores").orderBy({index: r.desc("score")})
.limit(5).changes().run(conn);
})
.then(function(cursor) {
cursor.each(function(err, data) {
io.sockets.emit("update", data);
});
});
A realtime scoreboard
Let’s consider a more sophisticated example: a multiplayer
game with a leaderboard. You want to display the top five
users with the highest scores and update the list in realtime as
it changes. RethinkDB changefeeds make that easy. You can
attach a changefeed to a query that includes the orderBy and
limit commands. Whenever the scores or overall composition
of the list of top five users changes, the changefeed will give
you an update.
Before we get into how you set up the changefeed, let’s start
by using the Data Explorer to create a new table and populate
it with some sample data (Listing 6).
Creating an index helps the database sort more efficiently
on the specified property – which is score in this case. At the
present time, you can only use the orderBy command with
changefeeds if you order on an index.
To retrieve the current top five players and their scores, you
can use the following ReQL expression:
r.db("test").table("scores").orderBy({index: r.desc("score")}).limit(5)
We can add the changes command to the end to get a stream
of updates. To get those updates to the frontend, we will use
Socket.io, a framework for implementing realtime messaging
between server and client. It supports a number of transport
methods, including WebSockets. The specifics of Socket.io
usage are beyond the scope of this article, but you can learn
more about it by visiting the official Socket.io documentation.
The code in Listing 7 uses sockets.emit to broadcast the
updates from a changefeed to all connected Socket.io clients.
On the frontend, you can use the Socket.io client library to
set up a handler that receives the update event:
var socket = io.connect();
socket.on("update", function(data) {
console.log("Update:", data);
});
That’s a good start, but we need a way to populate the initial
list values when the user first loads the page. To that end, let’s
extend the server so that it broadcasts the current leaderboard
over Socket.io when a user first connects (Listing 8).
The application uses the same underlying ReQL expression
in both cases, so we can store it in a variable for easy reuse.
ReQL’s method chaining makes it highly conducive to that
kind of composability.
To wrap up the demo, let’s build a complete frontend. To
keep things simple, I’m going to use Polymer’s data binding
system. Let’s start by defining the template:
<template id="scores" is="auto-binding">
<ul>
<template repeat="{{user in users}}">
<li><strong>{{user.name}}:</strong> {{user.score}}</li>
</template>
</ul>
</template>
It uses the repeat attribute to insert one li tag for each user.
The contents of the li tag display the user’s name and their
current score. Next, let’s write the JavaScript code (Listing 9).
The handler for the leaders event simply takes the data
from the server and assigns it to the template variable that
stores the users. The update handler is a bit more complex.
It finds the entry in the leaderboard that correlates with the
old_val and then it replaces it with the new data.
When the score changes for a user that is already in the
leaderboard, it’s just going to replace the old record with a
new one that has the updated number. In cases where a user
in the leaderboard is displaced by one who wasn’t there previously, it will replace one user’s record with that of another.
The code in Listing 9 above will properly handle both cases.
Of course, the changefeed updates don’t help us maintain
the actual order of the users. To remedy that problem, we
simply sort the user array after every update. Polymer’s data
binding system will ensure that the actual DOM representation always reflects the desired order.
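The replace-then-sort logic can also be written as a standalone function, which makes it easy to try out without a browser or a database. This is a sketch under assumptions rather than the code from Listing 9: the `applyUpdate` name is made up, it returns a new array instead of mutating the bound one, and it additionally tolerates notifications where old_val or new_val is null.

```javascript
// Apply one changefeed notification to a leaderboard array and re-sort it
function applyUpdate(users, change) {
  const next = users.slice();
  const index = change.old_val
    ? next.findIndex(u => u.id === change.old_val.id)
    : -1;
  if (index >= 0 && change.new_val) next[index] = change.new_val;  // score change or displacement
  else if (index >= 0) next.splice(index, 1);                      // entry left the list
  else if (change.new_val) next.push(change.new_val);              // entry joined the list
  return next.sort((x, y) => y.score - x.score);
}

const board = [
  { id: 1, name: "Steve", score: 68 },
  { id: 2, name: "Janet", score: 42 },
  { id: 3, name: "Bill", score: 33 }
];

// Bill gains 100 points and moves to the top
const afterScore = applyUpdate(board, {
  old_val: board[2],
  new_val: { id: 3, name: "Bill", score: 133 }
});
console.log(afterScore.map(u => u.name)); // [ 'Bill', 'Steve', 'Janet' ]

// A new player displaces Janet
const afterSwap = applyUpdate(afterScore, {
  old_val: { id: 2, name: "Janet", score: 42 },
  new_val: { id: 4, name: "Ada", score: 50 }
});
console.log(afterSwap.map(u => u.name));  // [ 'Bill', 'Steve', 'Ada' ]
```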
Listing 8
var getLeaders = r.table("scores").orderBy({index: r.desc("score")}).limit(5);
// Broadcast every leaderboard change to all connected clients
r.connect().then(function(conn) {
  return getLeaders.changes().run(conn);
})
.then(function(cursor) {
  cursor.each(function(err, data) {
    io.sockets.emit("update", data);
  });
});
// Send the current leaderboard to each client when it connects
io.on("connection", function(socket) {
  r.connect().then(function(conn) {
    return getLeaders.run(conn)
      .then(function(cursor) { return cursor.toArray(); })
      .finally(function() { conn.close(); });
  })
  .then(function(output) { socket.emit("leaders", output); });
});
Listing 9
var scores = document.querySelector("#scores");
var socket = io.connect();
socket.on("leaders", function(data) {
  scores.users = data;
});
socket.on("update", function(data) {
  for (var i = 0; i < scores.users.length; i++) {
    if (scores.users[i].id === data.old_val.id) {
      // Replace the old record, then restore the descending score order
      scores.users[i] = data.new_val;
      scores.users.sort(function(x, y) { return y.score - x.score; });
      break;
    }
  }
});
Now that the demo application is complete, you can test it by running queries that change the scores of your users. In the Data Explorer, you can try running something like:
r.table("scores").filter({name: "Bill"})
.update({score: r.row("score").add(100)})
When you change the value of the user’s score, you will see the leaderboard update to reflect the changes.
Next steps
Conventional databases are largely designed around a query/response workflow that maps well to the web’s traditional request/response model. But modern technologies like WebSockets make it possible to build applications that stream updates in realtime, without the latency or overhead of HTTP requests.
RethinkDB is the first open source database that is designed specifically for the realtime web. Changefeeds offer a way to build queries that continuously push out live updates, obviating the need for routine polling.
To learn more about RethinkDB, check out the official documentation. The introductory ten-minute guide is a good place to start. You can also check out some RethinkDB demo applications, which are published with complete source code.
Ryan Paul is a developer evangelist at RethinkDB. He is also a Linux enthusiast and open source software developer. He was previously a contributing editor at Ars Technica, where he wrote articles about software
development.
© iStockphoto.com/enjoynz
Web
The five biggest challenges in application performance management
Performance fails and
how to avoid them
What is good news for sales can be bad news for IT. Sudden spikes in application usage need plenty of preparation, so before you unwittingly commit any performance no-nos, here are the five areas where you might be slipping up.
by Klaus Enzenhofer
Whether it was on Black Friday, Cyber Monday or just during general Christmas shopping, this year’s holidays have proven that too many online shops were far from prepared for heavy traffic on their websites or mobile offerings. The impact of end-user experience on the business as a whole is still underestimated. Application performance management (APM) has come a long way in a few short years, but despite the numerous solutions available in the market, many businesses still struggle with fundamental problems.
With a view to the next stressful situations that will affect company applications, business and IT professionals need to evolve their APM strategies to successfully navigate multiple channels in a multi-connected world. Optimizing application performance to deliver high-quality, frictionless user experiences across all devices and all channels isn’t easy, especially if you’re struggling with these five challenges:
1. Sampling
Looking at an aggregate of what traffic analytics tell you
about daily, weekly and monthly visits isn’t enough. And counting on a sampling of what users experience is a scary approach too. Having a partial view of what is happening across your IT systems and applications is akin to trying to drive a car while blindfolded.
Load testing is an essential part of preparing for peak events like Black Friday or Christmas, but it is no substitute for real user monitoring. To ensure a good customer journey for every visitor, you need a bundle of different methods: load testing, synthetic monitoring AND real user monitoring. A partial view not only limits your understanding of what’s happening across the app delivery chain, it also leads to the next major scare that organizations face.
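A small numeric example shows why aggregates and samples can mislead. The response times below are invented, the percentile uses the simple nearest-rank method, and the "every tenth request" sampling rule is just one hypothetical sampling scheme.

```javascript
// Invented response times in milliseconds for 20 page views
const timesMs = [120, 130, 110, 140, 125, 135, 115, 128, 122, 131,
                 119, 127, 133, 124, 126, 129, 121, 118, 4200, 5100];

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile: smallest value with at least p% of samples at or below it
function percentile(values, p) {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Keep only every tenth request, as a sampling tool might
const sampled = timesMs.filter((_, i) => i % 10 === 0);

console.log(percentile(timesMs, 50));  // 126: the typical user is fine
console.log(percentile(timesMs, 95));  // 4200: one user in twenty waits over four seconds
console.log(mean(timesMs));            // about 577.65, describing no real user at all
console.log(Math.max(...sampled));     // 120: the sample missed both slow requests
```

Here the median looks healthy, the mean matches nobody's experience, and the sample never sees the two requests that would trigger complaints.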
2. Lessons learned about performance issues
It’s Black Friday at 11 a.m., the phone rings and your boss screams: “Is our site down? Why are transactions slowing to a crawl? The call center is getting overwhelmed with customers asking why they can’t check out online – fix it!” This is the nightmare scenario that plays out too often, but it doesn’t need to be that way.
For best results in performance and availability, continuous real user monitoring of all transactions is a must, 24 hours a day, 7 days a week. Only this ensures that you see any and all issues as they come up, before customers are affected. Only this gives you the ability to respond immediately and head off a heart-stopping call about issues that should have been avoided. If your customers are your “early warning system”, they will be frustrated and will likely start venting on social media, which can be incredibly damaging to your business’s reputation. As a result, frustrated customers will move to a competitor and revenue will be lost.
3. Problems identified, but no explanation
So you and your team can manage the first two challenges without much trouble. But now you have to face the next major hurdle. Application performance monitoring shows you there’s a problem, but you can’t pinpoint the exact cause. Combing through waterfall charts and logs – especially while racing against the clock to fix a problem – can feel like looking for needles in haystacks. No solution is in sight and the hurdle seems insurmountable. When every minute can mean tens of thousands of dollars in lost revenue, the old adage “time is money” is likely to be ringing in your ears.
But your IT team doesn’t just need more data; it needs transparency from the end user into the data center, into the applications and down to the level of the individual line of code. It needs a look through a magnifying glass with a new-generation APM solution. Today, synthetic monitoring empowers businesses to detect and classify performance issues and to gather information on their root causes, with instant triage, problem ranking and cause identification. “Smart analytics” reduces hours of manual troubleshooting to a matter of seconds. Not all APM tools offer deep-dive analysis, so you need to test them against all your important needs.
4. Third-parties – the unknown stars
You are flying blind if you can’t see the impact of integrated third-party services and if you have no control over their SLA compliance. Modern applications execute code on diverse edge devices, often calling elements from a variety of third-party services well beyond the view of traditional monitoring systems. Sure, third-party services can improve end-user experiences and deliver functionality faster than standalone applications, but they have a dark side. They can increase complexity and page weight and decrease site performance to the point of actually compromising the end-user experience.
Not only that, when a third-party service goes down, whether it’s a Facebook “like” button, the “cart” in an online shop, ads or web analytics, IT is often faced with performance issues that are not their fault, and not within their view. Trouble is inevitable if they cannot explain the reason for poor performance or a crash on the website, and frustration will affect not only your end users, but also your IT team.
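One way to bring third-party impact into view is to group per-resource timings by host. The sketch below works on invented entries that borrow the `name`, `duration` and `transferSize` field names from the browser's Resource Timing entries; the host names and the `summariseByHost` helper are assumptions for illustration.

```javascript
const FIRST_PARTY_HOST = "shop.example.com"; // hypothetical first-party host

// Invented entries shaped like browser Resource Timing records
const resources = [
  { name: "https://shop.example.com/app.js", duration: 180, transferSize: 90000 },
  { name: "https://shop.example.com/style.css", duration: 60, transferSize: 20000 },
  { name: "https://widgets.example.net/like.js", duration: 950, transferSize: 150000 },
  { name: "https://ads.example.org/ad.js", duration: 400, transferSize: 240000 }
];

// Total bytes and slowest request per host
function summariseByHost(entries) {
  const byHost = {};
  entries.forEach(e => {
    const host = new URL(e.name).hostname;
    const s = byHost[host] || (byHost[host] = { bytes: 0, slowestMs: 0 });
    s.bytes += e.transferSize;
    s.slowestMs = Math.max(s.slowestMs, e.duration);
  });
  return byHost;
}

const summary = summariseByHost(resources);
const totalBytes = resources.reduce((n, e) => n + e.transferSize, 0);
const thirdPartyBytes = totalBytes - summary[FIRST_PARTY_HOST].bytes;
const thirdPartyShare = Math.round((100 * thirdPartyBytes) / totalBytes);

console.log(summary);
console.log(thirdPartyShare + "% of bytes come from third parties"); // 78%
```

In this made-up page, the slowest single request and most of the page weight both come from hosts the IT team does not operate.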
5. The Cloud – performance in the dark
A global survey of 740 senior IT professionals found that nearly 80 percent of respondents fear that cloud providers hide performance problems. Additionally, 63 percent of respondents indicated there was a need for more meaningful and granular SLA metrics that are geared toward ensuring the continuous delivery of a high-quality end-user experience.
In preparing for an upcoming major sales campaign you’ve done great work, and you are confident that all your effort will ensure your websites withstand the rush. But when the big day comes, it turns out that the load testing you’ve done with your CDN isn’t playing out the way it was predicted – because the CDN is getting hit with peak demand that wasn’t reflected in test mode. Inadequate real-time tracking and response shows exactly how a lack of visibility can wreck any plan to make big money with the sales event.
Whether you’ve launched a new app in a public cloud,
or in your virtualized data center, full visibility across all
cloud and on premise tiers – in one pane of glass – is the
only way to maintain control. In this way, you’ll be able to
detect regressions automatically and identify root cause in
minutes. Reflect and consider these APM best practices in
your daily job. The next shopping season is coming sooner
than expected.
Klaus Enzenhofer is a Technology Strategist and the Lead of the Dynatrace
Center of Excellence Team.
Architectural differences in Vaadin web applications
Separating UI structure
and logic
In the first part of JAXenter’s series on Vaadin-based web apps, Matti Tahvonen shows us why every architectural decision has its pros and cons, and why the same goes for switching from Swing to Vaadin in your UI layer.
by Matti Tahvonen
As expressed by Pete Hunt (Facebook, React JS) at the
JavaOne 2014 Web Framework Smackdown, if you were to
create a UI toolkit from scratch, it would look nothing like
a DOM. Web technologies were not designed for application
development, but for rich text presentation. Markup-based presentation has proven to be superior for more static content like
web sites, but applications are a different story. In the early
stages of graphical UIs on computers, UI frameworks didn’t
form “component based” libraries by accident. Those UI libraries have developed over decades, but the basic concept
of a component-based UI framework is still the most powerful
way to create applications.
And yet Swing, SWT, Qt and similar desktop UI frameworks have one major problem compared to web apps: they
require you to install special software on your client machine.
As we have all learned during the internet era, this can be
a big problem. Today’s users have lots of different kinds of
applications that they use and installing all of them (and especially maintaining them) will become a burden for your IT
department.
Browser plugins such as Java’s Applet/Java WebStart support
(with Swing or JavaFX) and Flash are the traditional workarounds to avoid installing software locally on workstations.
But well-known security holes in these plugins, especially in outdated
versions, can become a huge problem, and your IT department will nowadays most likely be against installing any kind
of third-party browser plugin. For them it is much easier
to just maintain one browser application. This is one of the
fundamental reasons why pure web apps are now conquering
even the most complex application domains.
Welcome to the wonderful world of web apps
Even for experienced desktop developers it may be a huge
jump from the desktop world to web development. Developing web applications is much trickier than developing basic
desktop apps. Many things complicate matters: the markup
language and CSS used for display, new programming languages for the client side, and client-server communication in
many different forms (basic HTTP, Ajax-style requests, long
polling, WebSockets etc.). The fact is that, even with the most
modern web app frameworks, web development is not as easy
as building desktop apps.
Vaadin Framework is probably the closest thing to
component-based Swing UI development in the mainstream
web app world. Vaadin is a component-based UI library that
tries to make web development as easy as traditional desktop development, maximizing developers' productivity and
the quality of the resulting end-user experience. In a Vaadin
application the actual UI logic, written by you, lives in the
server’s JVM. Instead of browser plugins, Vaadin has a built-in “thin client” that renders the UI efficiently in browsers. The
highly optimized communication channel sends to the client only the parts
that are actually visible on the user’s screen. Once the
initial rendering has been done, only deltas, in both directions, are
transferred between the client and the server.
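The idea of transferring only deltas can be illustrated with a small plain-Java sketch. This is not Vaadin's actual wire protocol: the `StateDelta` class, the map-based "UI state" and the property names are all made up for illustration, and removed entries are deliberately ignored to keep the example short.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of "send only what changed".
// A UI state is modelled as a simple map of property name to value.
final class StateDelta {

    // Returns the entries of 'current' that are new or differ from 'previous'.
    // Entries that were removed are not reported in this simplified sketch.
    static Map<String, String> changedEntries(Map<String, String> previous,
                                              Map<String, String> current) {
        Map<String, String> delta = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String key = entry.getKey();
            boolean isNew = !previous.containsKey(key);
            if (isNew || !Objects.equals(previous.get(key), entry.getValue())) {
                delta.put(key, entry.getValue());
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("label.text", "Hello");
        before.put("button.enabled", "false");

        Map<String, String> after = new HashMap<>(before);
        after.put("button.enabled", "true");

        // Only the changed property would need to travel over the network.
        System.out.println(changedEntries(before, after));
    }
}
```

In this sketch the unchanged label text is never resent; only the button state that actually changed ends up in the delta.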
Architecture
Memory and CPU usage is centralized to server
The architecture of Vaadin Framework provides you with an
abstraction for the web development challenges, and most
of the time you can forget that you are building a web application. Vaadin takes care of handling all the communication, HTML markup, CSS and browser differences – you can
concentrate all your energy on your domain problems with a
clean Java approach and take advantage of your experience
from the desktop applications.
Vaadin uses GWT to implement its “thin client” running
in the browser. GWT is another similar tool for web development, and its heart is its Java to JavaScript “compiler”. GWT
also has a Swing-like UI component library, but in GWT the
Java code is compiled into JavaScript and executed in the
browser. The compiler supports only a subset of Java, and the
fact that the code is not running in a JVM causes some other limitations, but the concepts are the same. Running your code in
the browser as a white box also has some security implications.
On the negative side, some of the computing
previously done by your user's workstation is now moved to
the server. The CPU hit is typically negligible, but you might
face some memory constraints if you don't take this into
account. On the other hand, the fact that the application's
memory use and processing now happen mostly on the server
might be a good thing. The server-side approach makes it
possible to handle really complex computing tasks, even with
really modest handheld devices. This is naturally possible
with Swing and a central server as well, but with the Vaadin
approach it comes as a free bonus feature.
A typical Vaadin business app consumes 50–500 kB of
server memory per user, depending on your application characteristics. If you have a very small application you can do
with a smaller number and if you reference a lot of data from
your UI, which usually makes things both faster and simpler,
you might need even more memory per user.
The per-user memory usage is in line with, e.g., the Java EE standard JSF. If you do some basic math you can see that this
isn’t an issue for most typical applications and modern application servers. But, in case you create an accidental memory
leak in application code or carelessly load the whole database
table into memory, the memory consumption may become
an issue earlier than with desktop applications. Accidentally
referencing a million basic database entities from a user session will easily consume 100–200 MB of memory per session.
This might still be tolerable in desktop applications, but if you
have several concurrent users, you’ll soon be in trouble.
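The basic math mentioned above can be sketched in a few lines of Java. The class and method names are invented for illustration, and the inputs are simply the ranges quoted in the text (50–500 kB per user, 100–200 MB for a session that references a million entities); they are not measurements.

```java
// Back-of-the-envelope estimate of the server memory needed for UI sessions.
// All names are hypothetical; the figures are the ranges quoted in the article.
final class SessionMemoryEstimate {

    // Total memory in megabytes for a number of concurrent users,
    // using decimal units (1 MB = 1000 kB) to keep the arithmetic simple.
    static double totalMegabytes(int concurrentUsers, double kilobytesPerUser) {
        return concurrentUsers * kilobytesPerUser / 1000.0;
    }

    public static void main(String[] args) {
        int users = 1_000;
        // Typical business app: 50-500 kB per user.
        System.out.printf("typical, low end:  %.0f MB%n", totalMegabytes(users, 50));
        System.out.printf("typical, high end: %.0f MB%n", totalMegabytes(users, 500));
        // A session that accidentally references a million entities: up to 200 MB.
        System.out.printf("leaky sessions:    %.0f MB%n", totalMegabytes(users, 200_000));
    }
}
```

For 1,000 concurrent users the typical range works out to 50–500 MB in total, which a modern application server handles easily, whereas 1,000 leaky sessions at 200 MB each would need about 200 GB.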
The memory issues can usually be rather easily solved by
using paging or by lazy loading the data from the backend to
the UI. Server capacity is also really cheap nowadays, so buying a more efficient server or clustering your application to
multiple application servers is most likely much cheaper than
making compromises in your architectural design. But if
each of your application's users needs to do heavy analysis
with huge in-memory data sets, web applications are still not
the way to go for your use case.
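A minimal sketch of the paging idea, in plain Java with no Vaadin API involved: the UI asks the backend for one page at a time instead of holding the whole table in the session. `PagedSource` and its generated rows are hypothetical stand-ins for a real repository or database query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paged data source: hands out one page of rows at a time,
// so a user session never references more than pageSize rows at once.
final class PagedSource {
    private final int totalRows;
    private final int pageSize;

    PagedSource(int totalRows, int pageSize) {
        if (totalRows < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and pageSize > 0");
        }
        this.totalRows = totalRows;
        this.pageSize = pageSize;
    }

    // Number of pages needed to show all rows (the last page may be partial).
    int pageCount() {
        return (totalRows + pageSize - 1) / pageSize;
    }

    // Stand-in for a backend query that limits the result to one page.
    List<String> fetchPage(int page) {
        if (page < 0) {
            throw new IllegalArgumentException("page must be >= 0");
        }
        int from = (int) Math.min((long) page * pageSize, totalRows);
        int to = Math.min(from + pageSize, totalRows);
        List<String> rows = new ArrayList<>(to - from);
        for (int i = from; i < to; i++) {
            rows.add("row-" + i);
        }
        return rows;
    }

    public static void main(String[] args) {
        PagedSource source = new PagedSource(1_000_000, 50);
        List<String> firstPage = source.fetchPage(0);
        System.out.println(firstPage.size() + " rows in memory, "
                + source.pageCount() + " pages available");
    }
}
```

The session holds only the rows of the page currently shown, so its memory footprint no longer grows with the size of the table.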
If your application's memory usage is much more important than its development cost (read: you are trying to write
the next Gmail), Vaadin might not be the right tool for you.
If you still want to build a web application, in this scenario
you should strive for a completely stateless application on the server side
and keep your UI logic in the browser. GWT is a great library for
these kinds of applications.
Additionally there are helpers to implement navigation between views and the management of master-detail interfaces.
One source of inspiration is Microsoft’s framework PRISM,
an application framework that provides many needed tools
for the development of applications.
One application instance, many users
The first thing you’ll notice is that you are now developing
your UI right next to your data. Pretty much all modern
business apps, both web and desktop apps, save their data
somehow to a central server. Often the data is “shielded” by
a middleware layer (for example with EJBs). Now that you
move to Vaadin UI, the EJB, or whatever the technology you
use in your “backend”, is “closer”. It can often be run in the
very same application server as your Vaadin UI, making some
hard problems trivial. Using a local EJB is both efficient and
secure.
Even if you still use a separate application server for your
EJBs, they are most probably connected to the UI servers over a
fast network that can handle a chatty connection between the UI
and business layers more efficiently than typical client-server
communication. The network requirements of the Vaadin
thin client are in many cases less demanding, so your application can be used over, e.g., mobile networks.
Another thing developers arriving from desktop Java to
Vaadin will soon notice is that fields with the “static” keyword
behave quite differently in the server world. Many desktop applications use static fields as “user global” variables. For Java
apps running on a server, they are “application global”, which
is a big difference. Application servers generally use one class
loader per web application (.war file), not one class loader per
user session. For “user global” variables, use fields in your
UI class, VaadinSession, HttpSession or, e.g., a @SessionScoped
CDI bean.
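The difference can be demonstrated without any server at all. In the following sketch `UserSession` is a hypothetical stand-in for whatever holds per-user state in a real application (the UI class, VaadinSession or HttpSession); two instances simulate two users of the same web application.

```java
// Simulates why a static field is "application global" on a server:
// every user session in the same web application shares one copy.
final class StaticVersusSession {

    // Hypothetical stand-in for a per-user session object.
    static final class UserSession {
        // Shared by ALL sessions whose class was loaded by the same class loader.
        static String sharedCurrentUser;
        // One copy per session, i.e. truly "user global".
        String currentUser;

        void login(String name) {
            sharedCurrentUser = name;
            currentUser = name;
        }
    }

    public static void main(String[] args) {
        UserSession alice = new UserSession();
        UserSession bob = new UserSession();
        alice.login("alice");
        bob.login("bob");

        System.out.println("alice, instance field: " + alice.currentUser);
        System.out.println("alice, static field:   " + UserSession.sharedCurrentUser);
    }
}
```

After both logins the static field holds "bob" for every session, including Alice's, while each instance field still holds its own user. On the desktop this bug stays hidden because each user runs a separate JVM.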
Web applications in general will be much cheaper for IT departments to maintain. They have been traditionally run on a
company’s internal servers, but the trend of the era is hosting
them in PaaS services, in the “cloud”. Instead of maintaining the application in each user’s workstation, updates and
changes only need to be applied to the server. Also all data,
not just the shared parts, is saved on the server whose backups are much easier to handle. When your user’s workstation
breaks, you can just give him/her a replacement and the work
can continue.
Matti Tahvonen works at Vaadin in technical marketing, helping the community be as productive as possible with Vaadin.
Java
A look at mvvmFX
Model View ViewModel
with JavaFX
The mvvmFX framework provides tools to implement the Model View ViewModel
design pattern with JavaFX. After one year of development a first stable 1.0.0
version has been released.
by Alexander Casall
The design pattern “Model View ViewModel” was first published by Microsoft for .NET applications and is nowadays
also used in other technologies such as JavaScript frameworks.
As with other MV* approaches, the goal is to separate the structure of the user interface from the UI logic.
To do this, MVVM defines a ViewModel that represents the
state of the UI. The ViewModel doesn’t know the View and
has no dependencies on specific UI components.
The View, in turn, contains the UI components but no UI
logic and is connected to the ViewModel via Data Binding. Figure 1 shows a simple example of the preparation of a
welcome message in the ViewModel.
One of the benefits of this structure is that all UI state and
UI logic is encapsulated in a ViewModel that is independent
from the UI. But what is UI logic?
The UI logic defines how the user interface reacts to input
from the user or to other events, such as changes in the domain model. One example is the decision whether a button should be active or inactive. Because of its independence from the UI, the
ViewModel can be tested with unit tests. In many cases there
is no longer any need for complicated integration tests in which
the actual application is started and remotely controlled by
the test tool. This simplifies test-driven development significantly. Thanks to its Properties and Data Binding,
JavaFX is eminently suitable for this design pattern. mvvmFX
adds helpers and tools for the efficient and clean implementation of the pattern.
The following example will give an impression of the development process with MVVM. In this example there is a login
button that should only be clickable when the username and
the password are entered. Following TDD, the first step is to
create a unit test for the ViewModel (Listing 1).
After that the ViewModel can be implemented (Listing 2).
Now this ViewModel has to be connected with the View.
In the context of mvvmFX the “View” is the combination of
an FXML file and the related controller class. It is important to
keep in mind that the JavaFX controller is part of the View
and should not contain any logic. Its only purpose is to create
the connection to the ViewModel (Listing 3).
Please note that the View has a generic type that is the related ViewModel type. This way mvvmFX can manage the
lifecycle of the View and the ViewModel.
Additional Features
The shown example uses FXML to define the structure of the
user interface. This is the recommended way for development
but mvvmFX supports traditional Views written with pure
Java code too. Another key aspect of the library is the support of Dependency Injection frameworks. This is essential
to be able to use the library in bigger projects. At the moment
there are additional modules provided for the integration
with Google Guice and JBoss Weld/CDI to allow for an easy
start with these frameworks. But other DI frameworks can be
easily embedded too.
mvvmFX was recently released in a first stable version,
1.0.0. It is currently used in projects at Saxonia Systems AG. The framework is developed as open source
(Apache licence) and is hosted on GitHub. The authors welcome feedback, suggestions and critical reviews.
For future development the focus lies on features that
are needed for bigger projects with complex user interfaces.
These include a mechanism that lets several ViewModels
access common data without introducing mutual
visibility of, or dependencies on, each other (Scopes). Additionally
there are helpers to implement navigation between views and
the management of master-detail interfaces.
Figure 1: Welcome message in ViewModel
Listing 1
// assertThat(...) in this fluent style is assumed to come from AssertJ
@Test
public void test() {
    LoginViewModel viewModel = new LoginViewModel();
    // nothing entered yet: login must not be possible
    assertThat(viewModel.isLoginPossible()).isFalse();
    viewModel.setUsername("mustermann");
    // the password is still missing
    assertThat(viewModel.isLoginPossible()).isFalse();
    viewModel.setPassword("geheim1234");
    assertThat(viewModel.isLoginPossible()).isTrue();
}
Listing 2
import de.saxsys.mvvmfx.ViewModel;
import javafx.beans.property.BooleanProperty;
import javafx.beans.property.SimpleBooleanProperty;
import javafx.beans.property.SimpleStringProperty;
import javafx.beans.property.StringProperty;

public class LoginViewModel implements ViewModel {
    private final StringProperty username = new SimpleStringProperty();
    private final StringProperty password = new SimpleStringProperty();
    private final BooleanProperty loginPossible = new SimpleBooleanProperty();

    public LoginViewModel() {
        // login is possible only once both fields are non-empty
        loginPossible.bind(username.isNotEmpty().and(password.isNotEmpty()));
    }
    // getters, setters and property accessors omitted
}
Listing 3
import de.saxsys.mvvmfx.FxmlView;
import de.saxsys.mvvmfx.InjectViewModel;
import javafx.fxml.FXML;
import javafx.scene.control.Button;
import javafx.scene.control.PasswordField;
import javafx.scene.control.TextField;

public class LoginView implements FxmlView<LoginViewModel> {
    @FXML
    public Button loginButton;
    @FXML
    public TextField username;
    @FXML
    public PasswordField password;

    @InjectViewModel // is provided by mvvmFX
    private LoginViewModel viewModel;

    // will be called by JavaFX as soon as the FXML bootstrapping is done
    public void initialize() {
        username.textProperty().bindBidirectional(viewModel.usernameProperty());
        password.textProperty().bindBidirectional(viewModel.passwordProperty());
        // the button stays disabled for as long as login is not possible
        loginButton.disableProperty().bind(viewModel.loginPossibleProperty().not());
    }
}
Alexander Casall is a developer at Saxonia Systems AG, with a focus on
multi-touch applications using JavaFX.
Imprint
Publisher
Software & Support Media GmbH
Editorial Office Address
Software & Support Media
Saarbrücker Straße 36
10405 Berlin, Germany
www.jaxenter.com
Editor in Chief: Sebastian Meyen
Editors: Coman Hamilton, Natali Vlatko
Authors: Chris Becker, Alexander Casall, Klaus Enzenhofer, Ghislain Mazars,
Ryan Paul, Nathan Rijksen, Matti Tahvonen
Copy Editor: Jennifer Diener
Creative Director: Jens Mainz
Layout: Flora Feher, Christian Schirmer
Sales Clerk: Anika Stock
+49 (0) 69 630089-22
[email protected]
Entire contents copyright © 2015 Software & Support Media GmbH. All rights reserved. No
part of this publication may be reproduced, redistributed, posted online, or reused by any
means in any form, including print, electronic, photocopy, internal network, Web or any other
method, without prior written permission of Software & Support Media GmbH.
The views expressed are solely those of the authors and do not reflect the views or position of their firm, any of their clients, or Publisher. Regarding the information, Publisher
disclaims all warranties as to the accuracy, completeness, or adequacy of any information, and is not responsible for any errors, omissions, inadequacies, misuse, or the consequences of using any information provided by Publisher. Rights of disposal of rewarded
articles belong to Publisher. All mentioned trademarks and service marks are copyrighted
by their respective owners.