Speak ALL the languages! RethinkDB and HBase Top
www.jaxenter.com | Issue April 2015 | #44
The digital magazine for enterprise developers

Polyglots do it better
Speak ALL the languages! Shutterstock’s polyglot enterprise and Komodo’s polyglot IDE
RethinkDB and HBase: A closer look at two database alternatives
Top performance fails: Five challenges in performance management
©iStockphoto.com/epic11

Editorial
E pluribus unum

It’s an old and rather out-of-fashion motto of the United States. E pluribus unum, one out of many. Early twentieth-century ideals of the cultural melting pot may have failed in western society. But they may still work in IT. Developers will forever dream of one Holy Grail language that will rule them all. As nice as it sounds to be a full-stack developer, knowing your entire enterprise’s technology inside out, from the ListView failure modes to the various components’ linguistic syntaxes – that’s a near superhuman talent. As much as all developers would love to be fluently multilingual, in practice it’s difficult to keep up. Instead, it’s the unity of the enterprise itself, not the individual developer, that can create one whole out of many languages.

In this issue, Chris Becker explains how Shutterstock’s gradual evolution from one to many languages was central to the success of the company’s technology, and how one language became many specialist development areas, in turn all unified by the enterprise. Meanwhile, for web developers looking for one tool to rule all their front-end languages, we’ve got a helpful guide to the polyglot IDE Komodo, which just unveiled its ninth release. We’ve also got some useful introductions to HBase and RethinkDB (both big salary-earners according to the recent Dice.com survey), as well as Vaadin web applications. And finally, for anyone concerned with the speed of their website during high traffic, we have a couple of valuable lessons about how to avoid performance failures.
Coman Hamilton, Editor

Index
Polyglot enterprises do it better: Shutterstock’s multilingual stack (Chris Becker)
An introduction to polyglot IDE Komodo: One IDE to rule all languages (Nathan Rijksen)
HBase, the “Hadoop database”: A look under the hood (Ghislain Mazars)
An introduction to building realtime apps with RethinkDB: First steps (Ryan Paul)
Performance fails and how to avoid them: The five biggest challenges in application performance management (Klaus Enzenhofer)
Separating UI structure and logic: Architectural differences in Vaadin web applications (Matti Tahvonen)
Model View ViewModel with JavaFX: A look at mvvmFX (Alexander Casall)

Hot or Not

HTML6
If you’re a web developer who doesn’t check the news that often, make sure you’re sitting before reading this. The web community is furiously debating a radical new proposal for HTML6. The general idea is that the next version of HTML will be developed in a way that allows it to dynamically run single-page apps without JavaScript. Yes, an HTML that wants to compete with JavaScript. The community is torn, but proposal author Bobby Mozumder makes an interesting case, claiming that HTML needs to follow the “standard design pattern emerging via all the front-end JavaScript frameworks where content is loaded dynamically via JSON APIs.” He’s won over quite a few members of the community, but there’s no telling if the W3C will ever lower its eyebrows to this request.

Cassandra salaries
It’s often considered impolite to ask another programmer how much they earn. But that doesn’t mean colleagues, recruiters and employers aren’t picturing an estimated annual salary hovering over your head while they pretend to listen to you.
It turns out that right now, Cassandra, PaaS and MapReduce pros are the ones with the biggest dollar signs above their heads (according to the latest Dice.com survey). Anyone lucky enough to be an expert in this area (and living in the US) will be making an average of at least $127k per year. America’s Java developers will be lucky to scrape anything above $100k, while poor JavaScript experts earn as little as $94k a year. We’d like to invite you to join us in performing the world’s smallest violin concerto for the JS community – because seriously that’s still a mighty shedload of cash to the kinds of people that write the texts you’re reading [long sigh].

How banks treat technology
In a recent series of interviews on JAXenter, developers in finance have told us about the pros and cons of developing for banks. And while the salary is very definitely a major pro (banks pay 33 percent to 50 percent more, says HFT developer Peter Lawrey), the management-level attitude to technology is less than favourable. “Many financial companies see their development units as a necessary evil,” says former NYSE programmer Dr. Jamie Allsop. Not only does technology come across as being second-class, but it’s often impossible to drive innovation when working on old-school banking systems. “If it’s new then it must be risky,” dictates the typical financial ops attitude according to finance technology consultant Mashooq Badar.

Publishing a book in IT
At some stage in their career, most developers will ponder the idea of publishing a book. And the appeal is understandable. It’s a great milestone on your CV, developers in the community may look up to you and you’ll be sure to make your parents proud (even if they don’t have a clue what you’re writing about). But don’t expect to make big money (like those Cassandra developers, above) when you self-publish on Leanpub. Meanwhile you’ll need to be dedicating much of your spare time to promoting the book.
Before your eyes go lighting up with dollar signs at the thought of becoming a wealthy international superstar IT author, give yourself a quick reality check with a couple of Google searches on IT publishing.

Languages

Shutterstock’s multilingual stack
Polyglot enterprises do it better
© iStockphoto.com/jamtoons

Being a multilingual, multicultural company doesn’t just bring benefits on a level of corporate culture. Shutterstock search engineer Chris Becker explains why enterprises need to stop speaking just one language.

by Chris Becker

A technology company must consider the technology stack and programming language that it’s built on. Everything else flows from those decisions. It will define the kinds of engineers who are hired and how they will fare.

During Shutterstock’s early days, a team of Perl developers built the framework for the site. They chose Perl for the benefits of CPAN and flexibility that language would bring. However, it opened up a new problem – we could only hire those familiar with Perl. We hired some excellent and skilled engineers, but many others were left behind because of their inexperience with and lack of exposure to Perl. In a way, it limited our ability to grow as a company.

But in recent years, Shutterstock has grown more “multilingual.” Today, we have services written in Node.js, Ruby, and Java; data processing tools written in Python; a few of our sites written in PHP; and apps written in Objective-C. Developers specialize in each language, but we communicate across different languages on a regular basis to debug, write new features, or build new apps and services. Getting from there to here was no easy task. Here are a few strategic decisions and technology choices that have facilitated our evolution:

Service-oriented Architectures
First, we built out all our core functionality into services.
Each service could be written in any language while providing a language-agnostic interface through REST frameworks. It has allowed us to write separate pieces of functionality in the language most suited to it. For example, search makes use of Lucene and Solr, so Java made sense there. For our translation services, Unicode support is highly important, so Perl was the strongest option.

Common Frameworks
Between languages there are numerous frameworks and standards that have been inspired or replicated by one another. When possible, we try to use one of those common technologies in our services. All of our services provide RESTful interfaces, and internally we use Sinatra-inspired frameworks for implementing them (Dancer for Perl, Slim for PHP, Express for Node etc.). For templating we use Django-inspired frameworks such as Template::Swig for Perl, Twig for PHP, and Liquid for Ruby. By using these frameworks we can ease the learning curve when a developer jumps between languages.

Runtime Management
Often, the biggest obstacle blocking new developers is the technical bureaucracy needed to manage each runtime – managing library dependencies, environment paths, and all the command line settings and flags needed to do common tasks. Shutterstock simplifies all of that with Rockstack. Rockstack provides a standardized interface for building, running, and testing code in any of its supported runtimes (currently: Perl, PHP, Python, Ruby, and Java). Not only does Rockstack give our developers a standard interface for building, testing, and running code, but it also supplies our build and deployment system with one standard set of commands for running those operations for any language as well.
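The idea of one standard set of verbs across runtimes can be pictured with a small dispatch table. The sketch below is only an illustration of that idea, not Rockstack’s actual implementation or command set: the `COMMANDS` table, the `command_for` helper, and the file names `app.pl`, `app.py` and `app.rb` are all hypothetical, and the per-runtime commands are simply common tooling chosen for the example.

```python
# Hypothetical mapping: every runtime answers the same three verbs
# (build, test, run) with its own tooling. Not Rockstack's real table.
COMMANDS = {
    "perl": {
        "build": ["cpanm", "--installdeps", "."],
        "test": ["prove", "-lr", "t"],
        "run": ["perl", "app.pl"],
    },
    "python": {
        "build": ["pip", "install", "-r", "requirements.txt"],
        "test": ["python", "-m", "pytest"],
        "run": ["python", "app.py"],
    },
    "ruby": {
        "build": ["bundle", "install"],
        "test": ["bundle", "exec", "rake", "test"],
        "run": ["bundle", "exec", "ruby", "app.rb"],
    },
}


def command_for(runtime, action):
    """Return the argv list for a verb in a given runtime."""
    try:
        return COMMANDS[runtime][action]
    except KeyError:
        raise ValueError(f"unsupported runtime/action: {runtime}/{action}") from None


# A build server only needs to know the verb, not the language:
for runtime in ("perl", "python", "ruby"):
    print(runtime, "->", " ".join(command_for(runtime, "test")))
```

A caller such as a CI job could hand the returned list to `subprocess.run`; the point of the design is that the calling side never changes when a new runtime is added to the table.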
Rockstack is used by our Jenkins cluster for running builds and tests, and our home-grown deployment system makes use of it for launching applications in dev, QA, and production.

Testing Frameworks
In order to create a standardized method for testing all the services we have running, we developed (and open sourced!) NTF (Network Testing Framework). NTF lets us write tests that hit special resources on our services’ APIs to provide status information that shows the service is running in proper form. NTF supplements our collection of unit and integration tests by constantly running in production and telling us if any functionality has been impaired in any of our services.

Developer Meetups
We also want our developers to learn and evolve their skillsets. We host internal meetups for Shutterstock’s Node developers, PHP developers, and Ruby developers where they can get fresh looks and feedback on code in progress. These meetups are a great way for them to continue professional development and also to meet others who are tackling similar projects. There’s no technology to replace face-to-face communication, and the great ideas and methods that can come from it.

Openness
We post all the code for every Shutterstock site and service on our internal GitHub. Everyone can find what we’ve been working on. If you have an idea for a feature, you can fork off a branch and submit a pull request to the shepherd of that service. This openness goes beyond transparency; it encourages people to try new things and to see what others are implementing.

Our strategies and tools echo our mission. We want engineers to think with a language-agnostic approach to better work across multiple languages. It helps us to dream bigger and not get sidelined by limitations. As the world of programming languages becomes much more fragmented, it’s becoming more important than ever from a business perspective to develop multilingual-friendly approaches.

We’ve come a long way since our early days, but there’s still a lot more we can do. We’re always reassessing our process and culture to make both the code and system more familiar to those who want to come work with us.

This has been adapted from a post that originally ran on the Shutterstock Tech blog. You can read the original here: bits.shutterstock.com/2014/07/21/stop-using-one-language/.

Chris Becker is the Principal Engineer of Search at Shutterstock where he’s worked on numerous areas of the search stack including the search platform, Solr, relevance algorithms, data processing, analytics, internationalization, and customer experience.

Languages

One IDE to rule all languages
An introduction to polyglot IDE Komodo

The developer world is becoming “decidedly more polyglot”, Komodo’s CTO recently told JAXenter. To cater to this changing community, the multi-lingual IDE Komodo is steadily increasing its number of supported languages. Let’s take a closer look at what it does best and how to get started.

by Nathan Rijksen

A few weeks ago, Komodo IDE 9 was released, featuring a host of improvements and features. Many of you have since downloaded the 21-day trial and have likely spent some time learning more about what this nimble beast can do. I thought now would be a good time to run you through some simple workflows and give you a general idea of how Komodo works. This is just a short introduction, and much more information can be found on the Komodo website under screencasts, forums and of course the documentation.

User Interface Quick Start
Figure 1 shows what you’ll see when you first launch Komodo. The first few icons on the toolbar are pretty self-explanatory, and the ones that aren’t immediately self-explanatory are easily discovered by hovering your mouse over them. You’ll find quick access to a lot of great Komodo features here: debugging, regex testing, source code control, macro recording/playback, etc. Of course these aren’t all displayed by default; you would need a big screen to show all that. You can easily toggle individual buttons or button groups by right clicking the toolbar and selecting “Customize”.

Figure 1: Komodo start screen

Three buttons you’ll likely be using a lot are the pane toggles (Figure 2). Clicking these allows you to toggle a bottom, left and right pane, each of which holds a variety of “widgets” that allow you to do anything from managing your project files to unit testing your project. You can customize the layout of these widgets by right-clicking their icon in the panel tab bar.

Figure 2: Pane buttons

At the far right of the toolbar you’ll find a search field, dubbed Commando, which lets you easily search through your project files, tools, bookmarks, etc. Type in your search query, and Commando will show you results in real-time. Select an entry and press “Enter” to open/activate it, or press tab/right arrow to “expand” the selection, allowing you to perform contextual actions on it. For example, you could rename a file right from Commando (Figure 3) using nothing but your keyboard. Commando doesn’t need to be accessed from the toolbar – you can use a shortcut to launch it so you can also be 100 percent keyboard driven (CTRL + Shift + O on Windows and Linux, or CMD + Shift + O on Mac).

Figure 3: Commando

Starting a Project
The first thing you’ll want to do is start a new project. You don’t need to use projects, but it’s highly encouraged as it gives Komodo some context: “This is what I’m working with, and this is how I like working with this particular project.” To start a new project, simply click the “Project” menu and then select “New Project”. On Windows and Linux the menus are at the far right of the toolbar under the “burger menu” (Figure 4), as people tend to lovingly call it.

Figure 4: The “burger menu”
You can also start the project from the “Quick Launch” page which is visible if no other files are opened, or you can use Commando by searching for “New Project” and selecting the relevant result (it should show up first).

When creating a project, Komodo will ask you where to save your project; this will be the project source, and Komodo will create a single project file in this folder and one project folder that can hold your project specific tools. You can separate your project from your source code by creating your project in the preferred location and afterwards modifying the project source in the “Project Preferences”.

Once your project is created you might want to do just that: adjust some of the project specific preferences. To do this, open your “Project” menu again and select “Project Preferences” (Figure 5), or use Commando to search for “Project Properties”. There you can customize your project to your liking: changing its source, excluding files, using custom indentation, and more. It’s all there.

If you’re working with a lot of different projects, you’ll definitely want to check out the Projects widget at the bottom of the Places widget. Open the left side pane and select the “Places” widget, then look at the bottom of the opened pane for “Projects”.

Figure 5: Project properties
Figure 6: Scope
Figure 7: Open files
Figure 8: Snippet

Opening (and Managing) Files
You now have your project and are ready to start opening and editing some files. The most basic way of doing that is by using the “Places” widget in the left pane, but where’s the fun in that? Again, you could use Commando to open your files… Simply launch Commando from your toolbar or via the shortcut and type in your filename. Commando by default searches across several “search scopes”, so you may get results for tools, macros, etc.
If you want to focus down on just your files you can hit the icon to the left of the search field to select a specific “search scope”. In this case you’ll want the Files scope (Figure 6). You can define custom shortcuts to instantly launch a specific Commando scope, so you don’t need to be using this scope menu with your mouse each time if that’s not your preferred method.

Now that you’ve opened a couple of files, you may be starting to notice that your tab bar is getting a bit unwieldy. This is nothing new to editors, and a lot of programmers either deal with it or constantly close files that aren’t immediately relevant anymore. Some editors get around this by giving you another way of managing opened files; luckily Komodo is one of these editors, and even goes a step further.

We’ve already spoken about Commando a lot so I’ll skip past the details and just say “There’s an Opened Files search scope.” A more UI-driven method is available through the “Open Files” widget (Figure 7), accessible under the left pane right next to the “Places” widget (you’ll have to click the relevant tab icon at the top of the pane).

The Open Files widget allows you to, well, manage your open files. But more than that it allows you to manage how you manage open files – talk about meta! When you press the “Cog” icon you will be presented with a variety of grouping and sorting options, allowing you to group and sort your files the way you want. If you’re comfortable getting a bit dirty with a custom JavaScript macro you can even create your own groups and sorting options.

Using Snippets
Many programmers use snippets (also referred to as Abbreviations in Komodo), and many editors facilitate this. Komodo again goes a step further by allowing very fine-tuned control of snippets. First let’s just use a simple snippet though. Open a new file for your preferred language.
We’ll use PHP in our example (because so many people do, I’ll refrain from naming my personal favourite). Komodo comes with some pre-defined snippets you can use, but let’s define our own. Open your right pane and look for the “Samples” folder; these are the samples Komodo provides for you to get started with. I would suggest you cut and paste the “Abbreviations” folder residing in this folder to be at the root of your toolbox, as it provides a good starting structure. Once that’s done, head into Abbreviations | PHP; these are abbreviations that only get triggered when you are using a PHP file. Right-click the PHP folder and select Add | New Snippet.

Here you can write your snippet code. Let’s write a “private function” snippet. Choose a name for your snippet: this will be the abbreviation that will trigger the snippet, and I’ll use “prfunc” because I like to keep it short and simple. Then write in your code. You can use the little arrow menu to the right of the editor to inject certain dynamic values. Most relevant right now is the “tabstop” value, which tells Komodo you want your cursor to stop and enter something there. Figure 8 shows what my final snippet looks like. You’ll note I checked “Auto-Abbreviation”. This will allow me to trigger the snippet simply by writing its abbreviation. The actual trigger for this is configurable, so you can instead have it trigger when you press Tab.

Now we have our snippet and are ready to start using it. Simply write “prfunc”. Next, let’s talk about version control, wait ... What? You were expecting more? No that’s it, you type “prfunc” and it will auto-trigger your snippet. It’s that easy.

There are many other ways of triggering and customizing snippets (ahem, Commando again); you can even create snippets that use EJS so you can add your own black magic to the mix. And for use cases where even that isn’t enough, you can create your own macros, which – aside from extending your editor – can be used to do anything from changing UI font size to writing your own syntax checker, or overhauling the entire UI. You have direct access to the full Komodo API that is used to develop Komodo. That’s a whole different topic though … One far too big to get into now, but be sure to have a look at all that the “Toolbox” offers you, because snippets are just the tip of the iceberg.

Previewing Changes
So, you’ve created some files, edited them (using snippets I hope – that took a while to write down, you know!) and now want to see the results. Rather than leaving Komodo for your terminal or your browser, why not just do it from inside Komodo? Since it was a PHP file we were working on, you can simply launch your code by pressing the “Start or continue debugging” button in your toolbar. This will of course run your code through a debugger but it’s also useful just to run your code and see the output. You could skip the debugger altogether by using the Debug | Run Without Debugging menu. Assuming your PHP file actually outputs anything, Komodo will open the bottom pane and show you your code output.

Output not what you expected it to be? Set a few breakpoints and jump right into the middle of your code (Figure 9). You could take it even further and start editing variables while debugging, starting a REPL from the current breakpoint, etc., provided the language you are using supports this functionality.

Figure 9: Debug

I did say browser back there though, so what about previewing HTML files? Simply open your HTML file and hit the “Browser Preview” button in your toolbar. Select “In a Komodo Tab” for the “Preview Using” setting and customize the rest to your liking, then hit Preview. The preview will be rendered using Gecko. Komodo is built on Mozilla so basically it’s rendering using Firefox, right from Komodo. With Komodo 9, you can now even preview Markdown files in real-time. Just open a Markdown file and hit the “Preview Markdown” button in your toolbar to start previewing (Figure 10).

Figure 10: Preview markdown

Using Version Control
If you’re a programmer, you probably use version control of some kind (and if not – you really should!) whether it be Git, Mercurial, SVN, Perforce or whatever else strikes your fancy. Komodo doesn’t leave you hanging there; you can easily access a variety of VCS tasks right from inside Komodo (Figure 11).

Figure 11: VCS

Let’s assume you already have a repository checked out/cloned. We’ll go straight to the most basic and most frequently used VCS task – committing. Make your file edits, and simply hit the VCS toolbar button (you may need to enable this first by right clicking the toolbar and hitting “Customize”). Then select “Commit”, enter your commit message and fire it off. Komodo will show you the results (unless you disabled this) in the bottom pane.

Komodo IDE 9 even shows you the changes to files as you are editing them, meaning it will show what was added, edited and deleted since your last commit of that file (Figure 12). The left margin (with your line numbers) will become colored when edits are made. Clicking on one of these colored blocks will open a small pop-in showing what was changed and allowing you to revert the change or share the changeset via kopy.io. Kopy.io is a new tool created by the Komodo team, which serves as a modernized pastebin, implementing nifty new features you probably haven’t seen elsewhere such as client-side encrypted “kopies” and auto-sizing text to fit to your browser window.

Figure 12: Track changes

Customizing
You’ve now gone through a basic workflow and are starting to really warm up to Komodo (hopefully), but you wish your color scheme was light instead of dark, or the icon set is too monotone and grey for your taste, or ... or ... Again Komodo’s got you covered. Head into the Preferences dialog via Edit | Preferences or Komodo | Preferences on OS X. Here you can customize Komodo to your heart’s content: the “Appearance” and “Color Scheme” sections are probably of particular interest to you. Note that by default Komodo hides all the advanced preferences that you likely wouldn’t want to see unless you are a bit more familiar with Komodo. You can toggle these advanced preferences by checking the “Show Advanced” checkbox at the bottom left of your Preferences dialog (Figure 13).

Figure 13: Preferences

Inevitably some of you will find something that simply isn’t in Komodo though, because there isn’t a single IDE out there that has ALL the things for EVERYONE. When this happens, the community and customizability of Komodo is there for you, so be sure to check out the variety of addons, macros, color schemes etc. at the Komodo Resources website, and if you want to get creative, share ideas or request features from the community, then head on over to the Komodo forums.

Just a Glimpse
Hopefully I’ve given you a glimpse of what Komodo has to offer, or at least given you an idea of how to get started with it. Komodo is a great IDE, even if you aren’t in the market for a huge IDE, and the best thing about it is you don’t need to buy/launch another IDE each time you want to work on another language. Komodo has you covered for a variety of languages and frameworks, including (but not limited to!) Python, PHP, Go, Ruby, Perl, Tcl, Node.js, Django, HTML, CSS and JavaScript. Enjoy!

Nathan Rijksen, Komodo developer, has web dev expertise and experience as a backend architect, application developer and database engineer, and has also worked with third-party authentication and payment modules. Nathan is a long time Komodo user and wrote multiple macros and extensions before joining the Komodo team.
Databases

A look under the hood
HBase, the “Hadoop database”

Even MongoDB has its limits. And where the favourite of the database world reaches its limits in scalability, that’s just where HBase enters. Tech entrepreneur Ghislain Mazars shows us the strong points of this community-driven open-source database.

by Ghislain Mazars

HBase and the NoSQL Market
Among the myriad NoSQL databases available on the market today, HBase is far from having a share comparable to that of market leader MongoDB. Easy to learn, MongoDB is the NoSQL darling of most application developers. The document-oriented database interfaces well with lightweight data exchange formats, typically JSON, and has become the natural NoSQL database choice for many web and mobile apps (Figure 1).

Figure 1: Relative adoption of NoSQL skills

Where MongoDB (and more generally JSON databases) reaches its limits is for highly scalable applications requiring complex data analysis (the so-called “data-intensive” applications). That segment is the sweet spot of column-oriented databases such as HBase. But even in that particular category, HBase has lately often been overlooked in favour of Cassandra. Quite a surprising turn of events actually, as Facebook, the “creator” of Cassandra, ditched its own creation in 2011 and selected HBase as the database for its Messages application. We will come back to the technical differences between the two databases, but the main reason for Cassandra’s remarkable comeback is to be found elsewhere.

Cassandra, the comeback kid of NoSQL databases
With Cassandra, we find a pattern common to most major NoSQL databases, i.e. the presence of a dedicated corporate sponsor. Just like MongoDB (with MongoDB Inc., formerly called 10gen) and Couchbase (with Couchbase Inc.), the technical and market development of Cassandra is spearheaded by DataStax Inc. From a continued effort on documentation (Planet Cassandra) to the stewardship of the user community with meetups and summits, DataStax has been doing a remarkable job in waving high the Cassandra flag. These efforts have paid off, and Cassandra now holds the pole position among wide-column databases.

It is worth noting however that in the process, Cassandra has lost a lot of its open-source nature. 80 percent of the committers on the Apache project are from DataStax and the management features beloved by enterprise customers are proprietary and part of DSE (“DataStax Enterprise”). Going one step further, the integration with Apache Spark, the new whizz-kid of Big Data, is currently only available as part of DSE …

HBase, a community-driven open-source project
Unlike Cassandra, HBase very much remains a community-driven open-source project. No fewer than 12 companies are represented in the Apache project committee and the three Hadoop distributors, Cloudera, Hortonworks and MapR, share the responsibility of marketing the database and supporting its corporate users.

As a result, HBase sometimes lacks the marketing firepower of one company betting its life on the product. Had that been the case, no doubt HBase would be in a 1.x release by now: while Hadoop made a big jump from 0.2x to 1.0 in 2011, HBase continued to move steadily in the 0.9x range! And the three companies including the database in their portfolio show a tendency to favour other (more proprietary) offerings of theirs and thus provide a restrictive image of HBase. In this context, it is quite an achievement that HBase occupies such an enviable place among NoSQL databases.
It owes this position to its open-source community, strong installed base within web properties (Facebook, Yahoo, Groupon, eBay, Pinterest) and distinctive Hadoop connection. So in spite of, or maybe thanks to, its unusual model, HBase could still very much win… As Cassandra has shown in the last two to three years, things can move fast in this market. But we will come back to that later on; for now, let us take a more technical look at HBase.

Under the Hood

Hadoop implementation of Google’s BigTable
HBase is an open-source implementation of BigTable as described in the 2006 paper from Google (http://research.google.com/archive/bigtable.html). Initially developed to store crawling data, BigTable remains the distributed database technology underpinning some of Google’s most famous services, such as Google Docs and Gmail. Of course, as should be expected from a creation of Google, the wide-column database is super scalable and works on commodity servers. It also features extremely high read performance, ensuring for example that a Gmail user instantaneously retrieves all their latest emails.

Just like BigTable, HBase is designed to handle massive amounts of data and is optimized for read-intensive applications. The database is implemented on top of the Hadoop Distributed File System (HDFS) and takes advantage of its linear scalability and fault tolerance. But the integration with Hadoop does not stop at using HDFS as the storage layer: HBase shares the same developer community as Hadoop and offers native integration with Hadoop MapReduce. HBase can serve as either the source or the destination of MapReduce jobs. The benefit here is clear: there is no need for any data movement between batch MapReduce ETL jobs and the host operational and analytics database.

HBase schema design
HBase offers advanced features to map business problems to the data model, which makes it way more sophisticated than a plain key-value store such as Redis.
Data in HBase is placed in tables, and the tables themselves are composed of rows, each of which has a rowkey. The rowkey is the main entry point to the data: it can be seen as the equivalent of the primary key in a traditional RDBMS. An interesting capability of HBase is that its rowkeys are byte arrays, so pretty much anything can serve as the rowkey. As an example, compound rowkeys can be created to mix different criteria into one single key and optimize data access speed. In pure key-value mode, a query on the rowkey will give back all the content of the row (or, to take a columnar view, all of its columns). But the query can also be much more precise, and specifically address (Figure 2):

• A family of columns
• A specific column, and as a result a cell, which is the intersection of a row and a column
• Or even a specific version of a cell, based on a timestamp

Combined, these features greatly improve on the base key-value model. There is one constraint: the rowkey cannot be changed, and should thus be carefully selected at the design stage to optimize rowkey access or scans on a range of rowkeys. But beyond that, HBase offers a lot of flexibility: new columns can be added on the fly, rows do not all need to contain the same columns (which makes it easy to add new attributes to an existing entity), and nested entities provide a way to define relationships within what otherwise remains a very flat model.

Cool Features of HBase
Sorted rowkeys: Manipulation of HBase data is based on three primary methods: Get, Put, and Scan. For all of them, access to data is done by row and more specifically according to the rowkey. Hence the importance of selecting an appropriate rowkey to ensure efficient access to data. Usually, the focus will be on ensuring smooth retrieval of data: HBase is designed for applications requiring fast read performance, and the rowkey typically aligns closely with the application’s access patterns.
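The addressing levels described in the schema design section (row, column family, column, version) can be pictured as nested maps. The following is a minimal JavaScript sketch of that idea, not HBase client code: createTable, put and get are hypothetical helpers, the sample rowkey and column names are invented, and returning "the newest version at or before a timestamp" is a simplifying assumption about version lookup.

```javascript
// Hypothetical in-memory model of one HBase table, for illustration only.
// Layout: rowkey -> column family -> column qualifier -> [{ts, value}, ...]
// (versions are kept newest first).
function createTable() {
  return new Map();
}

// Store one versioned cell.
function put(table, rowkey, family, qualifier, value, ts) {
  if (!table.has(rowkey)) table.set(rowkey, {});
  const row = table.get(rowkey);
  row[family] = row[family] || {};
  const versions = row[family][qualifier] || (row[family][qualifier] = []);
  versions.push({ ts: ts, value: value });
  versions.sort(function (a, b) { return b.ts - a.ts; }); // newest first
}

// Read at increasing precision: whole row, one family, one column (latest
// version), or the newest version at or before a given timestamp.
function get(table, rowkey, family, qualifier, ts) {
  const row = table.get(rowkey);
  if (row === undefined || family === undefined) return row;
  const fam = row[family];
  if (fam === undefined || qualifier === undefined) return fam;
  const versions = fam[qualifier];
  if (versions === undefined) return undefined;
  const cell = ts === undefined
    ? versions[0]
    : versions.find(function (v) { return v.ts <= ts; });
  return cell && cell.value;
}

const users = createTable();
put(users, "user#42", "info", "email", "old@example.com", 1000);
put(users, "user#42", "info", "email", "new@example.com", 2000);
put(users, "user#42", "stats", "logins", 7, 2000);

console.log(get(users, "user#42"));                        // whole row
console.log(get(users, "user#42", "info"));                // one column family
console.log(get(users, "user#42", "info", "email"));       // latest version
console.log(get(users, "user#42", "info", "email", 1500)); // as of ts=1500
```

The same rowkey serves every level of precision, which is why the text stresses choosing it carefully: the rowkey is the only handle that every read goes through.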
As scans are done over a range of rows, HBase orders rows lexicographically according to their rowkeys. Using these “sorted rowkeys”, a scan can be defined simply by its start and stop rowkeys. This is extremely powerful for getting all relevant data in one single database call: if we are only interested in the most recent entries for an application, we can concatenate a timestamp with the main entity id to easily build an optimized request. Another classical example relates to the use of geohashed compound rowkeys to immediately get a list of all the nearby places for a request on a geographic point of interest.

Control on data sharding
In selecting the rowkey, it is important to keep in mind that the rowkey strongly influences the data sharding. Unlike traditional RDBMS databases, HBase provides the application developer with control over the physical distribution of data across the cluster. Column families also have an influence (all column members of a family share the same prefix), but the rowkey is the primary criterion for ensuring that data is evenly distributed across the Hadoop cluster (data is sorted in ascending order by rowkey, then column family and finally column key). As rowkeys determine the sort order of a table’s rows, each region in the table ends up being responsible for the physical storage of a part of the rowkey space. Such an ability to perform physical-level tuning is a bit unusual in the database world nowadays, but immensely powerful if the application has a well-defined access pattern. In such cases, the application developer will be able to guide how the data is spread across the cluster and avoid any hotspotting by skillfully selecting the rowkey. And, at the end of the day, disk access speed matters from an application usability perspective, so it is really good to have some control over it!
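The "most recent entries" pattern built on sorted rowkeys can be sketched in a few lines. This is a plain JavaScript illustration under stated assumptions, not the HBase API: rowkey and scan are hypothetical helpers, the "#" separator, the reversed-timestamp encoding and the sensor ids are invented for the example, and string comparison stands in for HBase’s byte-wise ordering.

```javascript
// Hypothetical helpers, for illustration only (not the HBase client API).

// Build a compound rowkey "<entityId>#<reversed timestamp>". Subtracting from a
// fixed maximum and zero-padding makes newer entries sort first.
const MAX_TS = 9999999999999; // 13 digits, enough for millisecond timestamps
function rowkey(entityId, tsMillis) {
  return entityId + "#" + String(MAX_TS - tsMillis).padStart(13, "0");
}

// Scan a lexicographically sorted list of rows: start inclusive, stop exclusive.
function scan(sortedRows, startKey, stopKey, limit) {
  const out = [];
  for (const row of sortedRows) {
    if (row.key >= stopKey) break;          // past the end of the range
    if (row.key >= startKey) out.push(row); // inside the range
    if (out.length === limit) break;        // optional cap on returned rows
  }
  return out;
}

const rows = [
  { key: rowkey("sensor-a", 1000), value: "a@1000" },
  { key: rowkey("sensor-b", 1500), value: "b@1500" },
  { key: rowkey("sensor-a", 3000), value: "a@3000" },
  { key: rowkey("sensor-a", 2000), value: "a@2000" }
].sort(function (x, y) { return x.key < y.key ? -1 : x.key > y.key ? 1 : 0; });

// The two most recent entries for sensor-a: one contiguous range, one call.
// "$" is the character right after "#", so it bounds the "sensor-a#" prefix.
const latest = scan(rows, "sensor-a#", "sensor-a$", 2);
console.log(latest.map(function (r) { return r.value; }));
```

Putting the entity id before the timestamp also bears on the sharding discussion: keys that begin with an ever-increasing timestamp would send all new writes to the same end of the key space, whereas an entity prefix spreads them across it.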
Strong consistency
In its overall design, HBase tends to favour consistency over availability. It even supports ACID-level semantics on a per-row basis. This of course has an impact on write performance, which will tend to be slower than in comparable eventually consistent databases. But again, typical use cases for HBase are focused on high read performance. Overall, the trade-off plays in favour of the application developer, who has the guarantee that the datastore always (vs eventually...) delivers the right value of the data. In effect, the choice of delivering strong consistency frees the application developer from having to implement cumbersome mechanics at the application level to mimic such a guarantee. And it is always best when the application developer can focus on the business logic and user experience rather than the plumbing ...

What’s next for HBase?
In the first section, we had a look at HBase’s position in the wider NoSQL ecosystem, and vis-à-vis its most direct competitor, Cassandra. In our second and third sections, we reviewed the key technical characteristics of HBase, and highlighted some features that make it stand out from other NoSQL databases. In this final section, we will discuss recent initiatives building on these capabilities and the chances of HBase becoming a mainstream operational database in a Hadoop-dominated environment.

Support for SQL with Apache Phoenix
Until recently, HBase did not offer any kind of SQL-like interaction language. That limitation is now over with Apache Phoenix, an open-source initiative for ad hoc querying of HBase. Phoenix is an SQL skin for HBase, and provides a bridge between HBase and a relational model and approach to manipulating data. In practice, Phoenix compiles SQL queries to native HBase calls using another recent novelty of HBase, coprocessors.
Unlike standard Hadoop SQL tools such as Hive, Phoenix can both read and write data, making it a more generic and complete HBase access tool.

Further integration with Hadoop and Spark
Over time, Hadoop has evolved from being mainly an HDFS + MapReduce batch environment to a complete data platform. An essential part of that transformation has been the advent of YARN, which provides a shared orchestration and resource management service for the different Hadoop components. With the delivery of project Slider at the end of 2014, HBase cluster resource utilisation can now be “controlled” from YARN, making it easier to run data processing jobs and HBase on the same Hadoop cluster.

With a different spin, the ongoing integration work between HBase and Spark also contributes to the unification of database operations and analytic jobs on Hadoop. Just as for MapReduce, Spark can now utilize HBase as both a data source and a target. With nearly two-thirds of users loading data into Spark via HDFS, HBase is the natural database to host low-latency, interactive applications from within a Hadoop cluster. Advanced analytics provided by Spark can be fed back directly into HBase, delivering a closed-loop system, fully integrated with the Hadoop platform (Figure 3).

Figure 3: Spark Hadoop integration

Final thoughts
With Hadoop moving from exploratory analytics to operational intelligence, HBase is set to further benefit from its position as the “Hadoop database”. The imperative of limiting data movements will play strongly in its favour as enterprises start building complete data pipelines on top of their Hadoop “data lake”. In parallel, HBase is a strong contender for emerging use cases such as the management of IoT-related time series data. Incidentally, the recent launch by Microsoft of an HBase-as-a-Service offering on Azure should be read in that context. For these reasons, there is no doubt that HBase will continue to grow steadily over the next few years.
Still, the opportunity is here for more, and for HBase to have a much bigger impact on the enterprise market. In this respect, MapR has recently made a promising move by incorporating its HBase-derived MapR-DB in its free community edition. For their part, Hortonworks and Cloudera have been active on the essential integrations with Slider and Spark. Now is the time for the HBase community and vendors to move to the next stage, and drive a rich enterprise roadmap for the “Hadoop database”, to make HBase sexy and attractive for mainstream enterprise customers!

Ghislain Mazars is a tech entrepreneur and founder of Ubeeko, the company behind HFactory, delivering the application stack for Hadoop and HBase. He is fascinated by the wave of disruption brought by data-driven businesses and the big data technologies underpinning this shift.

First steps
An introduction to building realtime apps with RethinkDB
Built for scalability across multiple machines, the JSON document store RethinkDB is a distributed database that uses an easy query language. Here’s how to get started.
by Ryan Paul

RethinkDB is an open source database for building realtime web applications. Instead of polling for changes, the developer can turn a query into a live feed that continuously pushes updates to the application in realtime. RethinkDB’s streaming updates simplify realtime backend architecture, eliminating superfluous plumbing by making change propagation a native part of your application’s persistence layer.

In addition to offering unique features for realtime application development, RethinkDB also benefits from some useful characteristics that contribute to a pleasant developer experience. RethinkDB is a schemaless JSON document store that is designed for scalability and ease of use, with easy sharding, support for distributed joins, and an expressive query language.
This tutorial will demonstrate how to build a realtime web application with RethinkDB and Node.js. It will use Socket.io to convey live updates to the frontend. If you would like to follow along, you can install RethinkDB or run it in the cloud.

First steps with ReQL
The RethinkDB Query Language (ReQL) embeds itself in the programming language that you use to build your application. ReQL is designed as a fluent API, a set of functions that you can chain together to compose queries. Before we start building an application, let’s take a few minutes to explore the query language. The easiest way to experiment with queries is to use RethinkDB’s administrative console, which typically runs on port 8080. You can type RethinkDB queries into the text field on the Data Explorer tab and run them to see the output. The Data Explorer provides auto-completion and syntax highlighting, which can be helpful while learning ReQL.

By default, RethinkDB creates a database named test. Let’s start by adding a table to the test database:

    r.db("test").tableCreate("fellowship")

Now, let’s add a set of nine JSON documents to the table (Listing 1). When you run the command above, the database will output an array with the primary keys that it generated for all of the new documents. It will also tell you how many new records it successfully inserted. Now that we have some records in the database, let’s try using ReQL’s filter command to fetch the fellowship’s hobbits:

    r.table("fellowship").filter({species: "hobbit"})

The filter command retrieves the documents that match the provided boolean expression.
In this case, we specifically want documents in which the species property is equal to hobbit.

Listing 1
    r.table("fellowship").insert([
      { name: "Frodo", species: "hobbit" },
      { name: "Sam", species: "hobbit" },
      { name: "Merry", species: "hobbit" },
      { name: "Pippin", species: "hobbit" },
      { name: "Gandalf", species: "istar" },
      { name: "Legolas", species: "elf" },
      { name: "Gimli", species: "dwarf" },
      { name: "Aragorn", species: "human" },
      { name: "Boromir", species: "human" }
    ])

You can chain additional commands to the query if you want to perform more operations. For example, you can use the following query to change the value of the species property for all hobbits:

    r.table("fellowship").filter({species: "hobbit"})
      .update({species: "halfling"})

ReQL even has a built-in HTTP command that you can use to fetch data from public web APIs. In the following example, we use the HTTP command to fetch the current posts from a popular subreddit. The full query retrieves the posts, orders them by score, and then displays several properties from the top five entries:

    r.http("http://www.reddit.com/r/aww.json")("data")("children")("data")
      .orderBy(r.desc("score")).limit(5).pluck("score", "title", "url")

As you can see, ReQL is very useful for many kinds of ad hoc data analysis. You can use it to slice and dice complex JSON data structures in a number of interesting ways. If you’d like to learn more about ReQL, you can refer to the API reference documentation, the ReQL introduction on the RethinkDB website, or the RethinkDB cookbook.

Use RethinkDB in Node.js and Express
Now that you’re armed with a basic working knowledge of ReQL, it’s time to start building an application. We’re going to start by looking at how you can use Node.js and Express to make an API backend that serves the output of a ReQL query to your end user. The rethinkdb module in npm provides RethinkDB’s official JavaScript client driver.
You can use it in a Node.js application to compose and send queries. The following example shows how to perform a simple query and display the output (Listing 2). The connect method establishes a connection to RethinkDB. It returns a connection handle, which you provide to the run command when you want to execute a query. The example above finds all of the halflings in the fellowship table and then displays their respective JSON documents in your console. It uses promises to handle the asynchronous flow of execution and to ensure that the connection is properly closed when the operation completes. Let’s expand on the example above, adding an Express server with an API endpoint that lets the user fetch all of the fellowship members of the desired species (Listing 3). If you have previously worked with Express, the code above should look fairly intuitive. The final path segment in the URL route represents a variable, which we pass to the filter command in the ReQL query in order to obtain just the desired documents. After the query completes, the application relays the JSON output to the user. If the query fails to complete, then the application will return status code 500 and provide the error. Realtime updates with changefeeds RethinkDB is designed for building realtime applications. 
You can get a live stream of continuous query updates by appending the changes command to the end of a ReQL query. The changes command creates a changefeed, which will give you a cursor that receives new records when the results of the query change. The following code demonstrates how to use a changefeed to display table updates (Listing 4).

Listing 2
    var r = require("rethinkdb");

    r.connect().then(function(conn) {
      return r.db("test").table("fellowship")
        .filter({species: "halfling"}).run(conn)
        // Read the whole cursor while the connection is still open
        .then(function(cursor) { return cursor.toArray(); })
        // Close the connection whether the query succeeded or failed
        .finally(function() { conn.close(); });
    })
    .then(function(output) { console.log("Query output:", output); })
    .error(function(err) { console.log("Failed:", err); });

Listing 3
    var app = require("express")();
    var r = require("rethinkdb");

    app.listen(8090);
    console.log("App listening on port 8090");

    app.get("/fellowship/species/:species", function(req, res) {
      r.connect().then(function(conn) {
        return r.db("test").table("fellowship")
          .filter({species: req.params.species}).run(conn)
          // Read the whole cursor while the connection is still open
          .then(function(cursor) { return cursor.toArray(); })
          .finally(function() { conn.close(); });
      })
      .then(function(output) { res.json(output); })
      .error(function(err) { res.status(500).json({err: err}); });
    });

Listing 4
    r.connect().then(function(c) {
      return r.db("test").table("fellowship").changes().run(c);
    })
    .then(function(cursor) {
      // Called once for every change to the table
      cursor.each(function(err, item) {
        console.log(item);
      });
    });

The cursor.each callback executes every time the data within the fellowship table changes. You can test it for yourself by making an arbitrary change. For example, we can remove Boromir from the fellowship after he is slain by orcs:

    r.table("fellowship").filter({name: "Boromir"}).delete()

When the query removes Boromir from the fellowship, the demo application will display the following JSON data in stdout (Listing 5).
When changefeeds provide update notifications, they tell you the previous value of the record and the new value of the record. You can compare the two in order to see what has changed. When existing records are deleted, the new value is null. Similarly, the old value is null when the table receives new records. The changes command currently works with the following kinds of queries: get, between, filter, map, orderBy, min, and max. Support for additional kinds of queries, such as group operations, is planned for the future.

Listing 5
    {
      new_val: null,
      old_val: {
        id: '362ae837-2e29-4695-adef-4fa415138f90',
        name: 'Boromir',
        species: 'human'
      }
    }

A realtime scoreboard
Let’s consider a more sophisticated example: a multiplayer game with a leaderboard. You want to display the top five users with the highest scores and update the list in realtime as it changes. RethinkDB changefeeds make that easy. You can attach a changefeed to a query that includes the orderBy and limit commands. Whenever the scores or overall composition of the list of top five users changes, the changefeed will give you an update. Before we get into how you set up the changefeed, let’s start by using the Data Explorer to create a new table and populate it with some sample data (Listing 6).

Listing 6
    r.db("test").tableCreate("scores")
    r.table("scores").indexCreate("score")
    r.table("scores").insert([
      {name: "Bill", score: 33},
      {name: "Janet", score: 42},
      {name: "Steve", score: 68}
      ...
    ])

Listing 7
    var sockio = require("socket.io");
    var app = require("express")();
    var r = require("rethinkdb");

    var io = sockio.listen(app.listen(8090), {log: false});
    console.log("App listening on port 8090");

    r.connect().then(function(conn) {
      return r.table("scores").orderBy({index: r.desc("score")})
        .limit(5).changes().run(conn);
    })
    .then(function(cursor) {
      // Broadcast every leaderboard change to all connected clients
      cursor.each(function(err, data) {
        io.sockets.emit("update", data);
      });
    });
Creating an index helps the database sort more efficiently on the specified property, which is score in this case. At the present time, you can only use the orderBy command with changefeeds if you order on an index. To retrieve the current top five players and their scores, you can use the following ReQL expression:

    r.db("test").table("scores").orderBy({index: r.desc("score")}).limit(5)

We can add the changes command to the end to get a stream of updates. To get those updates to the frontend, we will use Socket.io, a framework for implementing realtime messaging between server and client. It supports a number of transport methods, including WebSockets. The specifics of Socket.io usage are beyond the scope of this article, but you can learn more about it by visiting the official Socket.io documentation. The code in Listing 7 uses sockets.emit to broadcast the updates from a changefeed to all connected Socket.io clients. On the frontend, you can use the Socket.io client library to set up a handler that receives the update event:

    var socket = io.connect();
    socket.on("update", function(data) {
      console.log("Update:", data);
    });

That’s a good start, but we need a way to populate the initial list values when the user first loads the page. To that end, let’s extend the server so that it broadcasts the current leaderboard over Socket.io when a user first connects (Listing 8). The application uses the same underlying ReQL expression in both cases, so we can store it in a variable for easy reuse. ReQL’s method chaining makes it highly conducive to that kind of composability.

To wrap up the demo, let’s build a complete frontend. To keep things simple, I’m going to use Polymer’s data binding system.
Let’s start by defining the template:

    <template id="scores" is="auto-binding">
      <ul>
        <template repeat="{{user in users}}">
          <li><strong>{{user.name}}:</strong> {{user.score}}</li>
        </template>
      </ul>
    </template>

It uses the repeat attribute to insert one li tag for each user. The contents of the li tag display the user’s name and their current score. Next, let’s write the JavaScript code (Listing 9). The handler for the leaders event simply takes the data from the server and assigns it to the template variable that stores the users. The update handler is a bit more complex. It finds the entry in the leaderboard that corresponds to the old_val and replaces it with the new data.

When the score changes for a user that is already in the leaderboard, it’s just going to replace the old record with a new one that has the updated number. In cases where a user in the leaderboard is displaced by one who wasn’t there previously, it will replace one user’s record with that of another. The code in Listing 9 handles both cases.

Of course, the changefeed updates don’t help us maintain the actual order of the users. To remedy that problem, we simply sort the user array after every update. Polymer’s data binding system will ensure that the actual DOM representation always reflects the desired order.

Listing 8
    var getLeaders = r.table("scores").orderBy({index: r.desc("score")}).limit(5);

    r.connect().then(function(conn) {
      return getLeaders.changes().run(conn);
    })
    .then(function(cursor) {
      cursor.each(function(err, data) {
        io.sockets.emit("update", data);
      });
    });

Now that the demo application is complete, you can test it by running queries that change the scores of your users. In the Data Explorer, you can try running something like:

    r.table("scores").filter({name: "Bill"})
      .update({score: r.row("score").add(100)})

When you change the value of the user’s score, you will see the leaderboard update to reflect the changes.
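The update logic described above (replace the entry matching old_val, then re-sort) can also be written as a small pure function that is easy to exercise without a browser or a database. This is a sketch under stated assumptions rather than the article’s own handler: applyChange is a hypothetical name, the ids and the player "Ada" are invented sample data, and the function additionally tolerates a null old_val or new_val, following the changefeed convention that new records have no old value and deleted records have no new value.

```javascript
// Apply one changefeed notification to a leaderboard array and return a new,
// re-sorted array. Pure function: the input array is not modified.
function applyChange(leaders, change) {
  const oldId = change.old_val ? change.old_val.id : null;
  // Drop the entry that left the top list or is about to be replaced.
  const next = leaders.filter(function (user) { return user.id !== oldId; });
  // Add the new or updated entry; new_val is null when an entry was removed.
  if (change.new_val) next.push(change.new_val);
  // Changefeeds do not convey position, so restore descending score order.
  return next.sort(function (x, y) { return y.score - x.score; });
}

let leaders = [
  { id: "a", name: "Steve", score: 68 },
  { id: "b", name: "Janet", score: 42 },
  { id: "c", name: "Bill", score: 33 }
];

// Bill gains 100 points: same id in old_val and new_val.
leaders = applyChange(leaders, {
  old_val: { id: "c", name: "Bill", score: 33 },
  new_val: { id: "c", name: "Bill", score: 133 }
});

// A newcomer displaces Janet: different ids in old_val and new_val.
leaders = applyChange(leaders, {
  old_val: { id: "b", name: "Janet", score: 42 },
  new_val: { id: "d", name: "Ada", score: 90 }
});

console.log(leaders.map(function (u) { return u.name + ":" + u.score; }));
```

Returning a fresh array instead of mutating in place is a design choice for testability; a data-binding frontend would assign the result back to the bound property.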
Next steps
Conventional databases are largely designed around a query/response workflow that maps well to the web’s traditional request/response model. But modern technologies like WebSockets make it possible to build applications that stream updates in realtime, without the latency or overhead of HTTP requests. RethinkDB is the first open source database that is designed specifically for the realtime web. Changefeeds offer a way to build queries that continuously push out live updates, obviating the need for routine polling.

To learn more about RethinkDB, check out the official documentation. The introductory ten-minute guide is a good place to start. You can also check out some RethinkDB demo applications, which are published with complete source code.

Listing 8 (continued)
    io.on("connection", function(socket) {
      r.connect().then(function(conn) {
        return getLeaders.run(conn)
          // Convert the cursor to a plain array before the connection closes
          .then(function(cursor) { return cursor.toArray(); })
          .finally(function() { conn.close(); });
      })
      .then(function(output) { socket.emit("leaders", output); });
    });

Listing 9
    var scores = document.querySelector("#scores");
    var socket = io.connect();

    // Initial leaderboard, sent once when the client connects
    socket.on("leaders", function(data) {
      scores.users = data;
    });

    // Replace the entry matching old_val, then restore the sort order
    socket.on("update", function(data) {
      for (var i in scores.users) {
        if (scores.users[i].id === data.old_val.id) {
          scores.users[i] = data.new_val;
          scores.users.sort(function(x, y) { return y.score - x.score; });
          break;
        }
      }
    });

Ryan Paul is a developer evangelist at RethinkDB. He is also a Linux enthusiast and open source software developer. He was previously a contributing editor at Ars Technica, where he wrote articles about software development.

The five biggest challenges in application performance management
Performance fails and how to avoid them
What is good news for sales can be bad news for IT. Sudden spikes in application usage need plenty of preparation, so before you unwittingly make any performance no-nos, here are the five areas where you might be slipping up.
by Klaus Enzenhofer

Whether it was on Black Friday, Cyber Monday or just during general Christmas shopping, this year’s holidays have proven that too many online shops were far from well prepared for heavy traffic on their website or mobile offering. The impact of end-user experience on the whole business is widely underestimated. Application performance management (APM) has come a long way in a few short years, but despite the numerous solutions available in the market, many businesses still struggle with fundamental problems.

With a view to the next stressful situations that will affect company applications, business and IT professionals need to evolve their APM strategies to successfully navigate multiple channels in a multi-connected world. Optimizing application performance to deliver high-quality, frictionless user experiences across all devices and all channels isn’t easy, especially if you’re struggling with these five challenges:

1. Sampling
Looking at an aggregate of what traffic analytics tell you about daily, weekly and monthly visits isn’t enough. And counting on a sampling of what users experience is also a risky approach. Having a partial view of what is happening across your IT systems and applications is akin to trying to drive a car while blindfolded. Load testing is essential, and it is an important part of preparation for peak events like Black Friday or Christmas. But it is no substitute for real user monitoring: to ensure a good customer journey for every visitor, a bundle of different methods is required, namely load testing, synthetic monitoring AND real user monitoring. Not only does a partial view limit your understanding of what’s happening across the app delivery chain, it leads to the next major scare that organizations face.

2. Lessons learned about performance issues
It’s Black Friday at 11 a.m., the phone rings and your boss screams: “Is our site down?
Why are transactions slowing to a crawl? The call center is getting overwhelmed with customers asking why they can’t check out online. Fix it!” This is the nightmare scenario that plays out too often, but it doesn’t need to be that way. For the best performance and availability, continuous real user monitoring of all transactions, 24 hours a day, 7 days a week, is a must. Only this ensures that you see any and all issues as they come up, before customers are affected. Only this gives you the ability to respond immediately and head off a heart-stopping call about issues that should have been avoided. If your customers are your “early warning system”, they will be frustrated and will likely start venting on social media, which can be incredibly damaging to your business’s reputation. As a result, frustrated customers will move to a competitor and revenue will be lost.

3. Problems identified, but no explanation
Suppose you and your team can manage the first two challenges without much trouble. Now you have to face the next major hurdle. Your application performance monitoring shows you there’s a problem, but you can’t pinpoint the exact cause. Combing through waterfall charts and logs, especially while racing against the clock to fix a problem, can feel like looking for needles in haystacks. No solution is in sight and the hurdle seems insurmountable. When every minute can mean tens of thousands of dollars in lost revenue, the old adage “time is money” is likely to be ringing in your ears. But your IT team doesn’t just need more data; it needs transparency from the end user into the data center, into the applications and down to the level of the individual line of code. It needs a look through the magnifying glass of a new-generation APM solution. Today, synthetic monitoring empowers businesses to detect, classify, identify and gather information on root causes of performance issues, with instant triage, problem ranking and cause identification.
“Smart analytics” reduces hours of manual troubleshooting to a matter of seconds. Not all APM tools cover deep-dive analysis, so you need to test them against all your important requirements.

4. Third-parties – the unknown stars
You are flying blind if you can’t see the impact of integrated third-party services and can’t verify their SLA compliance. Modern applications execute code on diverse edge devices, often calling elements from a variety of third-party services well beyond the view of traditional monitoring systems. Sure, third-party services can improve end-user experiences and deliver functionality faster than standalone applications, but they have a dark side. They can increase complexity and page weight and decrease site performance to the point of compromising the end-user experience. Not only that: when a third-party service goes down, whether it’s a Facebook “like” button, the “cart” in an online shop, an ad or web analytics, IT is often faced with performance issues that are not its fault and not within its view. Trouble is inevitable if the team cannot explain the reason for bad performance or a crash on the website, and the frustration will affect not only your end users, but also your IT team.

5. The Cloud – performance in the dark
In a global survey of 740 senior IT professionals, nearly 80 percent of respondents said that they fear cloud providers hide performance problems. Additionally, 63 percent of respondents indicated there was a need for more meaningful and granular SLA metrics that are geared toward ensuring the continuous delivery of a high-quality end-user experience. In preparing an upcoming major sales campaign, you’ve done great work and you are confident that all your effort will ensure your websites withstand the rush.
But when the big day comes, it turns out that the load testing you’ve done with your CDN isn’t playing out the way it was predicted, because the CDN is getting hit with peak demand that wasn’t reflected when it was in test mode. The inability to track and respond in real time shows exactly how a lack of visibility affects, and can destroy, any plan to make big money with the sales event. Whether you’ve launched a new app in a public cloud or in your virtualized data center, full visibility across all cloud and on-premise tiers, in one pane of glass, is the only way to maintain control. In this way, you’ll be able to detect regressions automatically and identify the root cause in minutes. Reflect on these APM best practices and apply them in your daily job. The next shopping season is coming sooner than expected.

Klaus Enzenhofer is a Technology Strategist and the Lead of the Dynatrace Center of Excellence Team.

Architectural differences in Vaadin web applications
Separating UI structure and logic
In the first part of JAXenter’s series on Vaadin-based web apps, Matti Tahvonen shows us why every architectural decision has its pros and cons, and why the same goes for switching from Swing to Vaadin in your UI layer.
by Matti Tahvonen

As expressed by Pete Hunt (Facebook, React JS) at the JavaOne 2014 Web Framework Smackdown, if you were to create a UI toolkit from scratch, it would look nothing like a DOM. Web technologies are not designed for application development, but for rich text presentation. Markup-based presentation has proven to be superior for more static content like web sites, but applications are a different story. In the early stages of graphical UIs on computers, UI frameworks didn’t form “component based” libraries by accident. Those UI libraries have developed over decades, but the basic concept of a component-based UI framework is still the most powerful way to create applications.
And yet Swing, SWT, Qt and similar desktop UI frameworks have one major problem compared to web apps: they require you to install special software on your client machine. As we have all learned during the internet era, this can be a big problem. Today’s users rely on lots of different applications, and installing all of them (and especially maintaining them) becomes a burden for your IT department. Browser plugins like Flash and Java’s Applet/Java Web Start support (with Swing or JavaFX) are the traditional workarounds to avoid installing software locally on workstations. But well-known security holes in these, especially in outdated versions, may become a huge problem, and your IT department will nowadays most likely be against installing any kind of third-party browser plugin. For them it is much easier to just maintain one browser application. This is one of the fundamental reasons why pure web apps are now conquering even the most complex application domains.

Welcome to the wonderful world of web apps
Even for experienced desktop developers it may be a huge jump from the desktop world to web development. Developing web applications is much trickier than developing basic desktop apps. Lots of things complicate matters, such as the markup language and CSS used for display, new programming languages for the client side, and client-server communication in many different forms (basic HTTP, Ajax-style requests, long polling, WebSockets etc.). The fact is that, even with the most modern web app frameworks, web development is not as easy as building desktop apps. Vaadin Framework is probably the closest thing to component-based Swing UI development in the mainstream web app world.
Vaadin is a component-based UI library that tries to make web development as easy as traditional desktop development, maximizing developers' productivity and the quality of the end user experience. In a Vaadin application the actual UI logic, written by you, lives in the server’s JVM. Instead of browser plugins, Vaadin has a built-in “thin client” that renders the UI efficiently in browsers. The highly optimized communication channel sends only the parts of the UI that are actually visible on the user’s screen to the client. Once the initial rendering has been done, only deltas are transferred between the client and the server, in both directions.

Architecture

Memory and CPU usage is centralized to server

The architecture of Vaadin Framework provides you with an abstraction for the web development challenges, and most of the time you can forget that you are building a web application. Vaadin takes care of handling all the communication, HTML markup, CSS and browser differences, so you can concentrate all your energy on your domain problems with a clean Java approach and take advantage of your experience from desktop applications.

Vaadin uses GWT to implement its “thin client” running in the browser. GWT is another similar tool for web development, and its heart is its Java to JavaScript “compiler”. GWT also has a Swing-like UI component library, but in GWT the Java code is compiled into JavaScript and executed in the browser. The compiler supports only a subset of Java, and the fact that the code is not running in a JVM causes some other limitations, but the concepts are the same. Running your code in the browser as a white box also has some security implications.

On the negative side, some of the computing previously done by your user's workstation is now moved to the server. The CPU hit is typically negligible, but you might face memory constraints if you do not take this into account.
On the other hand, the fact that the application memory and processing now live mostly on the server might be a good thing. The server-side approach makes it possible to handle really complex computing tasks, even with really modest handheld devices. This is naturally possible with Swing and a central server as well, but with the Vaadin approach it comes as a free bonus feature.

A typical Vaadin business app consumes 50–500 kB of server memory per user, depending on your application characteristics. If you have a very small application you can do with less, and if you reference a lot of data from your UI, which usually makes things both faster and simpler, you might need even more memory per user. The per-user memory usage is in line with e.g. the Java EE standard JSF. If you do some basic math you can see that this isn’t an issue for most typical applications and modern application servers. But if you create an accidental memory leak in application code or carelessly load a whole database table into memory, memory consumption may become an issue earlier than with desktop applications. Accidentally referencing a million basic database entities from a user session will easily consume 100–200 MB of memory per session. This might still be tolerable in desktop applications, but if you have several concurrent users, you’ll soon be in trouble.

The memory issues can usually be solved rather easily by using paging or by lazy loading the data from the backend to the UI. Server capacity is also really cheap nowadays, so buying a more powerful server or clustering your application across multiple application servers is most likely much cheaper than making compromises in your architectural design. But if each of your application users needs to do heavy analysis with huge in-memory data sets, web applications are still not the way to go for your use case.
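The “basic math” mentioned above can be sketched in a few lines of Java. This is only a back-of-the-envelope illustration: the class and method names are invented for this sketch, and the user counts are example inputs rather than measurements. It uses the per-user and per-session figures quoted in the text.

```java
// Rough capacity arithmetic for server-side UI state.
// All names here are illustrative; nothing below is part of Vaadin.
final class SessionMemoryEstimate {

    /** Heap needed for the given number of concurrent sessions, in megabytes. */
    static double totalMegabytes(int concurrentUsers, double kilobytesPerUser) {
        return concurrentUsers * kilobytesPerUser / 1024.0;
    }

    /** Bytes per entity implied by a session that references many entities. */
    static double bytesPerEntity(double sessionMegabytes, long entityCount) {
        return sessionMegabytes * 1024 * 1024 / entityCount;
    }

    public static void main(String[] args) {
        int users = 1000; // example load, chosen only for illustration

        // Lower and upper bounds from the text: 50 kB and 500 kB per user.
        System.out.println("typical, low  (MB): " + totalMegabytes(users, 50));
        System.out.println("typical, high (MB): " + totalMegabytes(users, 500));

        // A session that accidentally references a million entities (100-200 MB).
        System.out.println("bytes per entity, low:  " + bytesPerEntity(100, 1_000_000L));
        System.out.println("bytes per entity, high: " + bytesPerEntity(200, 1_000_000L));

        // Ten such sessions at 200 MB (204,800 kB) each already need about 2 GB.
        System.out.println("10 leaking sessions (MB): " + totalMegabytes(10, 200 * 1024));
    }
}
```

Even at the upper bound, a thousand concurrent users stay at roughly half a gigabyte of session state, whereas a handful of sessions that each hold a full table in memory exceed that immediately. That difference is why paging or lazy loading is usually the first fix to try.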
If your application's memory usage is much more important than its development cost (read: you are trying to write the next GMail), Vaadin might not be the right tool for you. If you still want to go for web applications, in this scenario you should strive for a completely (server) stateless application and keep your UI logic in the browser. GWT is a great library for these kinds of applications.

One application instance, many users

The first thing you’ll notice is that you are now developing your UI right next to your data. Pretty much all modern business apps, both web and desktop apps, save their data somehow to a central server. Often the data is “shielded” by a middleware layer (for example with EJBs). Now that you move to a Vaadin UI, the EJB, or whatever technology you use in your “backend”, is “closer”. It can often be run in the very same application server as your Vaadin UI, making some hard problems trivial. Using a local EJB is both efficient and secure. Even if you still use a separate application server for your EJBs, it is most probably connected to the UI servers over a fast network that can handle a chatty connection between the UI and business layers more efficiently than typical client-server communication. The network requirements of the Vaadin thin client are in many cases less demanding, so your application can be used over e.g. mobile networks.

Another thing developers arriving from desktop Java to Vaadin will soon notice is that fields with the “static” keyword behave quite differently in the server world. Many desktop applications use static fields as “user global” variables. For Java apps running on a server, they are “application global”, which is a big difference.
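The difference between “user global” and “application global” can be sketched in plain Java. The example below is an illustration only: `UserSession` is a hypothetical stand-in for any per-user object (such as a UI instance or an HTTP session), not a Vaadin or servlet class, and the two sessions simply simulate two users of the same deployed application.

```java
// Illustration only: UserSession is a hypothetical per-user object,
// standing in for something like a UI instance or an HTTP session.
final class StaticScopeDemo {

    /** One copy per class loader, so all users of the application share it. */
    static String selectedCustomerStatic;

    /** One instance per user, so its fields are effectively "user global". */
    static final class UserSession {
        private String selectedCustomer;

        void select(String customer) {
            selectedCustomer = customer;       // visible to this user only
            selectedCustomerStatic = customer; // overwrites the value for everybody
        }

        String selectedCustomer() {
            return selectedCustomer;
        }
    }

    public static void main(String[] args) {
        UserSession alice = new UserSession();
        UserSession bob = new UserSession();

        alice.select("ACME");
        bob.select("Globex");

        // Each session keeps its own value in the instance field ...
        System.out.println("alice sees (instance field): " + alice.selectedCustomer());
        System.out.println("bob sees (instance field):   " + bob.selectedCustomer());
        // ... while the static field holds whatever the last user wrote.
        System.out.println("both see (static field):     " + selectedCustomerStatic);
    }
}
```

In a desktop application there is only one user per JVM, so the static field happens to behave like per-user state. On a server the same code silently leaks one user's selection to everyone else.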
Application servers generally use a class loader per web application (.war file), not a class loader per user session. For “user global” variables, use fields in your UI class, VaadinSession, HttpSession or e.g. a @SessionScoped CDI bean.

Web applications in general will be much cheaper for IT departments to maintain. They have traditionally been run on a company’s internal servers, but the trend of the era is hosting them in PaaS services, in the “cloud”. Instead of maintaining the application on each user’s workstation, updates and changes only need to be applied to the server. Also all data, not just the shared parts, is saved on the server, whose backups are much easier to handle. When a user’s workstation breaks, you can just give them a replacement and the work can continue.

Matti Tahvonen works at Vaadin in technical marketing, helping the community be as productive as possible with Vaadin.

Java

A look at mvvmFX

Model View ViewModel with JavaFX

The mvvmFX framework provides tools to implement the Model View ViewModel design pattern with JavaFX. After one year of development a first stable 1.0.0 version has been released.

by Alexander Casall

The design pattern “Model View ViewModel” was first published by Microsoft for .NET applications and is nowadays also used in other technologies like JavaScript frameworks. As with other MV* approaches, the goal is the separation between the structure of the user interface and the (UI) logic. To do this, MVVM defines a ViewModel that represents the state of the UI. The ViewModel doesn’t know the View and has no dependencies on specific UI components. Instead, the View contains the UI components but no UI logic and is connected with the ViewModel via Data Binding. Figure 1 shows a simple example of the preparation of a welcome message in the ViewModel. One of the benefits of this structure is that all UI state and UI logic is encapsulated in a ViewModel that is independent from the UI.
But what is UI logic? The UI logic defines how the user interface reacts to input from the user or to other events like changes in the domain model, for example the decision whether a button should be active or inactive. Because of the independence from the UI, the ViewModel can be tested with unit tests. In many cases there is no longer any need for complicated integration tests where the actual application is started and remotely controlled by the test tool. This simplifies test-driven development significantly.

Due to the availability of Properties and Data Binding, JavaFX is eminently suitable for this design pattern. mvvmFX adds helpers and tools for the efficient and clean implementation of the pattern.

The following example will give an impression of the development process with MVVM. In this example there is a login button that should only be clickable when the username and the password are entered. Following TDD, the first step is to create a unit test for the ViewModel (Listing 1). After that the ViewModel can be implemented (Listing 2).

Now this ViewModel has to be connected with the View. In the context of mvvmFX the “View” is the combination of an fxml file and the related controller class. It is important to keep in mind that the JavaFX controller is part of the View and should not contain any logic. Its only purpose is to create the connection to the ViewModel (Listing 3). Please note that the View has a generic type that is the related ViewModel type. This way mvvmFX can manage the lifecycle of the View and the ViewModel.

Additional Features

The example shown uses FXML to define the structure of the user interface. This is the recommended way of development, but mvvmFX supports traditional Views written in pure Java code too. Another key aspect of the library is the support of Dependency Injection frameworks. This is essential to be able to use the library in bigger projects.
At the moment there are additional modules provided for the integration with Google Guice and JBoss Weld/CDI to allow for an easy start with these frameworks. But other DI frameworks can be easily embedded too.

mvvmFX was recently released in a first stable version 1.0.0. It is currently used in projects at Saxonia Systems AG. The framework is developed as open source (Apache licence) and is hosted on GitHub. The authors welcome feedback, suggestions and critical reviews.

For future development the focus lies on features that are needed for bigger projects with complex user interfaces. These include a mechanism by which many ViewModels can access common data without introducing mutual visibility of, and dependencies on, each other (Scopes). Additionally there are helpers to implement navigation between views and the management of master-detail interfaces. One source of inspiration is Microsoft’s framework PRISM, an application framework that provides many needed tools for the development of applications.

Figure 1: Welcome message in ViewModel

Listing 1

// assertThat(...) comes from a fluent assertion library such as AssertJ
// (an assumption; the article does not name the library).
@Test
public void test() {
    LoginViewModel viewModel = new LoginViewModel();
    // login must not be possible until both fields are filled
    assertThat(viewModel.isLoginPossible()).isFalse();
    viewModel.setUsername("mustermann");
    assertThat(viewModel.isLoginPossible()).isFalse();
    viewModel.setPassword("geheim1234");
    assertThat(viewModel.isLoginPossible()).isTrue();
}

Listing 2

import de.saxsys.mvvmfx.ViewModel;
import javafx.beans.property.BooleanProperty;
import javafx.beans.property.SimpleBooleanProperty;
import javafx.beans.property.SimpleStringProperty;
import javafx.beans.property.StringProperty;

public class LoginViewModel implements ViewModel {
    private StringProperty username = new SimpleStringProperty();
    private StringProperty password = new SimpleStringProperty();
    private BooleanProperty loginPossible = new SimpleBooleanProperty();

    public LoginViewModel() {
        // login is possible only when both username and password are non-empty
        loginPossible.bind(username.isNotEmpty().and(password.isNotEmpty()));
    }

    // getter/setter
}

Listing 3

import de.saxsys.mvvmfx.FxmlView;
import de.saxsys.mvvmfx.InjectViewModel;
import javafx.fxml.FXML;
import javafx.scene.control.Button;
import javafx.scene.control.PasswordField;
import javafx.scene.control.TextField;

public class LoginView implements FxmlView<LoginViewModel> {
    @FXML
    public Button loginButton;
    @FXML
    public TextField username;
    @FXML
    public PasswordField password;

    @InjectViewModel // is provided by mvvmFX
    private LoginViewModel viewModel;

    // will be called by JavaFX as soon as the FXML bootstrapping is done
    public void initialize() {
        username.textProperty().bindBidirectional(viewModel.usernameProperty());
        password.textProperty().bindBidirectional(viewModel.passwordProperty());
        // the button is disabled exactly while login is not possible
        loginButton.disableProperty().bind(viewModel.loginPossibleProperty().not());
    }
}

Alexander Casall is a developer at Saxonia Systems AG, with a focus on multi-touch applications using JavaFX.

Imprint

Publisher: Software & Support Media GmbH
Editorial Office Address: Software & Support Media, Saarbrücker Straße 36, 10405 Berlin, Germany, www.jaxenter.com
Editor in Chief: Sebastian Meyen
Editors: Coman Hamilton, Natali Vlatko
Authors: Chris Becker, Alexander Casall, Klaus Enzenhofer, Ghislain Mazars, Ryan Paul, Nathan Rijksen, Matti Tahvonen
Copy Editor: Jennifer Diener
Creative Director: Jens Mainz
Layout: Flora Feher, Christian Schirmer
Sales Clerk: Anika Stock, +49 (0) 69 630089-22, [email protected]

Entire contents copyright © 2015 Software & Support Media GmbH. All rights reserved. No part of this publication may be reproduced, redistributed, posted online, or reused by any means in any form, including print, electronic, photocopy, internal network, Web or any other method, without prior written permission of Software & Support Media GmbH. The views expressed are solely those of the authors and do not reflect the views or position of their firm, any of their clients, or Publisher. Regarding the information, Publisher disclaims all warranties as to the accuracy, completeness, or adequacy of any information, and is not responsible for any errors, omissions, inadequacies, misuse, or the consequences of using any information provided by Publisher. Rights of disposal of rewarded articles belong to Publisher. All mentioned trademarks and service marks are copyrighted by their respective owners.