We are now using GitHub’s issue tracker.
Facebook, Flickr and Stack Overflow go to production many times per day, and thus practice continuous delivery. Note: Stack Overflow’s version number is visible in the footer of its pages.
To ensure reproducibility, continuous delivery cannot be done from a snapshot version. The cycle is therefore: run the tests, release a new version and deploy it into the continuous delivery environment.
If deployment is currently done manually, it must be automated. The deployed version must contain the application binaries, the differences between databases, and the configuration for all environments (integration, acceptance test, production …). Environment detection must be automatic, for example via an environment variable or a file located at a well-known place …
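A minimal sketch of automatic environment detection as described above (the APP_ENV variable, the marker file path and the fallback value are illustrative assumptions, not anything prescribed by the talk):

```java
// Environment detection sketch: read an environment variable first,
// fall back to a marker file at a well-known place, then to a default.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class EnvironmentDetector {
    // Order of precedence: APP_ENV variable, then the marker file, then a default.
    static String detect(Map<String, String> env, Path markerFile) {
        String fromVar = env.get("APP_ENV");
        if (fromVar != null && !fromVar.isEmpty()) return fromVar;
        try {
            if (Files.exists(markerFile)) {
                return new String(Files.readAllBytes(markerFile)).trim();
            }
        } catch (java.io.IOException e) {
            // unreadable file: fall through to the default
        }
        return "integration";
    }

    public static void main(String[] args) {
        System.out.println(detect(System.getenv(), Paths.get("/etc/myapp/env")));
    }
}
```

The application code then asks the detector for the current environment instead of being packaged differently per environment, which is what makes a single deployable version possible.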
Management of database modifications
Axel advises choosing a tool that manages database versions and the execution order of scripts:
- Flyway is free software for database migrations. It records the history of each script it has played. One can start from an existing database to generate the first version of the scripts.
- Liquibase is another free tool with similar features.
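Flyway’s actual API is not shown here; the following is only a sketch of the core idea it implements (record each played script and apply pending ones in version order), with an in-memory history standing in for the schema history table and lexicographic version ordering as a simplification:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal migration-runner sketch: scripts are keyed by version, the
// history records which versions were already played, and migrate()
// applies only the pending ones, in version order.
public class MigrationRunner {
    // TreeMap keeps scripts sorted by version (lexicographically, simplified).
    private final TreeMap<String, Runnable> scripts = new TreeMap<>();
    private final Set<String> history = new LinkedHashSet<>();

    void register(String version, Runnable script) { scripts.put(version, script); }

    // Returns the versions applied during this run.
    List<String> migrate() {
        List<String> applied = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : scripts.entrySet()) {
            if (history.add(e.getKey())) { // skip already-played scripts
                e.getValue().run();
                applied.add(e.getKey());
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        MigrationRunner runner = new MigrationRunner();
        runner.register("V1", () -> System.out.println("create table person"));
        runner.register("V2", () -> System.out.println("add column email"));
        System.out.println(runner.migrate()); // applies V1 then V2
        System.out.println(runner.migrate()); // nothing left to apply
    }
}
```

Re-running migrate() is harmless because the history makes it idempotent, which is exactly what makes automated deployments repeatable.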
Management of features
Feature toggles avoid creating branches (and therefore merging branches and duplicating continuous integration environments). It often means adding an ‘if’ in the source code, which avoids breaking existing code. The old code is removed later, once everything is OK in production.
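The ‘if’ mentioned above can be sketched as follows (the toggle name and the checkout scenario are invented for the example):

```java
import java.util.Collections;
import java.util.Map;

// Feature toggle sketch: the new code path sits behind an 'if', so the
// old path keeps working until the toggle is flipped in production.
public class CheckoutService {
    private final Map<String, Boolean> toggles;

    CheckoutService(Map<String, Boolean> toggles) { this.toggles = toggles; }

    String checkout() {
        if (toggles.getOrDefault("new-checkout", false)) {
            return newCheckout();   // new behaviour, enabled per environment
        }
        return legacyCheckout();    // removed later, once the new path is OK
    }

    private String newCheckout() { return "one-page checkout"; }
    private String legacyCheckout() { return "three-step checkout"; }

    public static void main(String[] args) {
        System.out.println(new CheckoutService(Collections.singletonMap("new-checkout", true)).checkout());
        System.out.println(new CheckoutService(Collections.<String, Boolean>emptyMap()).checkout());
    }
}
```

In practice the toggle map would come from configuration, so the same binary behaves differently per environment without any branch.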
Management of servers
To survive a possible server crash and keep a web site always on, two machines must be used.
While deploying to production, one machine is used to deploy the new version, connected to the same database. Compatibility of the old server with the database must be ensured: instead of renaming a column, a new one is added together with a trigger copying the old column to the new one; the old column and the trigger are dropped later. For a web application running under Tomcat, using a shared session cache like memcached-session-manager preserves active sessions. Each machine hosts a Tomcat instance and a memcached instance. The switch to the new version is completely transparent for users.
At Devoxx France 2013, I attended the conference titled “Demining an application with JRockit Mission Control”, presented by François Ostyn. JRockit Mission Control includes:
- a JMX console
- a recorder of events happening in the JVM: Flight Recorder
- a memory analyser to help find memory leaks: Memleak Analyser
In order to maintain a single JVM, Oracle has created the HotRockit project to merge JRockit and HotSpot. Its results will be progressively integrated into HotSpot.
Since Java SE 7 update 4, HotSpot exposes the same metrics as JRockit, which makes it possible to use JRockit Mission Control with HotSpot. It can be used in two ways: with an Eclipse plugin or from the command line.
Through the Eclipse plugin, the Memleak Analyser detects when the number of instances of a class is increasing and can visualize the call stack. After a run, Flight Recorder can retrieve the event list in order to display heap and processor usage … In disconnected mode it can also display memory leaks, but not the call graph (visible only in connected mode).
The jcmd command (called jrcmd in JRockit) allows running the same queries as the Eclipse plugin. It can generate a .jfr file that can be displayed with the Eclipse plugin.
To finish, note that JRockit Mission Control is free for development but not for production and, according to François, it is faster than profiling tools like YourKit because the metrics are implemented natively in the JVM.
Benefits of a graph-oriented database
The Internet, social networks, communication networks and the Maven dependencies of a project are real-world examples of graphs. Many efficient search algorithms exist for them, with linear complexity. Graphs have a simple structure: nodes linked by relations (directed or not). In relational databases, one must add join tables, whereas this concept is already part of a graph.
Neo4j is a graph-oriented database, free in its community version (sources on GitHub) and using Lucene for indexing. A node is represented by a list of key/value pairs. A relation is represented by a list of key/value pairs, a label and a direction. All writes are transactional (in Neo4j a transaction has ACID properties).
For searches, one must not start from an identifier but use the index; indeed, node identifiers can be recycled. Neo4j’s query language is named Cypher and looks like this:
START person=node:Person('name: *') WHERE person.firstName = 'james' AND person.age > 35 RETURN person
Regarding the Java API, you start by creating an instance of the TraversalDescriptionImpl class and, via its TraversalDescription interface, you describe the query by giving the graph traversal strategy together with the nodes searched for. Then you run the defined query by giving a set of nodes as starting points.
QPerson person = QPerson.person;
CypherQueryDSL.start( node( "person", 1, 2, 3 ) )
    .where( person.firstName.eq( "james" ).and( person.age.gt( 35 ) ) )
    .returns( nodes( "person" ) );
Geographic (also called spatial) modeling has been possible in DBMSs for many years but is not always part of the base package. Geographical information systems (GIS) provide basic services based on location (LBS, Location Based Services). Representing the physical world and geomarketing are typical usages needing a background map associated with system-specific information.
It is hard to represent a polygon in a DBMS column because it is defined by points and vertices, in 2 or 3 dimensions and often in a dense way.
The OpenGIS Consortium has created a complex layer on top of SQL, specified by more than 50 GIS standards!
Among LBS platforms, the most famous is Google Maps, which provides an abstract format for addresses (Europe, America and the city of Tokyo don’t use the same format). It supports browsers and handles the localisation of addresses, the display of streets and the background map. The client only has to manage its specific data on its own servers, associated with a latitude and a longitude. This can be very heavy even over small intervals, because an index on longitude and latitude is needed.
Many algorithms exist to search geographical data optimally: B-tree, Quadtree (partitioning of a two-dimensional space), R-tree. Among NoSQL databases, some handle spatial data: Neo4j (R-tree and Quadtree), MongoDB (B-tree).
Hibernate Search Spatial is a Hibernate extension adding management of spatial data. Indexing uses the Quadtree algorithm to split the set of values into a more or less fine mesh in order to quickly narrow the sector of the search. This extension introduces the @Spatial annotation, put on a class (entity), together with @Longitude and @Latitude annotations put on the entity fields defining a location.
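The Quadtree idea mentioned above can be sketched in plain Java: at each level the space is split into four quadrants, and nearby points end up sharing the same cell id. The grid depth and the cell naming are illustrative, not Hibernate Search’s actual implementation:

```java
// Illustrative quadtree cell computation: at each level, the lat/lon space
// is halved in both dimensions and one character is appended to the cell id.
// Points sharing a cell id at a given depth are in the same search sector.
public class QuadtreeSketch {
    static String cellId(double lat, double lon, int depth) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            double midLat = (minLat + maxLat) / 2;
            double midLon = (minLon + maxLon) / 2;
            boolean north = lat >= midLat, east = lon >= midLon;
            id.append(north ? (east ? 'A' : 'B') : (east ? 'C' : 'D'));
            if (north) minLat = midLat; else maxLat = midLat;
            if (east)  minLon = midLon; else maxLon = midLon;
        }
        return id.toString();
    }

    public static void main(String[] args) {
        // Two nearby points in Paris share the same cell at depth 8 ...
        System.out.println(cellId(48.8566, 2.3522, 8));
        System.out.println(cellId(48.8570, 2.3530, 8));
        // ... while a point in Tokyo diverges from them at the second level.
        System.out.println(cellId(35.6762, 139.6503, 8));
    }
}
```

A search then only compares candidates whose cell id shares a prefix with the query point’s cell, which is what “quickly narrowing the sector” means.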
To finish, Nicolas mentions the free GeoNames database, containing the names of many places in all countries (cities, streets, lakes …). As an example of usage, he suggests a mobile application using the current GPS location to find a hairdresser in the neighborhood.
Freud is a static analysis tool that enforces conventions and forbids the usage of certain libraries or APIs.
In classical projects, these conventions are written in a document or a wiki. That is only documentation, which doesn’t really enforce anything. Another alternative is Checkstyle, but it is not always easy to adapt to one’s needs and, above all, its code analysis comes late (after compilation).
Freud provides a DSL (domain-specific language) over Java source code, usable in assertions in JUnit tests. Its analysis can be done at various levels: source code, class (via the reflection API), bytecode …
Other kinds of files are also supported: property files, CSS, text …
Regarding usage, the GitHub project must be forked and built, because there is no Maven artifact.
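Freud’s actual DSL is not reproduced here; as an illustration of the same idea (enforcing a convention from a plain JUnit-style assertion via the reflection API), here is a minimal sketch with a made-up rule and made-up class names:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Convention check in the spirit of Freud: fail the build when a class
// exposes non-final public fields. Runs as a plain assertion in a test.
public class ConventionCheck {
    static class GoodBean { public final int id = 1; private String name; }
    static class BadBean  { public int counter; }

    // Returns true when every public field of the class is final.
    static boolean publicFieldsAreFinal(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isPublic(m) && !Modifier.isFinal(m)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(publicFieldsAreFinal(GoodBean.class)); // true
        System.out.println(publicFieldsAreFinal(BadBean.class));  // false
    }
}
```

Because the check lives in the test suite, a violated convention breaks the build early instead of remaining dead text on a wiki.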
At Devoxx France 2013, I attended the conference titled “From compileall.bat to the software factory for Java”, presented by Guillaume Rams.
Here are some tools that could help in the lifecycle of a project :
- crap4j analyzes risks when changing code
- Sonar / Checkstyle / PMD:
Instead of using PMD and Checkstyle directly, it is better to use Sonar, a tool that collects metrics and generates reports on the quality of a project. It doesn’t measure anything by itself but delegates this to specific tools.
The term “smoke tests” designates the minimal set of tests that can be automated on a deployed version. Guillaume advises establishing such tests, which is better than nothing at all. For a web application, this could be as simple as “display the home page”.
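A “display the home page” smoke test can be sketched as follows; the check itself is just an HTTP 200 on the home URL, and the embedded HttpServer in main() only stands in for a real deployment so the example is self-contained:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Minimal smoke test: the deployed version "passes" when its home page
// answers HTTP 200 within a short timeout.
public class SmokeTest {
    static boolean homePageIsUp(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            con.setConnectTimeout(2000);
            con.setReadTimeout(2000);
            return con.getResponseCode() == 200;
        } catch (IOException e) {
            return false; // unreachable or broken deployment
        }
    }

    public static void main(String[] args) throws IOException {
        // Local stand-in for the deployed application.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "<html>home</html>".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        String home = "http://localhost:" + server.getAddress().getPort() + "/";
        System.out.println(homePageIsUp(home)); // expected: true
        server.stop(0);
    }
}
```

Run against the real home page URL right after each deployment, this single check already catches dead servers, broken wiring and missing configuration.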
To finish, Guillaume mentions Clinker, an ecosystem including a complete software factory for Java. There are two versions: Clinker Cloud, paid hosting in the cloud, and Clinker Virtual Appliance, a freely downloadable virtual machine that runs in VMware or VirtualBox.
Vincent talks about best practices in managing a Java project, taking his experience on XWiki as a model.
Stability of the API
Attention must be paid to the users of the framework, and even to its developers.
Clirr is a tool that breaks the build when there is a change in an API. The comparison can be done at the binary or at the source-code level. When the API has been intentionally changed, the change must be documented so that the tool ignores it while still reporting it.
Here are some practices used to manage XWiki’s APIs:
- Creation of an ‘internal’ package: classes/methods/attributes in this package may be visible but should not be used by Java developers. Whoever uses them anyway takes the risk that the code no longer compiles or runs when the library is released. Do not forget to exclude this package from the javadoc and from the Clirr analysis.
- Management of deprecation in the API
- Current version: use the @Deprecated annotation in the code and the @deprecated tag in the javadoc (see also Oracle’s documentation on the subject).
- Future version: move deprecated code to artifacts whose names end with “-legacy”. Deprecated classes are simply moved; deprecated methods are moved using AspectJ.
- Management of a new API: it is potentially unstable. Use the @Unstable annotation and specify a time limit.
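The deprecation practice above can be sketched as follows (the Distance class and its methods are invented for the example; only the annotation and javadoc tag are the actual mechanism):

```java
// Deprecation as described above: the @Deprecated annotation warns at
// compile time, the @deprecated javadoc tag tells callers what to use
// instead and where the old code will eventually move.
public class Distance {
    /**
     * @deprecated use {@link #inMeters()} instead; this method is slated
     *             to move to a "-legacy" artifact in a future version.
     */
    @Deprecated
    public int inFeet() { return (int) Math.round(inMeters() * 3.28084); }

    public double inMeters() { return 100.0; }

    public static void main(String[] args) {
        Distance d = new Distance();
        System.out.println(d.inFeet());   // still works, but flagged deprecated
        System.out.println(d.inMeters());
    }
}
```

Callers keep compiling (with a warning) until the method actually moves to the legacy artifact, which gives them a whole release cycle to migrate.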
Attention must be paid when adding methods to an interface! But starting from JDK 8 this is no longer a problem, since an interface method can be given a default implementation.
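The JDK 8 default-method mechanism mentioned above can be illustrated like this (the Greeter interface is invented for the example):

```java
// JDK 8 default methods: adding greetLoudly() to an existing interface
// does not break implementations written against the old version.
interface Greeter {
    String greet(String name);

    // Method added later, with a default implementation.
    default String greetLoudly(String name) {
        return greet(name).toUpperCase() + "!";
    }
}

public class DefaultMethodDemo implements Greeter {
    // Only the original method is implemented; greetLoudly() is inherited.
    public String greet(String name) { return "hello " + name; }

    public static void main(String[] args) {
        Greeter g = new DefaultMethodDemo();
        System.out.println(g.greet("james"));       // prints "hello james"
        System.out.println(g.greetLoudly("james")); // prints "HELLO JAMES!"
    }
}
```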
Sometimes classes are duplicated at runtime (the same class coming from two jars in the classpath). When the two classes differ, this is a real problem, which can be addressed with the following Maven plugins:
- duplicate-finder, to detect duplicate classes.
- enforcer, to enforce certain rules (JDK version, artifact versions …).
The JaCoCo library measures test coverage and fails the build when the value is below a given threshold.
Warning: some tests can be non-deterministic, so coverage can vary from one run to another without any code change. This is particularly the case when the ConcurrentHashMap class is used with multiple threads. Vincent proposed adding a parameter called threshold (see the discussion on the JaCoCo project on GitHub) to define a tolerance around the threshold.
False positive / negative
A false positive is a test that passes when it should fail. Conversely, a false negative is a test that fails when it should pass (for example because of a crash or a slow system). Here are some Jenkins plugins that can be used in these situations:
- Groovy Postbuild: this plugin allows controlling the result of a build and possibly adding a badge to the build summary and/or the build history. In our case, it can search the logs for text to verify that a test really succeeded.
- Email-ext: this plugin extends the email notification functionality. One could, for example, cancel sending the email when the system is slow (while the Jenkins web interface continues to show the problem).
- Scriptler: this plugin centralizes Groovy scripts in a Jenkins instance. A script can also be run on all Jenkins jobs or on all nodes/slaves. Note: the plugin integrates two script-sharing sites: jenkins-scripts and scriptlerweb.
Over time, the curves of opened bugs and fixed bugs tend to diverge. To reduce this gap, Thursday was declared bug-fixing day by the XWiki project contributors. To quickly reduce the number of open bugs, they start by closing already-fixed and duplicate bugs. Then they fix the easy ones and requalify some as change requests. To see the result, look at the XWiki project dashboard.
Finally, Vincent emphasizes the need to configure the tools so that the build fails when a metric is bad. Otherwise (with a mere warning), one tends to ignore the message, let the problem get worse and eventually never fix it.
Sacha starts by recalling that the idea of the cloud is not new, but that Amazon was the first to launch a platform, named AWS (Amazon Web Services) and created by Jeff Barr. He continues by comparing power stations in late-19th-century France with today’s cloud platforms:
- Power station <-> cloud provider
- Electricity grid <-> Internet
- Power plug <-> Internet browser
Thus, at the end of the 19th century, Paris was divided into sectors defined by their power station, each with its own norms (number of wires, single/two/three-phase current, voltage …). For more information, consult the site on the history of the electrification of Paris, with its maps showing the division of the city. The state of the cloud today is similar, with different norms and different providers.
Sacha attended a demonstration at a company that distributes tests across personal mobile phones (by controlling them during idle periods, at night for example). It is a bit like SETI@home for personal computers.
Before, the stack was: operating system, virtual machine, Java virtual machine, application server …
Now, with the cloud, the following layers exist:
- IaaS (Infrastructure as a Service): the infrastructure itself (memory, number of processors/cores …). It requires more engineering work because it doesn’t give direct control of the hardware -> everything must be automated to ease the task
- SaaS (Software as a Service): many clients share the same server (as with SalesForce.com and Gmail, for example)
- PaaS (Platform as a Service): aimed at developers, it hides the IaaS layer and allows running custom applications.
All three layers are services, paid for only when they are actually used.
Sacha advises us to keep track of created instances, because Amazon won’t do it for you. Indeed, each instance is charged, even if it is not used. For that purpose, each instance should be associated with a task (a Jenkins build, the server for a given application …).
According to Sacha, operating systems will become minimalist in the future, with PaaS and SaaS layered on top of them.
To finish, he gives a demonstration of the CloudBees platform, which can host Git repositories, Jenkins server instances and, of course, application servers. All of this enables continuous deployment: build the application from its source code (hosted in a Git repository, for example), run the tests (unit tests, integration tests …) and, if everything goes well, deploy the application to a server.