Today we are releasing a new version of the CUBRID driver for Node.js. node-cubrid 2.1.0 has a few API improvements which aim to provide more convenient APIs with full backward compatibility. However, the main objective of this release is to refactor the driver to achieve a significant performance improvement by leveraging the characteristics of the V8 and Node.js platforms. Continue reading to learn more about the changes.
Improvements
I will explain the changes chronologically.
- We have added Travis Continuous Integration support to the node-cubrid project. Now all 268K assertions are automatically performed on Travis servers upon every commit. Check out the latest node-cubrid build status at https://travis-ci.org/CUBRID/node-cubrid/builds.
Through Travis we can now guarantee that node-cubrid works on all major Node.js releases as well as CUBRID Server releases. This node-cubrid 2.1.0 is even compatible with the latest CUBRID 8.4.4 version we released last week.
To install CUBRID on Travis servers we use the CUBRID Chef Cookbook.
- Besides the Travis CI integration, we have added Coveralls Code Coverage support. Now whenever someone pushes a commit, Coveralls reports the percentage change in code coverage, showing whether the change has increased or decreased it. It is a very convenient and encouraging platform for writing more tests. To see the current code coverage status of the node-cubrid project, visit https://coveralls.io/r/CUBRID/node-cubrid. At this moment we provide 89% code coverage. Now we know exactly how many more tests we need to add and which lines of code we need to test.
- For those users who come from MySQL, we have added alias functions like end(), as in node-mysql, for node-cubrid's close() function. The same goes for createConnection(), an alias for createCUBRIDConnection(). Now if you migrate to node-cubrid, there is less code you need to change in your application.
- Now createConnection() accepts an object of connection parameters. After all, JavaScript is all about objects.
- Connection timeout parameter can now be supplied to createConnection().
- The query() function now accepts a params object which you can use to pass an array of values to bind to ? placeholders in the provided SQL query.
- The _sqlFormat() function in the Helpers object, which is used to format an SQL query by replacing each ? placeholder with its respective bind value, is now smarter.
- Now numbers are passed to the query as they are, without being wrapped in single quotes as if they were strings, though users can wrap them if necessary.
- If you pass a Date object, it will correctly be converted into a CUBRID compatible DATETIME string format which can be stored in DATE, DATETIME, and TIMESTAMP columns.
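To make the new formatting rules concrete, here is a minimal sketch of how ? placeholders could be replaced with bind values. It is not the driver's actual _sqlFormat() code: the function names formatSql, formatValue and toDateTimeString are hypothetical, and so are the 'YYYY-MM-DD HH:MM:SS.mmm' output format, the use of UTC fields, and the doubling of single quotes to escape strings.

```javascript
'use strict';

// Hypothetical stand-in for the driver's _sqlFormat() helper.
// The real implementation may differ in every detail.

function pad(n, width) {
  return String(n).padStart(width, '0');
}

// Assumption: DATETIME strings look like 'YYYY-MM-DD HH:MM:SS.mmm'
// and are built from the UTC fields of the Date object.
function toDateTimeString(d) {
  return (
    d.getUTCFullYear() + '-' + pad(d.getUTCMonth() + 1, 2) + '-' + pad(d.getUTCDate(), 2) +
    ' ' + pad(d.getUTCHours(), 2) + ':' + pad(d.getUTCMinutes(), 2) +
    ':' + pad(d.getUTCSeconds(), 2) + '.' + pad(d.getUTCMilliseconds(), 3)
  );
}

function formatValue(value) {
  // Numbers are inserted as they are, without quotes.
  if (typeof value === 'number') return String(value);
  // Dates become quoted DATETIME strings.
  if (value instanceof Date) return "'" + toDateTimeString(value) + "'";
  if (value === null || value === undefined) return 'NULL';
  // Everything else is treated as a string; single quotes are doubled
  // (an assumption about the escaping rule).
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Replaces each ? with the next bind value, left to right.
// Placeholders without a matching value are left untouched.
function formatSql(sql, params) {
  let i = 0;
  return sql.replace(/\?/g, (match) =>
    i < params.length ? formatValue(params[i++]) : match
  );
}

console.log(formatSql('SELECT * FROM t WHERE id = ? AND name = ?', [5, "O'Neil"]));
// prints: SELECT * FROM t WHERE id = 5 AND name = 'O''Neil'
```

Passing the string '5' instead of the number 5 produces a quoted value, which is how a user can still wrap a number if necessary. Note that this simple sketch would also replace a ? that appears inside a string literal in the SQL text.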
Performance Improvement through Major Code Refactoring
As I mentioned at the beginning of this post, the core objective of this release is to refactor the driver to improve its performance. We have carried out several pieces of refactoring work in this release.
- Major refactoring of buffer parsers which handle server responses.
- Major refactoring of protocol packet writers to optimize the work with the Buffer.
- Complete replacement of the Queries Queueing logic.
Buffer parsers refactoring
Prior to this 2.1.0 release, node-cubrid had a lot of duplicated code. All functions which initiate network communication with the CUBRID Server used to implement almost the same functionality to read the data from the socket and prepare it for parsing. Though each function does need separate logic to parse the buffer data, some functionality can be abstracted, such as reading bytes from the socket and performing the basic preparation of the data before it is passed to the parser.
There is one more thing we have improved in the buffer parsers: how they work with instances of the Node.js Buffer class. Memory allocation through Buffer is not as fast as allocation from the local heap of V8 (the JavaScript engine Node.js runs on top of). Moreover, resizing an existing buffer is quite expensive. There is a great, inspirational video by Trevor Norris from Mozilla, Working with Node.js Buffers and Streams [YouTube], which covers this topic. I highly recommend watching it. Prior to 2.1.0, whenever node-cubrid received a chunk of data from the server, it concatenated this chunk to the main buffer object of that particular request. Since we did not know the potential size of the incoming stream of data, we could not preallocate enough buffer memory to avoid buffer resizing. This resulted in constant creation, resizing and copying of buffers upon every arrival of data from the server.
In node-cubrid 2.1.0 we have resolved both of these issues: we refactored the buffer parsers completely to remove code duplication, and we improved the way they work with the Buffer. Now, all functions which initiate network communication use the same API to read data from the socket and prepare it for parsing.
To resolve the second issue, we started leveraging the features of the CUBRID Communication Protocol. In the CUBRID Protocol, when a server sends a data packet to a client, the first four bytes (the length of an integer type) represent the length of the incoming data packet. Thus, after receiving the first piece of the data packet, we can learn how many bytes will be necessary to keep the entire packet in a single Buffer instance. Knowing this value, we can create a Buffer instance with enough memory and start reading the remaining data from the pipe into this Buffer instance. This way we avoid buffer resizing completely.
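The idea can be sketched roughly as follows. This is not node-cubrid's actual parser: the PacketAssembler class is hypothetical, and it assumes that the four-byte length is a big-endian unsigned integer which counts only the bytes that follow it. The real protocol header may contain more fields.

```javascript
'use strict';

// Hypothetical helper, not node-cubrid's actual code.
// Assumption: a packet is a 4-byte big-endian length followed by
// exactly that many payload bytes.
const HEADER_SIZE = 4;

class PacketAssembler {
  constructor() {
    this.header = Buffer.alloc(HEADER_SIZE);
    this.headerFilled = 0;
    this.payload = null; // allocated once, when the length is known
    this.payloadFilled = 0;
  }

  // Feed one chunk received from the socket.
  // Returns the complete payload once it has fully arrived, otherwise null.
  // Bytes beyond the end of the packet are ignored in this sketch.
  push(chunk) {
    let offset = 0;

    if (this.payload === null) {
      // The header itself may arrive split across chunks.
      const n = Math.min(HEADER_SIZE - this.headerFilled, chunk.length);
      chunk.copy(this.header, this.headerFilled, 0, n);
      this.headerFilled += n;
      offset = n;
      if (this.headerFilled < HEADER_SIZE) return null;

      // The length is known now: allocate the full buffer exactly once.
      this.payload = Buffer.allocUnsafe(this.header.readUInt32BE(0));
    }

    // Copy the chunk into place; the buffer is never resized.
    const n = Math.min(this.payload.length - this.payloadFilled, chunk.length - offset);
    chunk.copy(this.payload, this.payloadFilled, offset, offset + n);
    this.payloadFilled += n;

    return this.payloadFilled === this.payload.length ? this.payload : null;
  }
}
```

In a socket handler one would call push() from the 'data' event and hand the returned buffer to the parser as soon as it is not null.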
Refactoring the protocol packet writers
Just like Buffer Readers (or Packet Readers), node-cubrid has Buffer Writers, which we call Packet Writers. Since the Packet Writers also write into a buffer and send it over the wire, the same rule applies: we needed to optimize how writers work with the Buffer. Unlike reading data sent by the server, when writing data into the buffer to send it to the server, we know the exact length of this data. So, why not create the buffer with enough memory in advance? That's what we did. In 2.1.0, we create a Buffer instance only once per request and write the entire payload into it, thus avoiding buffer resizing.
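A minimal sketch of this approach is shown below. The packet layout, the opcode parameter and the buildQueryPacket name are all made up for illustration and do not describe the real CUBRID protocol. The point is only that the total size is computed first and the Buffer is allocated a single time.

```javascript
'use strict';

// Hypothetical packet layout, for illustration only:
// [4-byte payload length][1-byte opcode][4-byte SQL length][SQL bytes][NUL]
function buildQueryPacket(opcode, sql) {
  const sqlBytes = Buffer.byteLength(sql, 'utf8');
  const payloadLength = 1 + 4 + sqlBytes + 1;

  // One allocation with the exact final size; no resizing afterwards.
  const packet = Buffer.alloc(4 + payloadLength);

  let offset = 0;
  offset = packet.writeUInt32BE(payloadLength, offset); // returns the new offset
  offset = packet.writeUInt8(opcode, offset);
  offset = packet.writeUInt32BE(sqlBytes + 1, offset);  // SQL length including NUL
  offset += packet.write(sql, offset, sqlBytes, 'utf8'); // returns bytes written
  packet.writeUInt8(0, offset);                          // terminating NUL

  return packet;
}
```

The returned buffer can be passed to the socket's write() method in a single call.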
Refactoring Queries Queueing logic
The third major refactoring affected how query queueing works in node-cubrid. We introduced query queueing in version 2.0.0. At the time it was implemented with a JavaScript timer. Every X period the internal logic would check if there was a query the user wanted to run. This created three problems for us:
- first, it forced us to manage the queue and keep checking whether there was anything in it;
- second, there was a potential loss of time between queries. Imagine the case when query A starts at time X and completes at X + 60, while the timer checks the queue only at X + 100. Why lose these 40 units of time?
- third, a user could get confused and call the original query()/execute() functions instead of the dedicated addQuery()/addNonQuery() queue related functions, which would result in an error saying that another query is already running.
To address these issues, we have completely replaced the old queueing logic with a new, more efficient and lightweight one which does not incur any delay. To leverage the new queueing mechanism we had to process all user requests which initiate network communication with the server through a queue. Prior to 2.1.0 only those queries which were added by the addQuery()/addNonQuery() functions were processed by the queue. Now everything is processed by the queue, even requests to close the query, commit the transaction or fetch more data.
Processing all requests through one common queueing mechanism allows us to avoid "another query is already running" errors altogether. Since everything goes through the queue, there is no way two requests can run at the same time within one connection. The new queue processes the pending request the moment the previous one is completed. Thus there is no delay in executing the query. Moreover, this removes the headache of managing and checking the timer. Now the query queueing logic is very efficient. If you check the source of the Queue.js module, you will notice that it's only 48 lines of code including comments and empty lines.
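To illustrate the principle, here is a minimal sketch of a timer-free queue. It is not the actual Queue.js module: the RequestQueue class and its method names are hypothetical, and each task is assumed to be a function that receives a done callback and calls it when its request has completed.

```javascript
'use strict';

// Hypothetical sketch of a timer-free request queue; the real
// Queue.js module in node-cubrid may be structured differently.
class RequestQueue {
  constructor() {
    this.pending = [];
    this.running = false;
  }

  // task is a function(done); it must call done() when it has finished.
  push(task) {
    this.pending.push(task);
    // Start immediately if nothing else is in flight.
    if (!this.running) this._next();
  }

  _next() {
    const task = this.pending.shift();
    if (!task) {
      this.running = false;
      return;
    }
    this.running = true;
    // The next task starts the moment this one reports completion,
    // so there is no polling and no idle gap between requests.
    task(() => this._next());
  }
}
```

Because every request is pushed onto the same queue, at most one task is in flight per queue instance, which is what rules out two requests running at the same time on one connection.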
Other minor refactoring
There is some other minor refactoring work we have done as well. For example, JavaScript provides two very convenient functions, call() and apply(), which you can use to change the context (the this value) a function runs with. However, if you use them too often, it affects performance. Previously, node-cubrid used these functions in quite a few places. Now we do not call them at all for the sake of better performance.
We have also refactored most functions by removing unnecessary and repetitive logic. For the project we have also refactored the test suites by gathering all of them in a single test directory.
Soon we will publish the performance test results, which will compare the previous 2.0.2 release with this new 2.1.0 release. To see the full list of changes, refer to the CHANGELOG.md file.
If you have any questions or feature requests, feel free to leave your comments below or create an issue on GitHub.