BlobCity DB joins Hacktoberfest 2019

BlobCity DB is proud to join Hacktoberfest. We are here to help you contribute to our next generation database, with an opportunity to forever change the way enterprises store and process their data.

If you have never contributed to an open source project, but would like to contribute, then Hacktoberfest is the best time of the year to start your open source journey.

What is Hacktoberfest?

It is a month-long celebration of open source, held in October. Hacking + October + Fest = Hacktoberfest.

During Hacktoberfest, open source project maintainers guide first-time contributors as they start contributing to their repositories. Guidance includes explaining the project, helping with GitHub pull requests and helping contributors follow the best practices of the repository. In return, contributors stand a chance to win limited edition Hacktoberfest T-shirts from the organisers of the fest, DigitalOcean and DEV.

What is BlobCity DB?

BlobCity DB is a blazing fast open source NoSQL database. It is commonly used as a Data Lake solution. We call it a database as it is more of a database than a Data Lake, but can pretty much do anything a Data Lake does. BlobCity DB specialises in storing 17 formats of data. It is ACID compliant, supports SQL, offers Java & Scala based stored procedures and allows storing of data on disk and in-memory.

How to Participate

First, register on Hacktoberfest, choose an open source project, pick an open issue from that project, solve it and submit your code as a pull request. Your pull request does not have to be accepted to count as a qualified entry in the contest, but it must not be marked as inappropriate by the repository maintainer.

1. Register on Hacktoberfest

Register on Hacktoberfest with your GitHub account. If you don’t have a GitHub account, you can make one for free. Set up your GitHub profile so that repository maintainers can learn about your background when you send them a pull request.


2. Choose an Issue

You can choose an open issue from BlobCity DB and start working on it. We have issues specially marked for Hacktoberfest. You may choose issues from any other open source project too.


3. Connect with us

We are happy to help in getting you started. Join our Slack community and ask us anything on Hacktoberfest & BlobCity DB.


4. Fork the repository

Once you have decided which issue you want to solve, fork the repository on GitHub.

5. Resolve & Send Pull Request

Now work on the issue you chose. Test your new code, and commit it to your fork once you are done. You can then submit your work by creating a pull request from your fork to the main repository. Now hope that the maintainer of the repository likes your work and accepts your pull request. Remember, your pull request does not have to be merged for you to have a qualified entry in Hacktoberfest.

That’s how simple it is. Choose your open source project and start contributing today. It’s time to give back to the open source we all love 🙂

ICC Cricket World Cup 2019 Win Predictions Using AI & ML

At BlobCity we used our AI & ML skills to predict the ICC Cricket World Cup 2019 winners. Only time will tell if we are correct. We announced our predictions on 27 June, while Match #34 between India and West Indies was in progress, and we correctly predicted India as the winning team.

So who will win the trophy? Here are our predictions for Matches 34 to 45.

We used multiple prediction models, and in the cases where they gave conflicting results, we gave both teams equal points, assuming a drawn match.

We tweeted the above predictions as proof of when they were made. Here is the Tweet: https://twitter.com/BlobCity/status/1144273488045232129

The final predicted points table, leaving out any extra points the team may get due to winning margins, is as below.

So, according to this, India will play England in the first semi-final, and the second semi-final will be between Australia and New Zealand. If our predictions are right, Australia and India will win their respective semi-finals and play the final for the ICC Cricket World Cup.

How did we do it?

Watch the video from our Meetup

We used a neural network for predicting the winners. There is no human bias in the equation, just a computer program saying who will win. A simple feedforward NN is used. We tried others, but this one happened to give better results than the other models we tried.

The above diagram shows the NN used. It is important to note that both the left and right side are input sides. The output is in the centre of the network. The network has 11 + 6 = 17 input nodes and 1 output node.

The output node simply predicts whether Team 1 will win against Team 2. It is important to note how the input is captured.

Batting Input

The average strike rate of each batsman is computed across sets of 5 overs. The match is split into 10 sets of overs: overs 1-5, 6-10, 11-15 and so on until 46-50. This is done because the performance of a team towards the end of the match is considered more critical than its performance towards the beginning.

Most matches in the World Cup are close, as the teams are really good. It is the last few overs that make all the difference, and that is why it is important to capture team and player behaviour in these last overs. It is also a good indication of how the players perform under pressure.

We computed a weighted strike rate. This means that if a batsman hit a four or a six towards the end of the innings, we gave them a few extra runs compared to hitting a four or a six at the beginning of the match.
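The exact weights are not given here, so the sketch below uses made-up bucket weights purely to illustrate the idea: runs scored in a later 5-over bucket count for more. The `bucketWeight` function, the shape of the `balls` array and the sample numbers are all assumptions, not the values behind the predictions.

```javascript
// Hypothetical weighting: bucket 0 is overs 1-5, bucket 9 is overs 46-50.
// Later buckets get a linearly larger weight (assumed, not from the post).
function bucketWeight(bucket) {
  return 1 + bucket * 0.125;
}

// balls: [{ over: 1..50, runs: number }], one entry per ball faced.
// Returns weighted runs per 100 balls.
function weightedStrikeRate(balls) {
  if (balls.length === 0) return 0;
  let weightedRuns = 0;
  for (const ball of balls) {
    const bucket = Math.floor((ball.over - 1) / 5);
    weightedRuns += ball.runs * bucketWeight(bucket);
  }
  return (weightedRuns / balls.length) * 100;
}

// The same four-and-a-dot-ball, early in the match and late in the match.
const early = [{ over: 2, runs: 4 }, { over: 3, runs: 0 }];
const late = [{ over: 48, runs: 4 }, { over: 49, runs: 0 }];

console.log(weightedStrikeRate(early)); // prints 200
console.log(weightedStrikeRate(late)); // prints 425
```

With these assumed weights, a four in over 48 is worth 8.5 weighted runs instead of 4, which is how late-innings hitting gets rewarded.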

We took the 11 batsmen of each team, arranged them in descending order of strike rate, subtracted the Team 1 Batsman 1 strike rate from the Team 2 Batsman 1 strike rate, and fed the difference into the first node of the batting side of the neural network. We repeated this for all 11 batsmen in the team.

Bowling Input

Similar to batting, we split each bowler's performance into buckets of 5 overs and computed the bowling economy of each bowler. A wicket or a maiden over towards the end of the match improves a bowler's economy, while an extra such as a no-ball or a wide significantly worsens it.

We computed the bowling economy of all bowlers and took the six best. These six are arranged in increasing order of economy, so the best bowler is the one with the lowest bowling economy. We took the Team 1 Bowler 1 economy minus the Team 2 Bowler 1 economy and fed the difference to the first input node, and so on for the 6 bowlers in the team. Most teams have only 5 bowlers, but some do have a good 6th bowler, which is why we considered 6 bowlers as input. If a team has 7 bowlers, we ignore the worst bowler and take only the top 6 as input into the NN.
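The batting and bowling inputs described above can be sketched as a single function that builds the 17 input values. This is only an illustration: the `team` object shape and the function names are hypothetical, and it assumes every team supplies at least 11 strike rates and 6 economies (how a team with only 5 bowlers was padded is not stated here).

```javascript
// Sort a copy of the values and keep the first `count` entries.
function topValues(values, count, descending) {
  const sorted = [...values].sort((a, b) => (descending ? b - a : a - b));
  return sorted.slice(0, count);
}

// team = { strikeRates: number[], economies: number[] } (hypothetical shape)
function buildInputVector(team1, team2) {
  const bat1 = topValues(team1.strikeRates, 11, true);
  const bat2 = topValues(team2.strikeRates, 11, true);
  const bowl1 = topValues(team1.economies, 6, false);
  const bowl2 = topValues(team2.economies, 6, false);

  // Batting: Team 1 strike rate subtracted from Team 2 strike rate.
  const batting = bat1.map((strikeRate, i) => bat2[i] - strikeRate);
  // Bowling: Team 1 economy minus Team 2 economy.
  const bowling = bowl1.map((economy, i) => economy - bowl2[i]);

  return [...batting, ...bowling]; // 11 + 6 = 17 values
}

// Made-up sample data, only to show the shape of the result.
const sampleTeam1 = {
  strikeRates: Array.from({ length: 11 }, (_, i) => 100 - i),
  economies: [5, 6, 4, 7, 8, 9, 10], // 7 bowlers, the worst one is dropped
};
const sampleTeam2 = {
  strikeRates: Array.from({ length: 11 }, (_, i) => 110 - i),
  economies: [6, 5, 7, 8, 9, 4],
};

const inputVector = buildInputVector(sampleTeam1, sampleTeam2);
console.log(inputVector.length); // prints 17
```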


We used performance of these players across Match #1 to Match #33 of the ICC World Cup 2019. The output of the neural network is 1 if Team 1 will win and 0 if Team 2 will win. The NN was trained using the outcome of the first 33 matches and then used to predict Match #34 to Match #45 and the semi-finals and the finals.

The final prediction says that India will win ICC Cricket World Cup 2019.


Twitter post as proof of time: https://twitter.com/BlobCity/status/1148507778287165445

We were perfect in our predictions of the Top 4 teams: India, New Zealand, Australia and England.

Since the actual semi-final line-up differs slightly from the one we predicted, our updated predictions are as shown below:

Delete large files from historic commits in Git

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch <file>' HEAD

It can be tricky to remove a large file from Git when it goes back several commits in the history, especially when the same commits contain other source files that you want to retain. The above command does just this for you.

Caution: This changes the hashes of all commits that contain the file. You may have a lot of conflict resolution to do after running the command, but if you have to remove the file, this might just be the best way to get it done.

You are likely to need this when GitHub rejects a push because a file exceeds its file size limit.

remote: error: File my_binary.tar.gz is 300.08 MB; this exceeds GitHub's file size limit of 100.00 MB

Use the git filter-branch command shown at the top, replacing <file> with the path, within the repository, of the file you want to delete. You can also specify a folder if a complete folder is to be deleted.

Before you run the command, make sure <file> is deleted from the current branch. Also make sure the deletion is committed in your local repository, even if it could not be pushed to the remote.

Example Run

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch my_binary.tar.gz' HEAD
Rewrite f1824be80ff0ff2bd27064094245288252fc479a (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite dc4a60df97e753e98804cc432339a6b4675ba9e7 (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite 09a87540b07fa5ba0c9dc0ba1861720f51cb5e98 (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite e1be7aa558591bb06049808161fca0c97071165c (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite 1925fa84c42e379f8dd1d44cbda18da52ac5091e (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite b96b54fecbf7dec09274ef30990995bf4a97288a (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite 75d364753f247d983e83ab408eaa0af2f2fff5a1 (71/118) (3 seconds passed, remaining 1 predicted)    rm 'my_binary.tar.gz'
Rewrite b839c3c1dd0e6277083b74801629e9138591be49 (95/118) (4 seconds passed, remaining 0 predicted)    
Ref 'refs/heads/master' was rewritten

After running the command, attempt a push to the remote. It will very likely fail, which is expected. Pull, resolve any CONFLICT that shows up, make sure the file you deleted is still deleted (if not, delete it from your folder), commit the changes and push. Your push should now work.

What Marvel earned from 22 superhero films

Marvel has released 22 superhero films since 2008, with Avengers: Endgame being the last of them. But how much did Marvel spend on making them, and how much did it earn from them?

Marvel had a total production budget of around US $4.5 billion for all 22 movies combined, with the largest production budget, $316M – $400M, going to Avengers: Infinity War. Avengers: Endgame had a slightly lower production budget than its predecessor, but has earned more at the box office than any of the other Marvel films. If the numbers are right, the films made upwards of $20 billion in box office collections, a profit of at least $15.5 billion ($20 billion less the $4.5 billion budget)!

We analysed the data further to identify the most profitable superhero. We took the single-superhero films and compared their average profit per film. Comparing total profit would be unfair, as Spider-Man would win simply by having 9 movies on the character.

Guardians of the Galaxy, considered as a single superhero group, has made Marvel an average profit of $818.53M, which is the highest across all independent superhero films. However, Guardians of the Galaxy is not exactly a single superhero. Among solo superhero movies, Iron Man tops the charts with an average profit of $807.97M per film.

Data Source



Remake Game of Thrones Season 8

Game of Thrones Season 8 saw the worst reviews ever, but it also happens to have the highest viewership numbers. What’s going on?

The apparent final season of Game of Thrones received exceptionally poor reviews, the worst across all seasons of GoT so far. Interestingly, however, the viewership of all episodes of Season 8 has surpassed the viewership of the seasons before it. Season 8 Episode 1 drew nearly 1.65 million more viewers in the USA alone than Season 7 Episode 1. The increase in viewership remains consistent for episodes 2, 3, 4 and 5 of Season 8.

We can see from the chart that Episode 5 of Season 8 saw the highest viewership ever in the USA. At 12.48 million viewers, it is a record for any episode of GoT so far. This also happens to be the very episode after which petitions were written to HBO to remake Season 8 with more competent writers.


If they do remake it, and the viewership numbers of Season 9 (or Season 8B, as they might call it) turn out to be higher than those of Season 8A, we may well have set a new trend: botch the last season of a popular series, and even more of us will watch the remake, making the franchise more money.

This raises the question: will HBO remake Season 8? A lot of fans really hope so. And if it does, would the viewership numbers be higher or lower?

Data Source: https://en.wikipedia.org/wiki/Game_of_Thrones#Viewer_numbers

Collect results from multiple promises in NodeJS

Promises can be complicated by themselves, and may take time to get used to for a programmer who is otherwise comfortable with a procedural style of programming.

In certain cases, you may be required to execute not just one promise, but several, and to process the results of all of them together. This article provides an example of the same.

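let promises = [];
let numbers = [1,2,3,4,5];
numbers.forEach(number => {
  // Each promise resolves with the square of its number.
  let promise = new Promise(function(resolve, reject){
    resolve(number * number);
  });
  promises.push(promise);
});

// Wait for all the promises and print the collected results.
Promise.all(promises).then(squares => console.log(JSON.stringify(squares)));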

The above program squares each of the numbers in an array, but each square operation is executed asynchronously. The result of each asynchronous operation is obtained through a promise.

Each promise is created inside the loop and added to the promises array.

The Promise.all() function is then called to wait for all the promises. A promise starts executing as soon as it is created, so Promise.all() does not invoke anything itself. Once every promise has resolved, the results are collected into an array, which in our case is the “squares” variable. If any promise rejects, Promise.all() rejects immediately with that error.

The output of the above program looks as shown below

[1,4,9,16,25]

Notice that the output is in the same order as the input numbers. More precisely, it is in the same order as the “promises” array. The Promise.all() function ensures that the ordering of responses is maintained.
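To see that the ordering really follows the promises array and not the order in which the promises finish, here is a small sketch in which the first promise is the slowest. The `delayed` helper and the delay values are made up for the illustration.

```javascript
// Resolve with `value` after `ms` milliseconds (hypothetical helper).
function delayed(value, ms) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

// The first promise finishes last, yet its result stays at index 0.
const ordered = Promise.all([
  delayed('slow', 30),
  delayed('medium', 20),
  delayed('fast', 10),
]);

ordered.then(results => console.log(JSON.stringify(results)));
// prints ["slow","medium","fast"]
```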

Let us take another example.

let promises = [];
let numbers = [-1,0,1,2,3,4,5];
numbers.forEach(number => {
  // The catch() handles a rejection, so the chained promise still resolves.
  let promise = new Promise(function(resolve, reject){
    if(number < 0) reject('Only positive numbers accepted');
    else resolve(number * number);
  }).catch(err => console.log(err));
  promises.push(promise);
});

Promise.all(promises).then(squares => console.log(JSON.stringify(squares)));

The above program rejects any attempt to square a negative number. Let’s see what the output looks like in this case.

Only positive numbers accepted
[null,0,1,4,9,16,25]

The first log line comes from the “catch” condition on the promise itself. The second line shows the output after Promise.all(). We can see that we have a “null” value for the square of -1, as the promise rejected a negative number.

The ordering of responses in the array is maintained even when one of the promises is rejected. This works because each promise has its own “catch”: the rejection is handled there, and the promise resolves with no value, which JSON.stringify() prints as “null”. The array will therefore contain a “null” value both for promises that reject and for ones that resolve with a “null” value.
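If you need to tell a rejected promise apart from one that genuinely resolved with null, one option is Promise.allSettled(), which is built into recent NodeJS versions. The sketch below is only one way to do it; the `square` function is a hypothetical rewrite of the example above, without the per-promise catch.

```javascript
// Same squaring logic as above, but without a catch() on each promise.
function square(number) {
  return new Promise((resolve, reject) => {
    if (number < 0) reject(new Error('Only positive numbers accepted'));
    else resolve(number * number);
  });
}

// allSettled() waits for every promise and reports each outcome separately.
const settled = Promise.allSettled([-1, 0, 2].map(square)).then(results =>
  results.map(r => (r.status === 'fulfilled' ? r.value : r.reason.message))
);
settled.then(summary => console.log(JSON.stringify(summary)));
// prints ["Only positive numbers accepted",0,4]

// For comparison: Promise.all() rejects as soon as one promise rejects.
const failFast = Promise.all([-1, 0, 2].map(square)).catch(err => err.message);
failFast.then(message => console.log(message));
// prints Only positive numbers accepted
```

The ordering is preserved here too: the entry for -1 stays at index 0.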


Why does NodeJS scale insanely?

If you are new to NodeJS, you might have heard that NodeJS is single threaded. You might also have heard that it is insanely scalable, serving millions of users in real time. How does a single threaded application scale so well?

Single threading is half the truth

Yes, NodeJS follows a Single Thread Event Loop Model, but it is not actually single threaded. It works on an event based execution architecture.

NodeJS has a main thread and additional worker threads. Tasks that do not have to be serviced synchronously can be passed on to the worker threads. When a worker thread has finished its task, it reports back to the event loop. The event loop picks up the event and passes it to the main program stack, where it is next in line for execution.

This provides a single threaded, but pseudo-parallel, execution environment.

Understanding NodeJS Execution

const request = require('request');
let f1 = function() {
  console.log('Hello at beginning');
  // The callback runs later, once the response has arrived.
  request('https://google.com', (err, res, body) => {
    console.log('Hello from function');
  });
  console.log('Hello at end');
};
f1();


If we executed the above code in a procedural manner, we would expect the following output.

Hello at beginning
Hello from function
Hello at end

However, your NodeJS application will show the following output.

Hello at beginning
Hello at end
Hello from function

Why is this so? Why does the request() line execute after the last console.log() statement? This is so because invoking request() is an asynchronous task. The execution of this task gets allotted to a worker thread. While the worker thread waits to get the response from google.com, the main thread can continue with further execution. This results in the last console output being printed while the worker thread is waiting for a response on the request.

When the worker thread does receive a response, it puts an entry into the event loop. When the main thread is free and doing nothing else, it picks up the event from the event loop and executes the callback registered for the task that was allotted to the worker. Event loop tasks are only executed when the main thread is free and not performing any other task.
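The same ordering can be reproduced without a network call. The sketch below uses setTimeout() with a delay of 0 as a stand-in for the request: even with no delay at all, the callback only runs after the main thread has finished its current work. The variable names are made up for the illustration.

```javascript
const order = [];

order.push('beginning');

// The callback is queued on the event loop, not run immediately.
const finished = new Promise(resolve => {
  setTimeout(() => {
    order.push('callback');
    resolve(order);
  }, 0);
});

order.push('end');

finished.then(result => console.log(result.join(' -> ')));
// prints beginning -> end -> callback
```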

Figure: NodeJS async execution. The call to request() is passed on to a worker thread.

So why is NodeJS insanely scalable?

This unique event based model prevents NodeJS from being blocked by any specific event. Events are processed independently of each other. This only holds as long as you don’t write code that blocks the main event thread.
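Here is a small sketch of what blocking the main event thread looks like. A timer is asked to fire after 10 ms, but a busy loop keeps the main thread occupied for 100 ms, so the timer callback cannot run until the loop is done. The durations are arbitrary values chosen for the illustration.

```javascript
const start = Date.now();

// Ask for a callback after 10 ms and record when it actually runs.
const timerDelay = new Promise(resolve => {
  setTimeout(() => resolve(Date.now() - start), 10);
});

// Busy-wait for 100 ms: synchronous work that blocks the main thread.
while (Date.now() - start < 100) {
  // nothing else can run while this loop spins
}

timerDelay.then(ms => console.log(`timer fired after ${ms} ms`));
```

The logged delay is at least 100 ms rather than 10 ms, because events on the event loop are only picked up once the main thread is free.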

Since async function calls report back to the event loop only when they are ready to be executed, the main thread is always busy doing something and never waiting on any task. A properly designed NodeJS application can thereby keep the main event loop free of long running tasks, by passing them to worker threads.

This concept is very different from spawning new threads to execute tasks in parallel. There is a physical limit to the number of threads a system can execute. When this limit is reached, and individual threads are waiting for long running operations to complete, all threads essentially wait, making the complete application slow.

In NodeJS, by contrast, the main event loop only receives tasks that are ready to be executed. Millions of concurrent events can therefore be created without affecting the performance of the main thread, which allows well designed applications to scale significantly.

NodeJS is turning out to be one of the preferred backend systems for web applications and web services.

Count number of elements in Iterator – Java

The most elegant way to get the size of an Iterator or the count of number of elements in an Iterator is by using the utility method provided within the Guava library.

int size = Iterators.size(myIterator);

myIterator must be an implementation of Iterator<T>. Here T can be of any type.

Note: The iterator is consumed in the process of counting, so once the size is returned the same iterator cannot be used for any other purpose.

Getting Guava Library

Library Source on GitHub


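<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>27.0-jre</version>
  <!-- or, for Android: -->
  <version>27.0-android</version>
</dependency>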


dependencies {
  compile 'com.google.guava:guava:27.0-jre'
  // or, for Android:
  api 'com.google.guava:guava:27.0-android'
}