I have an obsession with data.

In this post I’ll show you what I built with data, Elixir, Poolboy, Mogrify, and AndreaMosaic.

I usually try out new technologies on real problems so I can learn from them.

Some years ago I built a web crawler that scraped the news from the most popular media websites in my city. I also built a Facebook bot so people could read all the news in one place.

The first version of the crawler was “OK.” I wrote it in Java and learned how difficult it is to deal with concurrency. For example, it is easy to introduce race conditions in the code. In the end, I got it all to work, but it took a lot of time and effort.

Elixir

I’ve been trying to learn Elixir for the past two years. I had learned the basics, such as pattern matching, OTP, and macros, but I hadn’t had a chance to build something from scratch. So I decided to redo the Java crawler, this time in Elixir.

I won’t explain every detail of how I built it, but I will describe how it works and the tools I used.

Challenges

  • Read all links from the front page of the media website
  • Identify which links match the URL pattern of a single news item
  • Generate an object of type Article with fields such as title, content, and URL
  • Save it to the database
  • Save the thumbnail to my computer
  • Resize the thumbnail
  • Do all this recursively for child nodes
  • Use Elixir concurrent workers for doing these tasks without exhausting my system resources

Libraries

These are some of the libraries I used:

  • HTTPotion: An HTTP client
  • Floki: An HTML parser
  • Ecto: A database wrapper and language-integrated query for Elixir
  • Mogrify: A wrapper around an awesome image-processing tool called ImageMagick
  • Poolboy: A worker pool factory

How It Works

I used HTTPotion to fetch the HTML of every single page. The first thing I did was crawl the home page of the media site. Then, with the help of Floki, I got the href attribute of every <a> tag.

The code looked something like this,
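A minimal sketch of this step, assuming HTTPotion and Floki are listed as dependencies in mix.exs. The module name `Crawler.Links` and the `/news/YYYY/MM/some-title` URL pattern are hypothetical; every media site needs its own pattern.

```elixir
defmodule Crawler.Links do
  @moduledoc "Sketch: collect hrefs from a page and keep the ones that look like articles."

  # Fetch a page over HTTP and return the article links found in it.
  def fetch_article_links(url) do
    case HTTPotion.get(url) do
      %{status_code: 200, body: body} -> {:ok, article_links(body)}
      other -> {:error, other}
    end
  end

  # Parse an HTML string with Floki and pull out every href of an <a> tag.
  def hrefs(html) do
    {:ok, document} = Floki.parse_document(html)

    document
    |> Floki.find("a")
    |> Floki.attribute("href")
  end

  # Hypothetical rule: an article path looks like /news/2017/05/some-title.
  def article_link?(href) do
    Regex.match?(~r"/news/\d{4}/\d{2}/[a-z0-9-]+/?$", href)
  end

  # Keep only the hrefs that look like articles, without duplicates.
  def article_links(html) do
    html
    |> hrefs()
    |> Enum.filter(&article_link?/1)
    |> Enum.uniq()
  end
end
```

Keeping `article_link?/1` separate from the HTTP call makes the pattern easy to check in `iex` before pointing the crawler at a real site.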

Once I had extracted the URLs, I looped through them and crawled each one to build an object:

Notice that I also saved a slug of the title. This later helps identify each thumbnail’s file name.
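As a rough sketch, the object could be a plain struct filled from the parsed page. The field names, the CSS selectors (`h1`, `article p`, `article img`), and the `slugify/1` helper are assumptions for illustration, not the original code; Floki is assumed to be available.

```elixir
defmodule Crawler.Article do
  @moduledoc "Sketch: an article struct built from a page's HTML."

  defstruct [:title, :content, :url, :slug, :thumbnail_url]

  # Build an article from the raw HTML of a single news page.
  # The selectors are hypothetical and depend on the site's markup.
  def from_html(html, url) do
    {:ok, document} = Floki.parse_document(html)

    title = document |> Floki.find("h1") |> Floki.text() |> String.trim()

    %__MODULE__{
      title: title,
      content: document |> Floki.find("article p") |> Floki.text(sep: " "),
      url: url,
      slug: slugify(title),
      thumbnail_url:
        document |> Floki.find("article img") |> Floki.attribute("src") |> List.first()
    }
  end

  # Lowercase the title and collapse every run of non-alphanumeric
  # characters into a single hyphen. Accented letters are not
  # transliterated in this simple version.
  def slugify(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
  end
end
```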

Once this object was filled in, I could go ahead and save it to my database using Ecto.
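A minimal sketch of what the Ecto side might look like, assuming an `articles` table with a unique index on `url`. The schema module, its fields, and `Crawler.Repo` (an `Ecto.Repo` configured elsewhere) are hypothetical names.

```elixir
defmodule Crawler.Schema.Article do
  @moduledoc "Sketch: Ecto schema and changeset for a crawled article."

  use Ecto.Schema
  import Ecto.Changeset

  schema "articles" do
    field :title, :string
    field :content, :string
    field :url, :string
    field :slug, :string
    field :thumbnail_url, :string

    timestamps()
  end

  # Cast and validate the crawled attributes before inserting them.
  def changeset(article, attrs) do
    article
    |> cast(attrs, [:title, :content, :url, :slug, :thumbnail_url])
    |> validate_required([:title, :url, :slug])
    # Relies on a unique index on articles.url created in a migration,
    # so crawling the same link twice returns {:error, changeset}.
    |> unique_constraint(:url)
  end

  # Crawler.Repo is a hypothetical repo module configured elsewhere.
  def save(attrs) do
    %__MODULE__{}
    |> changeset(attrs)
    |> Crawler.Repo.insert()
  end
end
```

If the crawled object is a struct, `Map.from_struct/1` turns it into the attribute map that `save/1` expects.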

To make our beautiful mosaic, we need tons of images stored locally. I used HTTPotion again to download the image from each thumbnail URL, and Mogrify to resize it.

Here is how I resized the image and saved it,
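A minimal sketch of the download-and-resize step, assuming HTTPotion and Mogrify are dependencies and ImageMagick is installed on the machine. The module name, the `thumbnails` folder, and the `100x100` size are hypothetical choices.

```elixir
defmodule Crawler.Thumbnail do
  @moduledoc "Sketch: download a thumbnail, name it after the slug, and resize it."

  # Download the image at `url` into `dir`, then shrink it in place.
  def fetch_and_resize(url, slug, dir \\ "thumbnails", size \\ "100x100") do
    File.mkdir_p!(dir)
    path = path_for(dir, slug, url)

    case HTTPotion.get(url) do
      %{status_code: 200, body: body} ->
        File.write!(path, body)

        # Mogrify shells out to ImageMagick to do the actual resizing.
        path
        |> Mogrify.open()
        |> Mogrify.resize(size)
        |> Mogrify.save(in_place: true)

        {:ok, path}

      other ->
        {:error, other}
    end
  end

  # Name the file after the slug, keeping the extension found in the URL
  # and falling back to .jpg when the URL has none.
  def path_for(dir, slug, url) do
    url_path = URI.parse(url).path || ""

    ext =
      case Path.extname(url_path) do
        "" -> ".jpg"
        found -> found
      end

    Path.join(dir, slug <> ext)
  end
end
```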

Once I had all this working, I needed to set up a pool of Elixir workers so my computer could do all this work concurrently without dying.

This is where Poolboy comes into play. I used it to configure a Supervisor that keeps a set of workers available at all times.
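A minimal sketch of such a pool, assuming Poolboy is a dependency. The pool name, the sizes (5 workers plus 2 overflow), the timeouts, and the placeholder worker body are hypothetical; the real worker would run the fetch, parse, save, and resize steps.

```elixir
defmodule Crawler.Worker do
  use GenServer

  # Poolboy starts each worker through start_link/1 (arguments unused here).
  def start_link(_args), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:crawl, url}, _from, state) do
    # Placeholder: the real crawling work for one URL would happen here.
    {:reply, {:ok, url}, state}
  end
end

defmodule Crawler.Pool do
  @pool :crawler_pool

  # At most size + max_overflow workers run at the same time.
  def config do
    [name: {:local, @pool}, worker_module: Crawler.Worker, size: 5, max_overflow: 2]
  end

  # Start a supervisor that owns the pool and restarts it if it crashes.
  def start_link do
    children = [:poolboy.child_spec(@pool, config())]
    Supervisor.start_link(children, strategy: :one_for_one)
  end

  # Check a worker out, hand it one URL, and return it to the pool.
  def crawl(url) do
    :poolboy.transaction(
      @pool,
      fn pid -> GenServer.call(pid, {:crawl, url}, 30_000) end,
      60_000
    )
  end
end
```

After `Crawler.Pool.start_link()`, a list of URLs can be pushed through the pool with `Task.async_stream(urls, &Crawler.Pool.crawl/1, timeout: :infinity)`. Callers that find no free worker wait at checkout instead of starting more work, which is what keeps the machine from being exhausted.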

Running the App

AndreaMosaic

AndreaMosaic is free software that creates mosaic images for you, and it’s really fast. I love this tool.
Here is a screenshot of how it looks,

To make it work, choose a background image and a folder from which it will take the tile images. You can specify whether to repeat tiles, the size of the final image, etc. Give it a try; it’s really easy to use.

Conclusion

I’m really impressed by how easy it is to use Elixir. I highly recommend that you build something from scratch. It worked really well for me.

The Elixir community is still growing, and this is the time to get on board.

Resources