I have an obsession with data.
In this post I’ll show you what I built with Elixir, Poolboy, Mogrify, and AndreaMosaic.
I usually try new technologies on real problems so I can learn from them.
Some years ago I built a web crawler that scraped the news from the most popular media websites in my city. I also built a Facebook bot so people could read all the news in one place.

The first version of the crawler was “OK.” I wrote it in Java and learned how difficult concurrency can be. For example, it is easy to introduce race conditions in the code. In the end I got it all to work, but it took a lot of time and effort.
Elixir
I’ve been trying to learn Elixir for the past two years. I learned the basics, such as pattern matching, OTP, and macros, but I hadn’t had a chance to build something from scratch. So I decided to redo the Java crawler, this time in Elixir.
I won’t explain every step of how I built it, but I will describe how it works and the tools I used.
Challenges
- Read all links from the front page of the media website
- Identify which links match the URL pattern of a single news article
- Build an Article struct with the title, content, URL, etc.
- Save it to the database
- Save the thumbnail on my computer
- Resize the thumbnail
- Do all this recursively for child nodes
- Use Elixir concurrent workers for doing these tasks without exhausting my system resources
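The recursive step in the list above needs a way to avoid visiting the same page twice. Here is a minimal sketch of one way to do that, not the code from my crawler. `CrawlSketch`, `fetch_links`, and `max_depth` are names I made up for illustration; `fetch_links` stands in for "download the page and return its links".

```elixir
defmodule CrawlSketch do
  # Walks links depth-first and returns the set of URLs visited.
  # `fetch_links` is a hypothetical function: URL -> list of child URLs.
  def crawl(url, fetch_links, max_depth, visited \\ MapSet.new(), depth \\ 0) do
    cond do
      # Stop descending once the depth limit is exceeded
      depth > max_depth ->
        visited

      # Skip pages that were already seen
      MapSet.member?(visited, url) ->
        visited

      true ->
        visited = MapSet.put(visited, url)

        # Thread the visited set through every child so no page is crawled twice
        url
        |> fetch_links.()
        |> Enum.reduce(visited, fn child, acc ->
          crawl(child, fetch_links, max_depth, acc, depth + 1)
        end)
    end
  end
end

# An in-memory stand-in for real pages, used only for this example
site = %{
  "/" => ["/news/a", "/news/b"],
  "/news/a" => ["/news/b", "/"],
  "/news/b" => []
}

visited = CrawlSketch.crawl("/", fn url -> Map.get(site, url, []) end, 2)
IO.inspect(MapSet.to_list(visited))
```

Passing the fetch function in as an argument keeps the traversal easy to test without touching the network.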
Libraries
These are the libraries I used:
- HTTPotion: This is an HTTP client
- Floki: HTML parser
- Ecto: A database wrapper and language integrated query for Elixir
- Mogrify: Wrapper around ImageMagick, an awesome image-processing tool
- Poolboy: Worker pool factory
How It Works
I used HTTPotion to fetch the HTML of every page. The first thing I did was crawl the home page of the media site. Then, with the help of Floki, I got the href attribute of every <a> tag.
The code looked something like this:

```elixir
def extract_links(html) do
  html
  |> Floki.find("a")    # get all <a> tags
  |> get_only_links()   # get href attributes
  |> filter_links()     # keep only links to single news articles
  |> Enum.uniq()        # remove duplicate links
end
```
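The filtering step is where site-specific knowledge lives. As a rough sketch (not my original helper), a filter like this could work on a plain list of href strings using only the standard library. The module name, the three-argument signature, and the `/news/<year>/<slug>` pattern are assumptions for illustration; every site needs its own pattern.

```elixir
defmodule LinkFilterSketch do
  # Keeps only links on the same host as `base_url` whose path matches
  # `pattern`, returned as absolute URLs without duplicates.
  def filter_links(hrefs, base_url, pattern) do
    base = URI.parse(base_url)

    hrefs
    # Resolve relative hrefs such as "/news/..." against the site root
    |> Enum.map(&URI.merge(base, &1))
    |> Enum.filter(fn uri ->
      uri.host == base.host and Regex.match?(pattern, uri.path || "")
    end)
    |> Enum.map(&URI.to_string/1)
    |> Enum.uniq()
  end
end

# Hypothetical pattern: article URLs look like /news/2017/some-slug
pattern = ~r|^/news/\d{4}/[a-z0-9-]+$|

hrefs = [
  "/news/2017/elixir-rocks",
  "#top",
  "https://example.com/news/2017/elixir-rocks",
  "/about",
  "mailto:editor@example.com"
]

IO.inspect(LinkFilterSketch.filter_links(hrefs, "https://example.com", pattern))
```

Resolving every href to an absolute URL before deduplicating means a relative and an absolute link to the same article count as one.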
Once I had extracted the URLs, I looped through them and crawled each page to build a struct shaped like %{title: title, content: content, thumbnail: thumbnail, …}:

```elixir
def get_article(html, url) do
  # Extract the title once so the slug can be derived from it
  title = title(html)

  %ArticleStruct{
    title: title,
    slug: Slugger.slugify_downcase(title, ?_), # slug
    content: content(html),
    url: url,
    thumbnail: thumbnail(html)
  }
end
```
Notice that I also save a slug of the title. Later it serves as the file name of each thumbnail.
Once the struct was filled in, I saved it to my database using Ecto.
To make our beautiful mosaic we need tons of images stored locally. I used HTTPotion again to download the image from the thumbnail URL, and Mogrify to resize it.
```elixir
def save_image(article) do
  case HTTPotion.get(article.thumbnail) do
    %HTTPotion.Response{body: body} ->
      basepath = "/path/images/"
      # The slug keeps file names unique per article
      filename = Path.join(basepath, "#{article.slug}.png")
      File.write!(filename, body)
      resize_image(filename, 200, 200)
      article

    # Anything else (for example an error response) is skipped
    _ ->
      nil
  end
end
```
Here is how I resized the image and saved it:

```elixir
def resize_image(image_path, width, height, _opts \\ []) do
  image_path
  |> Mogrify.open()
  |> Mogrify.resize_to_limit("#{width}x#{height}")
  |> Mogrify.save(path: image_path)
end
```
Once I had all this working, I needed a pool of Elixir workers so my computer could do all this work concurrently without dying.
This is where Poolboy comes into play. I used it to configure a supervisor that keeps a set of workers available at all times.
```elixir
defmodule ScrapperApp.Application do
  @moduledoc false
  use Application

  defp poolboy_config do
    [
      {:name, {:local, :worker}},
      {:worker_module, ScrapperApp.MyWorker},
      # 3 permanent workers, plus up to 4 temporary ones under load
      {:size, 3},
      {:max_overflow, 4}
    ]
  end

  def start(_type, _args) do
    children = [
      :poolboy.child_spec(:worker, poolboy_config())
    ]

    opts = [strategy: :one_for_one, name: Scrapper.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
```
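To give an idea of how work reaches the pool, here is a hedged sketch of what a worker and its caller might look like. It is not my actual worker. The pool name `:worker` and the module name `ScrapperApp.MyWorker` come from the configuration above; the `{:crawl, url}` message, the `ScrapperApp.Crawler` module, and the 60-second timeout are assumptions for illustration. It needs the Poolboy dependency and a running pool to call `crawl/1`.

```elixir
defmodule ScrapperApp.MyWorker do
  use GenServer

  # Poolboy starts each worker by calling start_link/1 with the worker args
  def start_link(_args), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:crawl, url}, _from, state) do
    # Placeholder for the real work: fetch, parse, save, resize
    {:reply, {:ok, url}, state}
  end
end

defmodule ScrapperApp.Crawler do
  # Hypothetical timeout, in milliseconds
  @timeout 60_000

  # Checks a worker out of the :worker pool, runs one job on it,
  # and checks it back in even if the job fails.
  def crawl(url) do
    :poolboy.transaction(
      :worker,
      fn pid -> GenServer.call(pid, {:crawl, url}, @timeout) end,
      @timeout
    )
  end

  # Starts one task per URL. Tasks block at checkout until a worker is
  # free, so the pool size caps how many pages are processed at once.
  def crawl_all(urls) do
    urls
    |> Enum.map(fn url -> Task.async(fn -> crawl(url) end) end)
    |> Enum.map(&Task.await(&1, :infinity))
  end
end
```

With `size: 3` and `max_overflow: 4`, at most seven jobs run at the same time, no matter how many URLs are queued. The third argument to `:poolboy.transaction/3` is the checkout timeout, so a long queue may need a larger value.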
Running the App


AndreaMosaic
AndreaMosaic is free software that creates photo mosaics for you, and it’s really fast. I love this tool.
Here is a screenshot of how it looks,

To make it work, choose a background image and the folder that holds the images to use as tiles. You can specify whether tiles may repeat, the size of the final image, etc. Give it a try; it’s really easy to use.
Conclusion
I’m really impressed by how easy it is to use Elixir. I highly recommend that you build something from scratch. It worked really well for me.
The Elixir community is still growing, and this is the time to get on board.