Putting It All to Work: Elixir, Poolboy, Mogrify, AndreaMosaic

I have an obsession with data.

In this post I’ll show you what I did with data using Elixir, Poolboy, Mogrify, and AndreaMosaic.

I normally attempt to use new technologies so I can learn from real problems.

Some years ago I built a web crawler that scraped the news from the most popular media websites in my city. I also built a Facebook bot so people could read all the news in one place.

The first time I did the crawler, it was “OK.” I wrote it in Java and I learned that it was very difficult to deal with concurrency. For example, it is easy to introduce race condition issues in the code. In the end, I got it all to work, but it took a lot of time and effort.

Elixir

I’ve been trying to learn Elixir for the past two years. I learned about basic things like Pattern Matching, OTP, and macros but I hadn’t had a chance to do something from scratch. So, I decided to re-do the Java crawler, but this time using Elixir.

I won’t explain every detail of how I did it, but I will tell you how it works and which tools I used.

Challenges

  • Read all links from the front page of the media website
  • Identify which links match the pattern of a single news item
  • Generate an object of type Article that has things like title, content, URL, etc.
  • Save it to the database
  • Save the thumbnail on my computer
  • Resize the thumbnail
  • Do all this recursively for child nodes
  • Use Elixir concurrent workers for doing these tasks without exhausting my system resources
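The "recursively for child nodes" step can be sketched roughly as follows. This is a minimal illustration, not the crawler's actual code: the module name `CrawlSketch`, the injected `fetch_links` function, and the depth limit are all assumptions made for the example. The visited set keeps the crawl from looping when pages link back to each other.

```elixir
defmodule CrawlSketch do
  # Hypothetical sketch: depth-limited recursive crawl with a visited set.
  # `fetch_links` is any function that takes a URL and returns the URLs
  # found on that page (for example, an HTTP request plus HTML parsing).
  def crawl(url, fetch_links, max_depth, visited \\ MapSet.new())

  # Depth budget used up: return what we have without visiting this URL.
  def crawl(_url, _fetch_links, max_depth, visited) when max_depth < 0, do: visited

  def crawl(url, fetch_links, max_depth, visited) do
    if MapSet.member?(visited, url) do
      visited
    else
      visited = MapSet.put(visited, url)

      # Thread the visited set through every child so no page is crawled twice.
      url
      |> fetch_links.()
      |> Enum.reduce(visited, fn child, acc ->
        crawl(child, fetch_links, max_depth - 1, acc)
      end)
    end
  end
end

# A fake site stands in for real HTTP requests.
site = %{"/" => ["/a", "/b"], "/a" => ["/", "/c"], "/b" => [], "/c" => []}
fetch = fn url -> Map.get(site, url, []) end

CrawlSketch.crawl("/", fetch, 1) |> Enum.sort()
# => ["/", "/a", "/b"]
```

Passing the fetch function in as an argument keeps the recursion easy to test without touching the network.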

Libraries

These are some of the libraries I used:

  • HTTPotion: HTTP client
  • Floki: HTML parser
  • Ecto: A database wrapper and language integrated query for Elixir
  • Mogrify: Wrapper for an awesome library called ImageMagick
  • Poolboy: Worker pool factory

How It Works

I used HTTPotion to fetch the HTML of every page. The first thing I did was crawl the home page of the media site. Then, with the help of Floki, I collected the href attribute of every <a> tag.

The code looked something like this:

def extract_links(html) do
  html
  |> Floki.find("a")  # get all <a> tags
  |> get_only_links() # get href attributes
  |> filter_links()   # keep only links to single news items
  |> Enum.uniq()      # remove duplicate links
end
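The two helpers in that pipeline, get_only_links and filter_links, are not shown in the post. Here is one way they might look, as a sketch under assumptions: the module name `LinkSketch` and the URL rule (/news/<id>/<slug>) are invented for illustration, and the input is written by hand in the {tag, attributes, children} tuple shape that Floki uses for parsed elements, so the example runs without any dependency.

```elixir
defmodule LinkSketch do
  # Keep the href value of every node that has one; nodes without an
  # href attribute simply do not match the inner generator.
  def get_only_links(nodes) do
    for {_tag, attrs, _children} <- nodes,
        {"href", href} <- attrs,
        do: href
  end

  # Hypothetical rule: a single news item lives under /news/<id>/<slug>.
  def filter_links(links) do
    Enum.filter(links, &Regex.match?(~r|^/news/\d+/[\w-]+$|, &1))
  end
end

nodes = [
  {"a", [{"href", "/news/42/city-opens-new-park"}], ["City opens new park"]},
  {"a", [{"href", "/about"}, {"class", "nav"}], ["About"]},
  {"a", [{"name", "top"}], []},
  {"a", [{"href", "/news/42/city-opens-new-park"}], ["Read more"]}
]

nodes
|> LinkSketch.get_only_links()
|> LinkSketch.filter_links()
|> Enum.uniq()
# => ["/news/42/city-opens-new-park"]
```

Every media site needs its own pattern, so in practice the regular expression would be chosen per site.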

Once I had extracted the URLs, I looped through each of them and crawled the page to build an object like this:

 %{title: title, content: content, thumbnail: thumbnail … etc}

The function that builds it looked something like this:

def get_article(html, url) do
  # Extract the title once so the slug can be derived from it
  title = title(html)

  %ArticleStruct{
    title: title,
    slug: Slugger.slugify_downcase(title, ?_), # slug
    content: content(html),
    url: url,
    thumbnail: thumbnail(html)
  }
end

Notice that I also save the slug of the title. It helps me later to name each thumbnail file.
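To illustrate why a slug makes a safe file name, here is a small dependency-free sketch. The `SlugSketch` module and its `slugify/1` function are hypothetical stand-ins written for this example; they are not the Slugger library and may not match its output in every case (accented characters, for instance, are simply replaced by underscores here).

```elixir
defmodule SlugSketch do
  # Lowercase the title, replace every run of characters other than
  # a-z and 0-9 with "_", then trim underscores from both ends.
  def slugify(title) do
    title
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "_")
    |> String.trim("_")
  end

  # Build the thumbnail path from a base directory and the article title.
  def thumbnail_path(basepath, title) do
    Path.join(basepath, slugify(title) <> ".png")
  end
end

SlugSketch.thumbnail_path("/path/images/", "City Opens New Park!")
# => "/path/images/city_opens_new_park.png"
```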

Once this object was filled in, I could go ahead and save it to my database using Ecto.

To make our beautiful mosaic we need tons of images stored locally. I used HTTPotion again to download the image from each thumbnail URL, and Mogrify to resize it.

def save_image(article) do
  case HTTPotion.get(article.thumbnail) do
    # Only write the file when the download succeeded
    %HTTPotion.Response{status_code: 200, body: body} ->
      basepath = "/path/images/"
      filename = Path.join(basepath, "#{article.slug}.png")
      File.write!(filename, body)
      resize_image(filename, 200, 200)
      article

    # Any other status, or an %HTTPotion.ErrorResponse{}, is skipped
    _ ->
      nil
  end
end

Here is how I resized the image and saved it:

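def resize_image(image_path, width, height, _opts \\ []) do
  image_path
  |> Mogrify.open()
  # Shrink the image to fit within width x height
  |> Mogrify.resize_to_limit("#{width}x#{height}")
  # Overwrite the original file with the resized version
  |> Mogrify.save(path: image_path)
end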
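As far as I understand it, resize_to_limit asks ImageMagick to shrink an image only when it is larger than the given box, keeping the aspect ratio. The arithmetic behind that idea can be sketched in plain Elixir. The `ResizeSketch` module is hypothetical and only illustrates the calculation; ImageMagick's own rounding may differ by a pixel.

```elixir
defmodule ResizeSketch do
  # Returns the {width, height} an image would have after being fitted
  # inside max_w x max_h. Images that already fit are left unchanged.
  def fit_within({w, h}, {max_w, max_h}) when w <= max_w and h <= max_h, do: {w, h}

  def fit_within({w, h}, {max_w, max_h}) do
    # The tighter of the two ratios decides the scale, so both sides fit.
    scale = min(max_w / w, max_h / h)
    {max(round(w * scale), 1), max(round(h * scale), 1)}
  end
end

ResizeSketch.fit_within({800, 600}, {200, 200})
# => {200, 150}
ResizeSketch.fit_within({150, 100}, {200, 200})
# => {150, 100}
```

This also shows why the thumbnails are not all exactly 200x200: only the longer side reaches the limit.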

Once I had all this working, I needed to set up a pool of Elixir workers so my computer could do all this work concurrently without dying.

This is where Poolboy comes into play. I used it to configure a pool, started under a Supervisor, that keeps a series of workers available all the time.

defmodule ScrapperApp.Application do
  @moduledoc false
  use Application

  # 3 permanent workers, plus up to 4 temporary ones under load
  defp poolboy_config do
    [
      {:name, {:local, :worker}},
      {:worker_module, ScrapperApp.MyWorker},
      {:size, 3},
      {:max_overflow, 4}
    ]
  end

  def start(_type, _args) do
    children = [
      :poolboy.child_spec(:worker, poolboy_config())
    ]

    opts = [strategy: :one_for_one, name: Scrapper.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
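The configuration above names ScrapperApp.MyWorker as the worker module but the post does not show it. Below is a minimal sketch of what such a worker and its caller could look like. The body of the worker, the {:crawl, url} message, the `ScrapperApp.Crawler` module, and the timeout value are all assumptions for illustration; only :poolboy.transaction and the start_link/1 convention come from Poolboy itself.

```elixir
defmodule ScrapperApp.MyWorker do
  use GenServer

  # Poolboy starts each worker by calling start_link/1 on the worker module.
  def start_link(args), do: GenServer.start_link(__MODULE__, args)

  @impl true
  def init(args), do: {:ok, args}

  # Hypothetical job: a real worker would fetch, parse, and save the article.
  @impl true
  def handle_call({:crawl, url}, _from, state) do
    {:reply, {:ok, url}, state}
  end
end

defmodule ScrapperApp.Crawler do
  # Checks a worker out of the :worker pool, runs the job on it, and
  # checks it back in, even if the job raises.
  def crawl(url, timeout \\ 60_000) do
    :poolboy.transaction(
      :worker,
      fn pid -> GenServer.call(pid, {:crawl, url}, timeout) end,
      timeout
    )
  end
end
```

With size 3 and max_overflow 4, at most 7 jobs run at the same time. Additional callers of ScrapperApp.Crawler.crawl/2 wait for a free worker until the timeout expires, which is what keeps the crawler from exhausting the machine.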

Running the App

AndreaMosaic

AndreaMosaic is free software that creates mosaic images for you, and it’s really fast. I love this tool.
Here is a screenshot of how it looks:

To make it work, choose a background image and a folder of images to use as tiles. You can specify whether tiles may be repeated, the size of the final image, etc. Give it a try; it’s really easy to use.

Conclusion

I’m really impressed by how easy it is to use Elixir. I highly recommend that you build something from scratch. It worked really well for me.

The Elixir community is still growing, and this is the time to get on board.

Resources
