Saving bandwidth with Zopfli

Zopfli

Today Jeff Atwood published the article Zopfli Optimization: Literally Free Bandwidth, praising the compression algorithm Zopfli. Zopfli was created by Google and published in 2013:

The Zopfli Compression Algorithm is a new open sourced general purpose data compression library that got its name from a Swiss bread recipe. It is an implementation of the Deflate compression algorithm that creates a smaller output size compared to previous techniques. [...]

The output generated by Zopfli is typically 3–8% smaller compared to zlib at maximum compression, and we believe that Zopfli represents the state of the art in Deflate-compatible compression. Zopfli is written in C for portability. It is a compression-only library; existing software can decompress the data. Zopfli is bit-stream compatible with compression used in gzip, Zip, PNG, HTTP requests, and others.

Jeff gave Zopfli a try and got impressive results:

In my testing, Zopfli reliably produces 3 to 8 percent smaller PNG images than even the mighty PNGout, which is an incredible feat.

However, Zopfli has one drawback: its awesome compression ratio comes with a speed penalty. It's more than 80 times slower than gzip.

Because of its slowness, Zopfli is not the best choice for compression at runtime. But it really shines when data can be compressed once ahead of time and served many times. PNG-encoded images are a very good candidate, and there's even a dedicated Zopfli encoder for that purpose, ZopfliPNG.

So, because of the proclaimed size reduction for PNGs, I gave ZopfliPNG a try. First I measured the total size of all PNG images on my site with this PowerShell command:

PS> gci *.png -Recurse | Measure-Object -Sum Length

Count    : 110
Average  :
Sum      : 4775284
Maximum  :
Minimum  :
Property : Length

That's about 4.6 MiB of PNGs. Then I let ZopfliPNG re-compress all these files:

gci *.png -Recurse | %{ .\zopflipng.exe -y --lossy_transparent $_.FullName $_.FullName } 

A few minutes and 110 files later, the command finished. 56 files were changed, i.e. ZopfliPNG was able to produce a smaller file for more than half of all images.

The total size of all PNGs is now:

PS> gci *.png -Recurse | Measure-Object -Sum Length

Count    : 110
Average  :
Sum      : 3616048
Maximum  :
Minimum  :
Property : Length

So from the former 4.6 MiB it went down to 3.4 MiB; that's 4,775,284 - 3,616,048 = 1,159,236 bytes saved, a reduction of roughly 24 percent. Quite impressive for just swapping out the encoder.

My Journey to Pretzel: First Stop Jekyll

Jekyll, CommunityServer

As described in my previous post, I decided to replace my CommunityServer setup with a static site generator. Being an avid GitHub user, my first choice was GitHub Pages. GitHub Pages allows you to commit your site to a repository and have GitHub build and serve it using Jekyll.

I won't go into detail about how to set up Jekyll or GitHub Pages; you'll find enough information on the internet. Phil Haack, a much better storyteller than I am, wrote an article about his migration from SubText to Jekyll. He also gave advice on how to preserve all comments with Disqus.

So my first step was to get the data of my existing site out of CommunityServer and into a format that Jekyll understands. Keyvan Nayyeri once wrote an extension for CommunityServer to download the content as a BlogML document (a standardized XML format for storing blog content).

Then I found BlogMLToMarkdown, a command-line tool that transforms a BlogML document into Markdown documents ready to be consumed by Jekyll.

However, I had to tweak that tool for my needs. Among other things, it

  • downloads all images, attachments, and other files hosted on my old site and changes the referencing links,
  • fixes redirects, and
  • exports all (approved) comments to a disqus.wxr, ready to be imported into Disqus.

If you're interested, you can find my fork on GitHub.

After I transformed my blog and tweaked the style (it's based on Hyde), I committed it to a GitHub repository, where it was happily hosted.

However, this setup had some drawbacks, which I will discuss in another post.

My Journey to Pretzel: Preface

Pretzel, .Text, CommunityServer

After running my site for more than 12 years, I decided it was time to replace the software behind it with something new.

I started this blog with .Text in August of 2002, which was later merged into CommunityServer. I wrote many extensions for CS, customized it in many ways, and was active in the community. Telligent, the company behind CommunityServer, even awarded me MVP status.

However, I lost interest in CS after a while. I started writing my own blog engine (as most developers do, I guess). Well, not only once, but countless times. Every few months I threw away the current code and started again from scratch. I took the opportunity to play around with the newest stuff: ASP.NET MVC and Nancy, Entity Framework or RavenDB, n-tier architecture or CQRS.

Well, though I learnt a lot about the different technologies, libraries, etc., I never got to the point where my software was ready to be published.

Finally I decided to overcome my developer ego and use some existing software. A few months ago I switched to Jekyll, a static site generator written in Ruby. Some weeks later I switched again, this time to Pretzel, which is very similar to Jekyll but written in .NET.

Following this introduction, I intend to publish a couple of posts about the different stations of my journey to Pretzel over the next few weeks. For example, I wrote several extensions for Pretzel, and I configured Azure to auto-deploy the site whenever I push to its Git repository.

NuGet package for 7Zip4Powershell

7-zip, Powershell, NuGet

A few days ago I mentioned 7-Zip for Powershell. I’ve now created a NuGet package and published it on NuGet.org.

It took me a while to figure out, but finally it’s a “tools only” package, i.e. it adds no references to your project.

To use the new commands, just add the package to your solution and import the module in your PowerShell script:

$SolutionDir = split-path -parent $PSCommandPath
Import-Module (Join-Path $SolutionDir "packages\7Zip4Powershell.1.0\tools\7Zip4Powershell.dll")

Fun with RavenDB and ASP.NET MVC: part I

RavenDB, aspnetmvc

I’m working on a small pet project with ASP.NET MVC, where hierarchically structured documents are stored in RavenDB. These documents can be retrieved by their unique URL, which is also stored in the document. Because there are different kinds of document classes, they all implement the common interface IRoutable. This interface defines a property Path, by which the document can be accessed.

public interface IRoutable {
    string Id { get; set; }
    string Path { get; set; }
}

public class Document : IRoutable {
    public string Id { get; set; }
    public string Path { get; set; }
}

using (var session = _store.OpenSession()) {
    session.Store(new Document { Path = "a" });
    session.Store(new Document { Path = "a/b" });
    session.Store(new Document { Path = "a/b/c" });
    session.Store(new Document { Path = "a/d" });
    session.Store(new Document { Path = "a/d/e" });
    session.Store(new Document { Path = "a/f" });
    session.SaveChanges();
}

Additionally, there’s the requirement that the incoming URL may consist of more parts than the document’s path, the extra parts carrying some additional information about the request. Here are some examples of possible requests and which document should match:

Request        Found document
a/x            a
a/b/c/y/z      a/b/c

So, given the path, how can you find the correct document?

The solution to this consists of three parts:

  1. Identify documents in the database which can be accessed via their path
  2. Index those documents
  3. Find the document which best matches a given URL

Marking routable documents

Because pages stored in the database don’t share a single class, I mark all documents implementing IRoutable by adding a flag IsRoutable to the document’s metadata. This is done by implementing IDocumentStoreListener, so the code is called by RavenDB whenever a document is stored:

public class DocumentStoreListener : IDocumentStoreListener {
    public const string IS_ROUTABLE = "IsRoutable";

    public bool BeforeStore(string key, object entityInstance, RavenJObject metadata, RavenJObject original) {
        var document = entityInstance as IRoutable;
        if (document == null) {
            // not a routable document, nothing to do
            return false;
        }
        if (metadata.ContainsKey(IS_ROUTABLE) && metadata.Value<bool>(IS_ROUTABLE)) {
            // flag is already set, metadata stays unchanged
            return false;
        }
        metadata.Add(IS_ROUTABLE, true);
        // returning true signals to RavenDB that the metadata has been modified
        return true;
    }

    public void AfterStore(string key, object entityInstance, RavenJObject metadata) {
    }
}
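
For RavenDB to actually call this listener, it has to be registered with the document store. Here's a minimal sketch of how that could look at application startup; the store URL and the startup location are assumptions of mine, not taken from the original code:

// application startup, e.g. in Global.asax.cs (the URL is just an example)
var store = new DocumentStore { Url = "http://localhost:8080" };
store.RegisterListener(new DocumentStoreListener());
store.Initialize();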

Indexing routable documents

The next step is to create an index for all documents with the proper flag in their metadata:

public class IRoutable_ByPath : AbstractIndexCreationTask {
    public override IndexDefinition CreateIndexDefinition() {
        return new IndexDefinition {
            Map = @"from doc in docs where doc[""@metadata""][""" + DocumentStoreListener.IS_ROUTABLE + @"""].ToString() == ""True"" select new { doc.Path }"
        };
    }
}
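
Defining the index class alone isn't enough; the index still has to be created on the server, typically once at startup. A minimal sketch, assuming store is the document store created above:

// create just this index on the server ...
new IRoutable_ByPath().Execute(store);

// ... or scan the assembly and create all index creation tasks at once
IndexCreation.CreateIndexes(typeof(IRoutable_ByPath).Assembly, store);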

Searching for documents

OK, so much for the preparation. The interesting part starts when a request comes in. Here RavenDB’s boosting feature comes in quite handy: the more parts of the path match, the higher the score the document will get. E.g. if the requested path is a/b/c/d/e, the following search terms will be queried, each with its own boost:

Search term    Boost
a/b/c/d/e      5
a/b/c/d        4
a/b/c          3
a/b            2
a              1

The code to create such a query looks like this:

public IRoutable GetRoutable(string path) {
    var query = _documentSession
        .Query<IRoutable, IRoutable_ByPath>();

    if (!String.IsNullOrEmpty(path)) {
        // add one OR'ed search clause per path prefix;
        // the longer the prefix, the higher its boost
        var pathParts = path.Split('/');
        for (var i = 1; i <= pathParts.Length; ++i) {
            var shortenedPath = String.Join("/", pathParts, startIndex: 0, count: i);
            query = query.Search(doc => doc.Path, shortenedPath, boost: i, options: SearchOptions.Or);
        }
    } else {
        // the root document has an empty path
        query = query.Where(doc => doc.Path == String.Empty);
    }

    // the best-scoring document is the one with the longest matching path
    var document = query.Take(1).FirstOrDefault();
    return document;
}

This method will finally return the document with the longest matching path.
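
To illustrate with the sample documents stored earlier, here's a rough usage sketch; routables stands for whatever class hosts GetRoutable and its _documentSession field, a name I made up for this example:

// documents in the store: a, a/b, a/b/c, a/d, a/d/e, a/f
var document = routables.GetRoutable("a/b/c/y/z");
// -> the document with Path == "a/b/c" (the longest matching prefix wins)

document = routables.GetRoutable("a/x");
// -> the document with Path == "a"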

Based on the incoming request we have found a matching document. What we can do with the remaining part of the URL I’ll leave for the next installment.

I published the complete code with unit tests in this GitHub repository.