Home > Javascript, performance > A Better Javascript Memoizer

A Better Javascript Memoizer

May 1st, 2009

Last month, I had the pleasure of meeting tons of excellent and intelligent front-end engineers at JSConf 2009 — the conference for Javascript in Washington DC. If you are a front-end engineer — or even if you write programs in Javascript for back ends, mobile devices, or desktops — you absolutely have to go to JSConf 2010. It’s apparent that Javascript is quickly becoming one of the hottest languages for all environments and applications, and JSConf is the first (and only) conference that deals with Javascript exclusively.

JSConf 2009

Track A presentations from JSConf 2009

One of the Track A speakers was Stoyan Stefanov, a really smart guy from Yahoo. His presentation was about high-performance web apps, but on one slide in particular, I was struck by his implementation of memoization (slide 79 of 87) because he used the function instance to cache the results of the function itself.

If you’re not familiar with memoization, here’s a great overview:

In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.

Memoization (Wikipeida)

In other words, it’s a way to cache the results of function calls so that repeated expensive computations or lengthy lookups can be avoided.

Here’s Stoyan’s example:

function myFunc(param){
    if (!myFunc.cache) {
        myFunc.cache = {};
    }
    if (!myFunc.cache[param]) {
        var result = {}; // ...
        myFunc.cache[param] = result;
    }
    return myFunc.cache[param];
}

(Use your imagination to fill in some complex or lengthy task in place of the line with the “…”!)

Again, what struck me was his use of the function instance to cache the results. In Javascript, functions are first-class objects and can be assigned properties just like any other object. But should we do this? It certainly seems like this could cause intractable problems if used widely. Imagine if everybody started decorating functions and methods with properties? (Actually, debuggers such as Drosera and Firebug now use properties on function instances to help identify them in logs and traces. It seems to me we should try not to invade on that space!)

So, of course, I set out to answer the question, “Do we really need to use the function instance to hold the result cache?”

The answer is “No”. It’s really quite easy to avoid, too.

Keep reading if you’re interested in the steps I took to come to a general purpose solution. Otherwise, click here to go straight to it.

A simple example

Here’s the simplest form: a constant-valued function with one parameter.

// constant-valued function. not very interesting
var memoizedFunc0 = (function () {
        var result;
        return function () {
            // in a real-world example this next line would do some
            // complex math instead of just returning a date object
            return result || (result = new Date());
        }
    })();

Notice that we use an auto-execute, anonymous function (the outer function) solely to create a closure around the result variable. Also notice that the variable, memoizedFunc0, is assigned a reference to the inner anonymous function because the outer function executes immediately and returns it. The inner function is the one that gets executed when memoizedFunc0 is called later in our program.

The memoization happens in the inner function’s return statement. We’re using the short-circuit Boolean OR operator (||) to return result if it is assigned. Of course, during the first execution, the second half of the statement runs ((result = new Date())), assigning a value to result.

Admittedly, this is a very boring example. Stoyan’s was much more interesting since the memoized function took a parameter and returned a different result for each possible parameter. So let’s take a look at a multi-valued memoized function:

// multi-valued function with string-serializable param
var memoizedFunc1 = (function () {
    var cache = {};
        return function (param1) {
            return param1 in cache ? cache[param1] : (cache[param1] = new Date(param1));
        }
    })();

This is similar, but we use a native Javascript object to cache function results. As long as the parameter, param1, is primitive* (or is serializable to a unique string), this will work great. Property names must be serialized to strings. Objects that do not have a toString() method to generate a unique string will generally not work. In most situations, though, you’ll have an id or key that you can use, anyway.

The inner function simply checks if it already has a value cached and returns it. Otherwise, it generates a value, caches it, and returns it.

Success! A memoized function that does not pollute the function instance (and doesn’t use global variables like some other implementations I’ve seen).

A more interesting case: two parameters

But let’s not stop there. What if we have two parameters? I guess we could generate some sort of string key out of the parameters and use the single-parameter memoized function. But why? Generating a unique hash code is expensive and bulky. Let’s try to build a 2-parameter memoized function without that.

// multi-valued function with two params
// this time we use two levels of closures and two cache objects
var memoizedFunc2 = (function () {
    var cache1 = {};
        return function (param1, param2) {
            if (!(param1 in cache1)) {
                cache1[param1] = (function () {
                    var cache2 = {};
                    return function (param2) {
                        return param2 in cache2 ? cache2[param2] : (cache2[param2] = new Date(param1, param2));
                    }
                })();
        }
            return cache1[param1](param2);
        }
    })();

Heh. The solution is to build two levels of closures! The first level creates and caches the second level as needed! The first-level cache ends up holding a reference to an inner anonymous function for each unique param2 (and each inner function has it’s own unique closure).

A general purpose “memoizer”

The two-parameter case is quite ugly and unwieldy, in my opinion. However, it shed some light on a general-purpose solution, a “memoizer” I’d like to call it. With a little bit of work, I was able to devise a memoizer for any function with explicitly-declared parameters. Here it is:

// memoize: a general-purpose function to enable a function to use memoization
//   func: the function to be memoized
//   context: the context for the memoized function to execute within
//   Note: the function must use explicit, primitive parameters (or objects
//     that generate unique strings in a toString() method)
function memoize (func, context) {
    function memoizeArg (argPos) {
        var cache = {};
        return function () {
            if (argPos == 0) {
                if (!(arguments[argPos] in cache)) {
                    cache[arguments[argPos]] = func.apply(context, arguments);
                }
                return cache[arguments[argPos]];
            }
            else {
                if (!(arguments[argPos] in cache)) {
                    cache[arguments[argPos]] = memoizeArg(argPos - 1);
                }
                return cache[arguments[argPos]].apply(this, arguments);
            }
        }
    }
    // JScript doesn't grok the arity property, but uses length instead
    var arity = func.arity || func.length;
    return memoizeArg(arity - 1);
}

This still might look a little crazy, but it’s really quite simple. To satisfy an infinite number of parameters, we simply use recursion. Every time we recurse, we create another closure. This just happens in Javascript. It’s a natural side-effect of recursion in the language. All we have to do is use the closures to cache our results.

Our memoize function takes two arguments. The first is a reference to the function we want to memoize. The second is the context the memoized function should run in. For instance, if we want to memoize an object method, we would pass the method in as the first parameter and the object itself as the second parameter.

When the memoize function executes, it first discerns how many formal (explicit) parameters the function has. Javascript uses the arity property for this purpose. Of course, no interesting Javascript project would be complete without a Microsoft caveat of some sort. For some idiotic reason, JScript uses the length property instead of arity. Other Javascript engines followed suit and deprecated the arity property in favor of the length property.

Next, memoize calls its inner function memoizer on the last parameter. memoizer constructs a local cache and returns an inner function that either executes and caches our original function if we’re at the first parameter — or else it caches and executes a call to memoizer with the previous parameter (argPos - 1). Each time the function, memoizer, is recursively called it creates a closure around its own cache variable.

Notice that we’re using the apply method when we finally call our original function. This allows us to bind to our original context. We’re also using apply when we recursively call memoizer. This allows us to pass the arguments through without knowing how many arguments there are. The this passed into these calls is arbitrary and could easily have been anything else, including null.

I don’t know why I chose to start at the last parameter and go backwards. This routine could go in either direction. I also have no idea whether this memoizer is any faster or leaner than Stoyan’s. I imagine it doesn’t matter when you consider that both should be much, much faster than the lengthy XHR call or computationally intensive calculation they are meant to prevent.

OK. Cool. But … um … why?

Yes, memoization is a neat concept. But why use it rather than just hand-coded caching mechanisms? It’s easy enough to write a caching routine, right? Here are a few good reasons:

  • hand-coded caching mechanisms obfuscate your code
  • multi-variate caching routines are bulky in Javascript
  • fewer lines of code means fewer bugs
  • Java programmers will think more highly of you*

*Javascript kicks a$$. Show those stodgy back-end guys what we’re capable of!

Let’s try it out

So, let’s test it. First, let’s create a sample object to work with. This object has an “expensive” method that uses an XHR request to fetch server-side resources.

// myObj constructor
function myObj () { };
 
myObj.prototype = {
 
    expensiveAjaxLookup: function (y, mo, d, h, mn, s) {
        // OK.  No XHR.  Just pretend it does something on the server!
        // Instead we're just going to construct a date string.  Lame, I know.
        return this.prop + new Date(y || 0, mo || 0, d || 0, h || 0, mn || 0, s || 0);
    },
 
    // a public property to prove that "this" is really "this" in our memoized methods
    prop: 'my date: '
 
}

In order to prove that our memoized methods didn’t mangle the context of the original method, I added a public property and used it in the expensiveAjaxLookup method. If our memoized methods lost their context, then this property would be undefined.

Tests:

var o = new myObj();
o.memoizedLookup = memoize(o.expensiveAjaxLookup, o);
// Note: 8 == September, not August in Javascript
console.log(o.memoizedLookup(2009, 8, 17));
console.log(o.memoizedLookup(2009, 8, 16));
console.log(o.memoizedLookup(2009, 9, 16));
console.log(o.memoizedLookup(2009, 8, 16));
console.log(o.memoizedLookup(2009, 8, 21, 13, 26, 17));
console.log(o.memoizedLookup(2009, 8, 21, 13, 26, 17));

Output:

my date: Thu Sep 17 2009 00:00:00 GMT-0400 (EDT)
my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT)
my date: Fri Oct 16 2009 00:00:00 GMT-0400 (EDT)
***** cache hit!
my date: Wed Sep 16 2009 00:00:00 GMT-0400 (EDT)
my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT)
***** cache hit!
my date: Mon Sep 21 2009 13:26:17 GMT-0400 (EDT)

As you can see, the context was correctly bound. Otherwise, we’d see ” undefinedThu Sep 17 2009 00:00:00 GMT-0400 (EDT)“. The lines that say “***** cache hit!” are some debugging statements I threw in. I sprinkled some console.log() calls all over the memoize function to ensure that the caches were being used as expected. They are, of course. I’ll leave it as an exercise for you to study the behavior of the function, if you are so inclined. I found the output very interesting and revealing. Go ahead. Have fun with it!

So, what uses can you conceive for memoization in web apps? Is it only good for caching XHR requests or summing Fibonacci series? Please leave a comment or let me know on Twitter!


[Update] I just came across two interesting Javascript memoization articles. In one, the author included an “unmemoize” function. In the other, the “unmemoification” happened automatically at a specified time delay. In the first implementation, though, the function instances are used to hold the unmemoize method. In the second, a hash key is generated to handle multiple parameters.

It’s certainly possible to add both features without either of those sacrifices. I have some ideas how it could be done. Do you have an implementation that works? If so, please leave a comment!

Memoization in JavaScript

Timed Memoization

In this article, memoization is used to speed up Bezier curve calculations: One-Line JavaScript Memoization


[Update 2]I updated the post to remove references to the word “serializable” since I really meant primitives. In Javascript, all objects are serializable. However, I really meant uniquely serializable. All primitives in Javascript are uniquely serializable (String(37) will always be “37”). Objects, Functions, and Arrays are not always unique when serialized. In most engines, objects, for instance, serialize as “[Object object]“. Dates, despite not being primitive, do have unique serializations (but are locale-dependent).

I also updated the text to clarify that the arity property has been deprecated in favor of the length property.

Javascript, performance

  1. EllisGL
    May 5th, 2009 at 07:25 | #1

    I can see how this can be a huge double edge sword. On the good side, you can cache heavy query results. On the bad side, you could cache many heavy query results that have large data sets.

    Of course with the timed “unmemoized” that would help dull the other side of the sword. What about doing a stack or doing a “importance” and removing the less important stuff when a new one is cached?

    • May 5th, 2009 at 19:04 | #2

      Thanks for your input, EllisGL! Excellent point, too. The usual trade-off: speed for memory.

      So, here’s a question: should the unmemoize feature forget the cached results at a strict future time (exactly at the designated expiration) or should it only forget at the next convenient time (some time after expiration)?

      More specifically:

      setTimeout(unmemoize, expiryPeriod); // strict time

      or

      if(new Date() > expiryDate) unmemoize(); // this is in the memoize method

      Using a setTimeout will enforce a strict expiry time (assuming that the UI thread is not too busy). Otherwise, we can just check for expired cache entries when we’re adding new ones. This would allow us to prioritize as you mentioned. However, it seems like a lot more code / effort / memory.

      Thoughts?

  2. May 5th, 2009 at 08:25 | #3

    let me be the first to say “f— yeah”. looks rad.

    i rely on memoization a lot for caching.

    The only think missing from my js memoization code is cache invalidation. Could you add unmemoize and cache invalidation timeouts into your implementation?

    Would like to see that.

    • May 5th, 2009 at 19:10 | #4

      Sure, Vijay. Timeouts are easy. Explicit cache invalidation is harder if you’re trying not to use members on the function instance. I think I’ll set out to solve all three cases:
      1) setTimeout-based invalidation
      2) simple, unmemoize method member on function instance
      3) unmemoize without touching the function instance

      Thanks for the feedback!

  3. Dave
    May 5th, 2009 at 08:31 | #5

    If all the args have to be serializable to string, why not do them all at once? As long as you pick a delimiter unlikely to be in the arguments there’s no ambiguity. Here’s what I mean, untested:

    function memoize (func, context, maxage) {
      var cache = {}, expiration = {};
      maxage = maxage || 24*60*60;	// default 24 hours
      return function () {
        var signature = Array.join.apply(arguments, "x01");
        var created = new Date(); age.setSeconds(age.getSeconds()-maxage);
        if ( !(signature in cache) || expiration[signature] < (new Date) ) {
          cache[signature] = func.apply(context, arguments);
          var expires = new Date(); expires.setSeconds(age.getSeconds()-maxage);
          expiration[signature] = expires;
        }
        return cache[signature];
      }
    }
    • May 5th, 2009 at 19:23 | #6

      Hey Dave, thanks for the code example!

      I originally decided to stay away from joining the serialized params because of the difficulty in choosing the join separator string. Yours looks pretty darn safe, imho, but there is still a minute chance that there could be ambiguities (false positives) in the cache caused when the parameters contain the join separator string.

      My goal was to see if I could find a method that worked 100% of the time. I didn’t quite achieve my goal*, but I think that I got 0.00001% further because I eliminated the join ambiguities.

      * Objects serialize to “[Object object]“. Doh! See kangax’s note on this memoization implementation: http://blog.thejit.org/2008/09/05/memoization-in-javascript/

  4. Dave
    May 6th, 2009 at 07:56 | #7

    Like that page says, probably the best thing to do for Object is to document that it doesn’t work. You could write a version that serialized Object to string (recursively!) or heck, turn it into JSON and use that string.

    I just realized my untested code above has some mistakes…not surprising. Delete the line starting “var created” and change age.getSeconds()-maxage to expires.getSeconds()+maxage.

  5. Naeem
    May 7th, 2009 at 15:19 | #8

    Good work John.

  6. Nick
    May 10th, 2009 at 21:38 | #9

    SproutCore 1.0 now has built in memoization. Basically, this is all you need to do:

    function() {
    // some heavy function call/logic
    }.property(‘myName’).cacheable()

    The .property() call ensures that if ‘myName’ changes, the cache invalidates. Also, there is an external call for expiring cache if needed…

    Pretty neat, huh?

    • May 11th, 2009 at 15:38 | #10

      @Nick: I really like the way it uses the “property” to invalidate / reset the cache! That is neat. That gives me some ideas…. hm….

  7. May 14th, 2009 at 08:58 | #11

    The most elegant example I have ever seen is:

    function memoize(f) {
      return function () {
          var args = Array.prototype.slice.call(arguments);
          f.memoized = f.memoized || {};
          return (args in f.memoized) ?
            f.memoized[args] :
            f.memoized[args] = f.apply(this, args);
      };
    }
    

    Courtesy of http://blog.thejit.org/

    • May 14th, 2009 at 22:48 | #12

      Hey Simon,

      On the surface, Nicolas’ implementation looks good, but it’s not robust. Here’s why:

      1. It adds properties to the function instance, f.
      2. args.toString() is not guaranteed to be unique for all combinations of arguments.

      Regarding the first point, any general-purpose function should avoid interfering with other code as much as possible. By assigning properties to the function instance, you have limited the calling code from using a similarly-named property on the function instance. This may seem like a minor point, but if all general-purpose functions worked this way, we’d eventually encounter conflicts — and this would result in some really nasty debugging sessions.

      The second point is much more important. All Javascript property names must be strings, so the array is serialized to a string when it is used as a property name (f.memoized[args]). In Javascript, Arrays serialze to a CSV-like representation. For example, [1, 2, 3, 'foo'] will serialize to “1,2,3,foo”. This becomes a problem when objects are introduced — or even strings containing commas. Here’s what you’d get when you pass the following arguments:

      [1, 'commas,are,in,here', {foo: 'bar'}] ⇒ “1,commas,are,in,here,[object Object]”

      Unfortunately, the following arguments yield the same exact property name when serialized:

      [1, 'commas', 'are', 'in', 'here', {prop: 'val'}] ⇒ “1,commas,are,in,here,[object Object]“

  8. May 15th, 2009 at 12:18 | #13

    You can make this work for functions with arity 0 by changing the last two lines to:

        var arity = func.arity || func.length;
        return memoizeArg(arity ? arity - 1 : 0);
    

    -m

  1. May 5th, 2009 at 05:14 | #1
  2. May 5th, 2009 at 07:38 | #2
  3. May 5th, 2009 at 07:40 | #3
  4. May 5th, 2009 at 15:49 | #4
  5. May 6th, 2009 at 02:09 | #5
  6. May 7th, 2009 at 21:21 | #6
  7. May 10th, 2009 at 09:32 | #7
  8. May 12th, 2009 at 05:18 | #8
  9. May 12th, 2009 at 18:25 | #9
  10. May 13th, 2009 at 20:53 | #10
  11. January 31st, 2010 at 08:45 | #11
  12. April 28th, 2010 at 15:41 | #12
  13. September 7th, 2011 at 15:38 | #13
  14. September 19th, 2011 at 02:15 | #14
Comments are closed.