In my previous post, A Better Javascript Memoizer, some of you left some great feedback. (Thanks to all of you!) I think it’s because each of us has a different definition of “better”. That makes sense.
Actually, I just wrote that title quickly when I got inspired to start writing. Just before publication, I was going to change it to remove the word “better”. But as an experiment, I decided to keep it to see if it would elicit more feedback. Success! 🙂
Two people wanted to investigate cache invalidation, a.k.a. “unmemoization”, some more. Scratch that. Make that three people: I wanted to try it, too.
Here’s my attempt at better unmemoization in Javascript. I’m looking forward to more great feedback! (I really, really need it this time!)
First, let’s simplify the memoize function a bit. In my haste to publish, I missed a good opportunity to increase the performance by using a local variable. I also found a way to shorten it and increase readability by reducing the simplistic if
statements into single-line statements using the conditional operator (?:). Here’s the updated version:
function memoize (func, context) { function memoizeArg (argPos) { var cache = {}; return function () { var arg = arguments[argPos]; // new local variable if (argPos == 0) { return arg in cache ? cache[arg] : cache[arg] = func.apply(context, arguments); } else { return arg in cache ? cache[arg].apply(this, arguments) : cache[arg] = memoizeArg(argPos - 1); } } } return memoizeArg(func.length - 1); } |
This looks better to my eyes — although the cache assignments are somewhat hidden in the last operand of the conditional operator. (Doh! Bad programmer, bad!) Oh, and look! I succumbed to the func.length precedence over func.arity as noted by igstan on Ajaxian.
Scheduled whole-cache reset
If you looked at any of the unmemoize features in the links from the previous post or at Dave’s code snippet in the comments, you’ll see that there are [at least] two ways to reset the cache: at an explicit time (scheduled), and just-in-time (JIT). Scheduled reset uses the setTimeout
function and JIT uses date math.
Here’s an example of scheduled reset. It resets the entire cache after a given delay after its first use.
// whole-cache reset via setTimeout function memoize2 (func, context, /* msec */ cacheLifetime) { function memoizeArg (argPos) { var cache = {}; // reset the cache after the specified timeout: setTimeout(function () { cache = {}; }, cacheLifetime); return function () { var arg = arguments[argPos]; if (argPos == 0) { return arg in cache ? cache[arg] : cache[arg] = func.apply(context, arguments); } else { return arg in cache ? cache[arg].apply(this, arguments) : cache[arg] = memoizeArg(argPos - 1); } } } return memoizeArg(func.length - 1); } |
Wow. This is really ill-conceived. The first cache to be created (let’s call it the root cache) is the first one destroyed. Since all other caches (for the other parameters) are reachable hierarchically through the root cache, they’re unreachable as soon as the root cache is destroyed. Therefore, we’ll potentially have several unreachable caches wasting memory until their setTimeouts execute (which could be as much as cacheTimeout msec later).
If it’s not immediately apparent to you why this is true (and it wasn’t for me), then note that there are now two function closures with a reference to the local cache variable: 1) the returned function, and 2) the anonymous function in the setTimeout (a.k.a. “setTimeout function”). When a higher-level cache is reset/released via its “setTimeout function”, the references to the lower-level caches (via the returned functions from the calls to memoizeArg()) are released. However, the references to the lower-level caches from their corresponding “setTimeout functions” are not released until those setTimeouts execute!
Strike one!
(If you’re not experienced in functional programming, this is probably very confusing. Either that, or my explanations need a lot more work.)
Scheduled reset of individual cache entries
OK. What about if we invalidate individual cache items instead of whole caches? Would that fix the situation? Lets see:
// cache entry reset via setTimeout (each entry times out individually) function memoize3 (func, context, /* msec */ cacheLifetime) { function memoizeArg (argPos) { var cache = {}; return function () { var arg = arguments[argPos]; // schedule this cache entry for deletion if it's our first time if (!(arg in cache)) setTimeout(function () { delete cache[arg]; }, cacheLifetime); if (argPos == 0) { return arg in cache ? cache[arg] : cache[arg] = func.apply(context, arguments); } else { return arg in cache ? cache[arg].apply(this, arguments) : cache[arg] = memoizeArg(argPos - 1); } } } return memoizeArg(func.length - 1); } |
There! Fixed! Um…right?
Not really. We’ve got the same exact problem: all of the anonymous functions in the setTimeouts (again, “setTimeout functions”) are holding references to our caches! Even though our cache entries will get cleaned up one entry at a time sooner, they’ll still be wasting memory for a while.
Furthermore, there will be hundreds or thousands of active setTimeouts when there are a hunreds or thousands of values to cache. I haven’t done any formal testing, but I’ll wager that this creates a bad situation in some browsers / environments. (Yes, I’m looking at you, IE 6!)
Strike Two!
Reset the cranium
This is really bothering me. It’s the recursion and closures that are causing the problem! And those are the very things I set out to utilize to make a “better” memoizer!
It’s time to reset my cranium rather than strike out. Lots of ideas are coming in to my head, but none of them feel right. None of them deserve the word, “better”. Not to me, anyways.
If you have some suggestions, please let me know!
The problem is that you have seperate caches for each closure. Why not create a single cache array (really an array of caches)? Caches for each argument could then be accessed by argument index, but could all be cleared by emptying a single array. I used setInterval rather than setTimeout, assuming that usually we will want fresh data on a regular interval and not just a single refresh, and I added a check to ensure that a lifetime was passed before setting the interval.
// whole-cache reset via setInterval
function memoize (func, context, /* msec */ cacheLifetime) {
var cache = [];
function memoizeArg (argPos) {
if(argPos == 0 && cacheLifetime)
setInterval(function () { cache.length = 0; }, cacheLifetime);
var local = cache[argPos] || (cache[argPos] = {});
return function () {
var arg = arguments[argPos];
if (argPos == 0) {
return arg in local ? local[arg] : local[arg] = func.apply(context, arguments);
}
else {
return arg in local ? local[arg].apply(this, arguments) : local[arg] = memoizeArg(argPos – 1);
}
}
}
return memoizeArg(func.length – 1);
}
Did you have any thoughts of adding a manual method for resetting the cache? I know this modifies the function in a way you were trying to avoid, but it is an absolute necessity in some cases:
// whole-cache reset via setInterval or manual
function memoize (func, context, /* msec */ cacheLifetime) {
var cache = [];
var reset = func.resetCache = function () { cache.length = 0; };
function memoizeArg (argPos) {
if(argPos == 0 && cacheLifetime)
setInterval(reset, cacheLifetime);
var local = cache[argPos] || (cache[argPos] = {});
return function () {
var arg = arguments[argPos];
if (argPos == 0) {
return arg in local ? local[arg] : local[arg] = func.apply(context, arguments);
}
else {
return arg in local ? local[arg].apply(this, arguments) : local[arg] = memoizeArg(argPos – 1);
}
}
}
return memoizeArg(func.length – 1);
}
var fn = memoize(function(v) { return v; });
fn.resetCache();
I should have tested what I posted before because it really doesn’t even come close to working… But, I also think that your current versions of memoize, memoize2 and memoize3 aren’t working. You need to not only cache the result of memoizeArg when argPos > 0, but also call it immediately.
Clearing the array in my previously posted code has no affect on the individual caches because of the closures that are created when memoizeArg is called. The only solution I can come up with is to create a function closures that can clear the caches for us. These functions get pushed into an array so that we can call them later… I’m not sure if this qualifies as “better”, but this code has been tested and does work.