Common problems with appCache

A client of mine needed a large part of their existing website to work offline; the web site is really an app that is used to enter a fair amount of information through various forms. The site is designed to be used on mobile devices and in the outdoors, where the network connectivity is not the best, so the site already used localStorage to store user input when the server isn’t accessible. I tried to get it to work with appCache, but got more problems than I bargained for. In the end I decided that it just wasn’t worth the trouble; it would be impossible to keep all the offline data up-to-date without a huge amount of network traffic.

Here are some of the problems I encountered.

A url is either always cached, or never cached

This is absolutely the biggest problem when using appCache. You can tell the browser to cache a url, but after that the browser will always use the cached version of the file, even when you are on-line! For static sites this isn’t a problem, but for dynamic sites whose pages have changing content, this is just horrible.

Using Cache-Control headers? Doesn’t help. The browser caches the page no matter what.

So what’s the solution? The only solution I can think of is to serve the same pages from two different urls; one url is used for normal access, and the other url is cached by the browser so it can be accessed offline. So I access this page:

http://www.site.domain/customer/1

And I tell the browser to cache this page:

http://offline.site.domain/customer/1

That way the user always gets a fresh copy of the page when on-line. But this brings us to the next problem: how do we make sure the browser has the latest version of each page in its cache?
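The mapping between the two urls is mechanical, so it can be computed rather than maintained by hand. Here is a minimal sketch, assuming the offline site mirrors the regular site's paths exactly; the function name toOfflineUrl is made up for this example.

```javascript
// Map a url on the regular site to the matching url on the offline site.
// Only the host name changes; path, query string and fragment are kept.
// Assumption: the offline site serves the same paths as the regular site.
function toOfflineUrl(url) {
    var u = new URL(url);
    u.hostname = 'offline.site.domain';
    return u.href;
}

console.log(toOfflineUrl('http://www.site.domain/customer/1'));
// prints "http://offline.site.domain/customer/1"
```

A helper like this could be used to produce the list of urls that goes into the offline site's cache.manifest.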

The cache is only updated when the cache.manifest changes

The browser will never automatically check if a page in its cache should be updated, unless the contents of the cache.manifest have changed. Using Cache-Control or Expires headers doesn’t help. All you can do is modify the cache.manifest (for example by updating a timestamp in it) whenever you know/think a page might have been updated.
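One way to do the timestamp trick is to generate the manifest instead of editing it by hand. The sketch below is only an illustration: buildManifest and its parameters are names I made up, and it assumes the manifest only needs CACHE and FALLBACK sections. The version goes into a comment line (lines starting with # are comments in a manifest), so changing it alters the file without changing what it means.

```javascript
// Build the text of a cache manifest.
// cacheUrls: array of urls for the CACHE section.
// fallbacks: object mapping a url prefix to its fallback page.
// version:   any string; changing it changes the manifest's bytes, which is
//            what makes the browser refresh its cache.
function buildManifest(cacheUrls, fallbacks, version) {
    var lines = ['CACHE MANIFEST', '# version ' + version, '', 'CACHE:'];
    cacheUrls.forEach(function (url) {
        lines.push(url);
    });
    lines.push('', 'FALLBACK:');
    Object.keys(fallbacks).forEach(function (prefix) {
        lines.push(prefix + ' ' + fallbacks[prefix]);
    });
    return lines.join('\n') + '\n';
}

// Example: stamp the manifest with the current time.
var manifestText = buildManifest(
    ['http://offline.site.domain/'],
    { '/': '/offline' },
    new Date().toISOString()
);
console.log(manifestText);
```

A server-side script could call this whenever a cached page is known to have changed and serve the result as the cache.manifest.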

Also, you cannot tell the browser that a single file has changed; the browser will always reload all files listed in the cache.manifest. That’s a bit of a problem if you want a large number of pages to be available offline.
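What a page can do is ask the browser to re-check the manifest, and switch to the new cache once the download has finished. The sketch below uses the applicationCache object's update() and swapCache() methods and its updateready event; watchForUpdates and the onReady callback are names I made up, and the cache object is passed in as a parameter so the function does not depend on a particular browser.

```javascript
// Ask the browser to re-check the manifest, and switch to the new cache once
// it has been downloaded.
// cache:   an object that behaves like window.applicationCache.
// onReady: hypothetical callback, called after the new cache is swapped in.
function watchForUpdates(cache, onReady) {
    cache.addEventListener('updateready', function () {
        if (cache.status === cache.UPDATEREADY) {
            cache.swapCache();   // start using the freshly downloaded cache
            onReady();           // e.g. ask the user to reload the page
        }
    });
    try {
        cache.update();          // throws if this page has no appCache yet
    } catch (e) {
        // Nothing is cached for this page, so there is nothing to update.
    }
}

// In a browser that supports appCache:
// watchForUpdates(window.applicationCache, function () {
//     window.location.reload();
// });
```

This does not get around the problem above: if the manifest has changed, every file listed in it is still requested again.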

Cross-domain fallbacks are tricky

Since none of the pages on my regular site are cached, I need to tell the browser to redirect to the offline site whenever the regular site is inaccessible. This is called a fallback.

That sounds easy, but it isn’t. I tried to redirect to my offline site if a page on the regular site cannot be accessed (and is not cached) by putting this cache manifest on my regular site:

CACHE MANIFEST
FALLBACK:
/ http://offline.site.domain/

That should tell the browser to cache the http://offline.site.domain/ url, and display it whenever a page on the regular site cannot be accessed. Unfortunately, that just doesn’t work. Chrome refuses to redirect; instead it just shows the standard “Unable to connect to the Internet” message.

OK, next I tried to outsmart Chrome by placing a /offline url on my regular site, and instructing the server to do a 301 redirect to the offline site when someone tries to access the /offline url.

 
CACHE MANIFEST
CACHE:
http://offline.site.domain/
FALLBACK:
/ /offline

But nope, that doesn’t work either. You are not allowed to do cross-domain redirects in the fallback pages. It actually says so in the spec. What does work is doing the redirect in JavaScript. I simply put this content in the /offline file:

// Redirect to the offline site as soon as the fallback page has loaded.
$(document).ready(function () {
    window.location = 'http://offline.site.domain/';
});

No changes are needed to the cache manifest; the browser automatically caches the urls mentioned in the FALLBACK section.
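The redirect above always lands on the front page of the offline site. When the browser shows a fallback page, the address stays the one the user originally requested, so the script can also send the user to the matching page on the offline site. This is a sketch without jQuery; offlineTarget is a made-up name, and it assumes the offline site uses the same paths as the regular site and has those pages in its own appCache.

```javascript
// Build the offline-site address for the page the user originally asked for.
// loc: a Location-like object (window.location in the browser).
// Assumption: the offline site mirrors the regular site's paths.
function offlineTarget(loc) {
    return 'http://offline.site.domain' + loc.pathname + loc.search;
}

// In the /offline fallback page:
// window.location.replace(offlineTarget(window.location));
```

Using location.replace() instead of assigning to window.location keeps the fallback page out of the browser history, so the back button does not bounce the user straight back into the redirect.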

Yay, I thought, I’m slowly getting somewhere with this.

Each site has its own appCache

Yes, each site has its own appCache, and it can contain files from other sites. Read that again. It means the browser doesn’t have one big bucket where it places all files from all cache.manifests. Instead, each site has its own bucket, where all files from that site’s cache.manifest are placed. Files from different domains can be placed in the same appCache; for example if your site requires jQuery, you can put https://code.jquery.com/jquery-2.1.3.min.js in the cache for your site. But that url is only cached for your site. If some other site also requires jQuery, it must place that url in its own cache.manifest. Sounds complicated? Below is an example.

I tried putting a cache.manifest with the following content on http://www.site.domain:

CACHE MANIFEST
CACHE:
http://offline.site.domain/
FALLBACK:
/ /offline

You would think that it caches the front page of the offline site (http://offline.site.domain/), and you could then access that page when offline, right? Not quite. It does cache that page, but places it in the cache for http://www.site.domain. That means the browser can only access the offline site when it’s referenced from the main site (i.e. when a page on the main site loads an image, script, css etc. from the offline site).

When you go to http://offline.site.domain/ the browser checks whether it has an appCache for that site, and since it doesn’t, it just says “Unable to connect to the Internet”, even though that exact url is in the appCache (for a different site).

In Chrome, you can type chrome://appcache-internals/ in the address bar to see what appCaches you have, and what files each of them contains.
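The same question ("does this page have an appCache of its own?") can also be asked from script, through the status property of the applicationCache object. The numeric values 0 to 5 are defined by the HTML5 spec; the helper name describeAppCacheStatus is made up for this sketch.

```javascript
// Names of the applicationCache.status values, indexed by their numeric code.
var APPCACHE_STATUS = [
    'UNCACHED',     // 0: this page is not associated with any appCache
    'IDLE',         // 1: cached, and no update is in progress
    'CHECKING',     // 2: the manifest is being checked for changes
    'DOWNLOADING',  // 3: the files listed in the manifest are being fetched
    'UPDATEREADY',  // 4: a new version of the cache has been downloaded
    'OBSOLETE'      // 5: the manifest is gone, the cache will be dropped
];

// Turn a numeric status into a readable name.
function describeAppCacheStatus(status) {
    return APPCACHE_STATUS[status] || 'UNKNOWN';
}

// In a browser that supports appCache:
// console.log(describeAppCacheStatus(window.applicationCache.status));
```

Running this on a page of the offline site in the situation described above would be expected to report UNCACHED, since the files sit in the appCache of the main site.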

So-called ‘master’ files are automatically cached

Any page that contains a <html manifest="/cache.manifest"> tag is called a master file, and is automatically added to the appCache. Apparently you cannot prevent this. Why is this a problem? Well, when the master file goes into the appCache, it will never again be automatically updated from the server (until you modify the cache.manifest).

A common solution is to put the manifest reference in a hidden iframe; that way the iframe is the master and will be cached, but the page containing the iframe will not be.
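The iframe can be added from script. A minimal sketch, assuming a small page exists at a url such as /appcache-loader.html (a hypothetical name) whose only job is to carry the <html manifest="/cache.manifest"> tag; attachManifestFrame is also a made-up name.

```javascript
// Add a hidden iframe whose document carries the manifest attribute, so that
// the iframe's page becomes the master file instead of the page around it.
// doc:       the page's document object.
// loaderUrl: hypothetical url of the page that references the manifest.
function attachManifestFrame(doc, loaderUrl) {
    var frame = doc.createElement('iframe');
    frame.style.display = 'none';
    frame.src = loaderUrl;
    doc.body.appendChild(frame);
    return frame;
}

// In the browser, once the page has loaded:
// attachManifestFrame(document, '/appcache-loader.html');
```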

Conclusion

I came to the conclusion that the users would have to download several megabytes of data to their appCache to be able to use the site offline; the site just wasn’t designed with appCache in mind from the start. I scrapped the whole appCache idea for this site. I hope you have better luck with it!

2 comments

  1. I have a question regarding inserting the manifest file in the iframe.
    In Firefox I see that the cache is populated, but files are taken from the internet or bfcache, not from appCache. Did you try to work around that?

  2. The problem with your approach is that it appears your site is using old fashioned server side generated pages. Appcache is really not very useful for these types of sites.

    Appcache shines with single page applications, where you only have one html file.
    The whole site is packed into one or two js-files (which are cached), as well as one css file. Static images, language files etc are also cached.

    If you use Grunt or Gulp as a building tool (and you should!), there are tasks that automatically generate the manifest file for you.

    But again: appcache is for Single Page Applications, not for server side generated websites.
