Web Archiving + Emulation
Ilya Kreymer & Dragan Espenschied
14 Web Archives
Part of "the Collaborative Archive of the Web"
- Uses ODU Memento Aggregator by Sawood Alam
- Aggregates from multiple TimeGate API calls
- Finds closest match from available archives
- Any unaltered web archive can be included
- Requested date and actual date
- Count and date range of embeds
- List the archives used
- Can surface additional metadata
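A minimal sketch of the closest-match step, in Python. The archive names, capture dates and the function name `closest_memento` are hypothetical. This is not the ODU Memento Aggregator's code, only an illustration of picking the capture nearest to a requested date across several archives and reporting both the requested and the actual date.

```python
from datetime import datetime

def closest_memento(requested, captures_by_archive):
    """Pick the capture nearest to `requested` across all archives.

    `captures_by_archive` is a hypothetical mapping of
    archive name -> list of capture datetimes.
    """
    best = None
    for archive, dates in captures_by_archive.items():
        for captured in dates:
            delta = abs((captured - requested).total_seconds())
            if best is None or delta < best[0]:
                best = (delta, archive, captured)
    if best is None:
        return None  # no archive holds a capture
    return {"archive": best[1], "requested": requested, "actual": best[2]}

# Made-up capture dates, for illustration only.
captures = {
    "archive_a": [datetime(1998, 1, 1), datetime(2000, 1, 1)],
    "archive_b": [datetime(1999, 5, 20)],
}
result = closest_memento(datetime(1999, 6, 1), captures)
print(result["archive"], result["actual"].date())
```

Returning the requested and actual dates side by side mirrors what the interface reports for each embedded resource.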
Common Web Archive Replay Problems
and possible Solutions!
- Web Archive 'leaks' to live web
Live leaks not possible
- Improperly rewritten dynamic urls
No Rewriting needed!
- Pages "don't appear the same" (no <blink>, etc.)
Use era-appropriate browsers
- Flash, Java, Shockwave, other legacy content
Architecture and Concepts
A layered Architecture
- Web archives provide Web pages
- Web pages running in Browsers
- Browsers running in Emulators
- Emulators running in Containers
- Containers running in a VM
- Screen of container streamed to client
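The layering above can be written down as a simple "runs in" relation. This is only an illustrative model in Python; the dictionary and the function name `hosting_chain` are assumptions made for this sketch, not part of oldweb.today.

```python
# Each component mapped to the layer it runs in, following the list above.
RUNS_IN = {
    "web page": "browser",
    "browser": "emulator",
    "emulator": "container",
    "container": "VM",
}

def hosting_chain(component, runs_in=RUNS_IN):
    """Return the layers that host `component`, innermost first."""
    chain = []
    while component in runs_in:
        component = runs_in[component]
        chain.append(component)
    return chain

print(hosting_chain("web page"))
```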
Differences from other
- Designed specifically for web archives
- The browser is the platform for web pages
- The OS is secondary to the browser
- Timing latency less critical for the web
Web (HTTP) emulation
- Not general Internet emulation
- HTTP hasn't changed much since the 1.0 release
- HTTP can route all traffic through a Proxy Server
- A Proxy Server can be used to serve archived content
- Difficult to configure manually just to view an archive...
- ...but easy if pre-configured in a "container"
- Containers are more lightweight than VMs and use fewer resources
- 'Contain' programs, data and file systems that can be deployed rapidly
- oldweb.today supports roughly 5-6 browser containers per CPU (Amazon EC2)
- Docker is a popular, open-source container system
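A minimal sketch of the proxy idea, using only Python's standard library. The in-memory `ARCHIVE` dictionary, the port and all names are hypothetical stand-ins; a real deployment would look content up in a web archive rather than a dict. The point is that a browser configured to use such a proxy receives archived bytes for every HTTP request, so nothing is fetched from the live web and no URL rewriting is needed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "archive": absolute URL -> archived response body.
ARCHIVE = {
    "http://example.com/": b"<html><body>Archived copy</body></html>",
}

def lookup(url, archive=ARCHIVE):
    """Return (status, body) for a proxied URL without touching the live web."""
    body = archive.get(url)
    if body is None:
        return 404, b"Not in archive"
    return 200, body

class ArchiveProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # For a proxy-style request line ("GET http://host/path HTTP/1.0"),
        # self.path holds the full absolute URL.
        status, body = lookup(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Start the sketch proxy; point a browser's HTTP proxy setting at it."""
    HTTPServer(("127.0.0.1", port), ArchiveProxyHandler).serve_forever()
```

Pre-configuring the browser's proxy setting inside a container image is what removes the manual setup step for the viewer.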
The web as performative media
oldweb.today makes the variability of the web tangible.
The web has, since the advent of client-side scripting in 1995, been a software delivery system rather than a document delivery system.
Absence of technically complex artifacts in classic web archives
As web archiving techniques improve (for instance with Webrecorder), the preservation of software environments that the web can be performed within becomes essential.
Reduced speed perceived as authentic
Transparent medium becomes opaque again
- Audio Support
- Automated Browser Suggestions
- Additional and Refined Browser Environments
- Webrecorder is our main development focus
- Provides high fidelity Web Archiving for All
- User-driven, personalized archiving
- Hosted service and free, open-source software
- Sneak preview: Record and Replay Flash with containerized browsers
Join our Webrecorder Workshop on Friday, 9:00!
Track 2 — Hekla