Changes during perfctr-2.5:
- Add support for the x86_64 kernel. Skip ia32 emulation.
- Add /dev/perfctr ATTACH ioctl for vperfctrs, and allow access
  both that way and via /proc/pid/perfctr.
- Investigate the feasibility of adding back inheritance support.
- Make sure sched_setaffinity() and the kernel's own set_cpus_allowed()
  calls don't break perfctr on HT P4s.

Changes for perfctr-2.6:
- Kill /proc/<pid>/perfctr. It enlarges the interface between the
  kernel and the driver, and suffers from kernel version dependencies.
  It also makes a patch-less version of the driver impossible.
  Open vperfctrs via /dev/perfctr and create a private vperfctr_vfs for
  the vperfctr file descriptors. The code for this already exists in
  perfctr-3.1 (RIP) and perfctr-1.6.
- After /proc/<pid>/perfctr is gone, tidy up the vperfctr_vfs code by
  putting 2.2 code in separate files in a 2.2 subdirectory.
  (This is difficult to do right now with the ugly /proc/<pid>/perfctr code.)

Changes after perfctr-2.6:
- Implement a patch-less version of the driver. Insert a glue module
  that hooks into the kernel via code backpatching and symbol table
  information. Afterwards, the driver module proper can interface with
  the glue module for the kernel callbacks, IDT, and irq return path.
  This requires the /proc/<pid>/perfctr removal in perfctr-2.6.

Driver:
- Drop the #ifdef SMP around virtual's sample operation.
- When an overflowed perfctr is reset, take into account how many
  events past 0 or 1 it has already counted.
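
A minimal sketch of the overflow adjustment described above, not the
driver's actual code. It assumes an interrupt-mode counter is started at a
negative value and overflows when it reaches a threshold of 0 or 1. The
names restart_value, ireset, now and threshold are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the value to write back into an
 * overflowed counter so that events counted past the overflow
 * threshold are not lost.
 *
 *   ireset    - negative start value the counter is normally reset to
 *   now       - counter value read back in the overflow handler
 *   threshold - value at which the counter is considered overflowed
 *               (assumed to be 0 or 1, depending on the CPU)
 */
static int64_t restart_value(int64_t ireset, int64_t now, int64_t threshold)
{
    /* Events already counted past the threshold; none if not overflowed. */
    int64_t overshoot = now > threshold ? now - threshold : 0;

    /* Start the next period that many events closer to overflow. */
    return ireset + overshoot;
}
```

With ireset == -1000, a counter read back at 3 restarts at -997 for a
threshold of 0 and at -998 for a threshold of 1.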

Library:
- Change usr.lib/Makefile to build .so from -fPIC objects; needed for x86_64.
- Add vperfctr_mmap() to libperfctr.c: the goal is to perform all
  accesses via the library, even for examples/perfex/.
- Implement gethrvtime(). Don't ever STOP the counters. To stop PMC
  updates, call CONTROL with tsc_on == 1 and nractrs == nrictrs == 0.
  The driver will continue sampling the TSC. Then gethrvtime() reduces
  to scaling the virtualised TSC with cpu_khz.
- Describe derived events in event_set.c.
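
The scaling step in the gethrvtime() item could look roughly like the
sketch below. It assumes the result is wanted in nanoseconds and that
cpu_khz is the TSC frequency in kHz, that is, cycles per millisecond. The
helper name tsc_to_ns is hypothetical and is not part of libperfctr.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a virtualised TSC count to nanoseconds.
 * cpu_khz must be non-zero. The division is split into a whole part and
 * a remainder so that the intermediate product does not overflow
 * 64 bits for large TSC values.
 */
static uint64_t tsc_to_ns(uint64_t tsc, uint32_t cpu_khz)
{
    uint64_t whole_ms = tsc / cpu_khz;  /* complete milliseconds */
    uint64_t rem = tsc % cpu_khz;       /* leftover cycles, < cpu_khz */

    return whole_ms * 1000000u + rem * 1000000u / cpu_khz;
}
```

On a 2.4 GHz CPU (cpu_khz == 2400000), 1200000 cycles scale to 500000 ns.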

Documentation:
- Write it :-(

Possible Changes:
- The P6 and P4 sub-models don't matter for the driver. Should the driver
  just export the major model and the cpuid, and let user-space figure
  out sub-model details?
- Access control mechanism for global-mode perfctrs?
- Interrupt support for global-mode perfctrs?
- Multiplexing support? PAPI seems to do fine w/o it.
- A "kernel profiling" mode which uses global-mode perfctrs in
  interrupt mode to profile the kernel?
- Buffer interrupts and signal user-space when buffer is nearly full?
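
One way the buffering idea in the last item might be structured, as an
illustrative sketch only. The buffer size, the high-water mark, and the
names sample_buf and sample_buf_push are all assumptions. The sketch
ignores locking and how the signal would actually be delivered.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BUF_SIZE       64   /* hypothetical capacity */
#define SAMPLE_BUF_HIGH_WATER 48   /* hypothetical "nearly full" mark */

/* Hypothetical buffer of program counters recorded at overflow interrupts. */
struct sample_buf {
    uint64_t pc[SAMPLE_BUF_SIZE];
    size_t count;
};

/*
 * Record one sample. Returns 1 exactly when the buffer reaches the
 * high-water mark, which is when user-space would be signalled, and 0
 * otherwise. Samples arriving while the buffer is full are dropped.
 */
static int sample_buf_push(struct sample_buf *b, uint64_t pc)
{
    if (b->count >= SAMPLE_BUF_SIZE)
        return 0;                   /* full: drop the sample */

    b->pc[b->count++] = pc;
    return b->count == SAMPLE_BUF_HIGH_WATER;
}
```

Signalling at a high-water mark rather than at capacity leaves room for
samples that arrive before user-space drains the buffer.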
