$Id: CHANGES,v 1.93 2003/03/10 22:23:03 mikpe Exp $

			CHANGES
			=======

[High-level changes in reverse chronological order. Detailed
 driver changes are in linux/drivers/perfctr/RELEASE-NOTES.]

Version 2.5.0, 2003-03-10
- Added a simple user-space library API for accessing other
  processes' virtual performance counters. This uses a new
  type and a new set of operations since remote access has
  different requirements than accessing one's own counters.
  Following Mike Marty's suggestion, I left out the process
  control calls needed around these operations (ptrace() and
  wait()), so applications must handle that themselves.
- Added 'make install' support for the user-space components.
- Driver API cleanups. The 'eventsel_aux[]' array in 'struct
  perfctr_cpu_control' has been renamed as 'escr[]' and has been
  moved into the 'p4' sub-structure. (The change highlights the
  fact that this field was and is P4-only.) The 'version[]' string
  in 'struct perfctr_info' has been renamed to 'driver_version[]',
  since perfctr_info now also contains an 'abi_version' field.
  Some changes in the driver ABI: while not strictly necessary,
  they clean things up and make room for future changes. The ABI
  changed anyway from perfctr-2.4, so this shouldn't be a problem.
- Added a perfctr_cpu_control_print() procedure to the library,
  and updated the example programs to use it.
- Updated the perfex example program's help text to describe the
  syntax and meaning of event specifiers.
- Patch kit updates for 2.2.24/2.4.18-26(RedHat)/2.5.64 kernels.

Version 2.5.0-pre2, 2003-03-03
- Added a way for user-space to query the driver's ABI version,
  and updated the library to check it.
- Fixed <linux/perfctr.h> to not include <asm/perfctr.h> when
  perfctr hasn't been configured. This allows the patched kernel
  source to compile cleanly also in archs not supported by perfctr.
- Major patch kit overhaul. Updated configuration help texts.
  Removed unnecessary features and patches. Some cleanups. Added
  aliasing support to the 'update-kernel' script, which allows a
  patch to serve several kernels (when applicable).
- The perfctr configuration option was poorly placed. It is now
  at the end of the "Processor type and features" menu.
- Removed "notsc" kernel option support from the 2.2 kernel patches.
  To use the driver with an IDT WinChip (Centaur C6/2/3) CPU now
  requires a newer kernel with native "notsc" support.
- Driver fixes for changes in the 2.4.21-pre5 and 2.5.63 kernels.

Version 2.5.0-pre1, 2003-02-19
- Fixed the driver's API to support global-mode perfctrs on 2.5
  SMP kernels and asymmetric hyper-threaded P4 multiprocessors.
  Updated examples/global/global.c for the new API.
- Minor library cleanups. Updated example programs accordingly.
- API cleanup: Removed obsolete STOP command from the driver
  for virtual perfctrs. The library now uses CONTROL instead.
- Proper detection and support for AMD K8 processors. They are
  similar to the K7s, but the event sets are not identical.
- The library's event set descriptions have been redesigned and
  expanded to include unit mask descriptions and descriptions of
  Intel P4 and AMD K8 events. The etc/perfctr-events.tab text file
  has been removed since event_codes.h now is generated from the
  library's data structures.

Version 2.4.5, 2003-02-09
- Corrected the unit mask definition for the K7 SYSTEM_REQUEST_TYPE
  event in etc/perfctr-events.tab: WC is 0x02 not 0x04.
- Fixed two compile warnings which could be triggered in 2.5 kernels.
- Patch kit updates for 2.4.21-pre4/2.4.18-24(RedHat)/2.5.59-osdl2 kernels.

Version 2.4.4, 2003-01-18
- Fixed a context-switch bug where an interrupt-mode counter could
  increment unexpectedly, and also miss the overflow interrupt.
- Fixed some ugly log messages the new HT P4 support code added
  in perfctr-2.4.3 could generate at driver initialisation time.
- Added preliminary support for AMD K8 processors with the
  regular 32-bit x86 kernel. The K8 performance counters appear
  to be identical or very similar to the K7 performance counters.

Version 2.4.3, 2002-12-11
- Support for hyper-threaded Pentium 4s added. In a HT P4, the
  two logical processors share the performance counter state.
  HT P4s are therefore _asymmetric_ multi-processors, and the
  driver enforces CPU affinity masks on users of per-process
  performance counters to avoid resource conflicts. (Users are
  restricted to logical processor #0 in each physical CPU.)
  Limitations:
  * The kernel mechanism for updating a process' CPU affinity
    mask uses no or very weak locking, which makes certain race
    conditions possible that can break the driver's CPU affinity
    mask restrictions. For now, users should NOT use the
    sched_setaffinity() system call on processes using per-process
    performance counters.
  * Global-mode performance counters don't work on HT P4s due to
    limitations in the API. This will be fixed in perfctr-2.5.
  * 2.2 kernels don't have CPU affinity masks, and therefore can't
    support HT P4s.

Version 2.4.2, 2002-11-25
- Fixed a driver bug where it could fail to prevent simultaneous
  use of global-mode and per-process performance counters.
- Made the driver safe for preemptible 2.5 kernels.
- New patches for RedHat update kernels 2.2.22-6.2.2, 2.2.22-7.0.2,
  2.4.18-18.7.x, and 2.4.18-18.8.0.

Version 2.4.1, 2002-10-12
- Support RedHat 8.0's 2.4.18-14 kernel. Building perfctr as
  a module caused a namespace clash in this kernel. The fix
  required a change to the driver's kernel-resident glue code.

Version 2.4.0, 2002-09-26
- Fixed an overly strict access control check which prevented
  opening another process' /proc/<pid>/perfctr when the driver
  was built as a module.
- Updates for kernels 2.2.22, 2.4.18-10-redhat, 2.4.20-pre8, 2.5.36.

Version 2.4.0-pre2, 2002-08-27
- vperfctr_control() now allows the user to specify that some PMC
  sums are not to be cleared when updating the control.
  There is a new bitmap field `preserve' in struct vperfctr_control:
  if bit i is set then PMC(i)'s sum is not cleared.
  `preserve' is a simple `unsigned long' for now, since this type
  fits all currently known CPU types.
  This change breaks binary compatibility, but user-space code which
  clears the entire control record before filling in relevant fields
  will continue to work as before after a recompile.
  This feature removes a limitation which some people felt was a
  problem for some usage scenarios.

Version 2.4.0-pre1, 2002-08-12
- The kernel driver has an initial implementation of a new
  remote-control API for virtual per-process perfctrs.
  A monitor process may access a target process' perfctrs via
  open(), mmap(), and ioctl() on the target's /proc/pid/perfctr.
  For open() and ioctl(), the monitor must hold the target under
  ptrace ATTACH control. The user-space library and examples have
  not been updated for the new API.

Version 2.3.12, 2002-08-12
- Updated patch kit for the 2.4.19 final kernel.
- Spelling fix in INSTALL.
- Minor driver code size reduction on uniprocessor kernels.

Version 2.3.11, 2002-07-21
- Interrupt-mode performance counters now have accumulated sums.
  The library procedures vperfctr_read_pmc() and vperfctr_read_ctrs()
  can now retrieve the sums of interrupt-mode counters.
- Corrected the name of K7 event 0x42 to DATA_CACHE_REFILLS_FROM_L2.

Version 2.3.10, 2002-07-19
- Added a script, `update-kernel', to simplify the process of
  patching the kernel source code. See INSTALL for details.
- The counter and control registers are now cleared when the
  driver is idle. This should allow the counter hardware to
  power down when not used, especially on P4.
- Some Pentium MMX and Pentium Pro processors have an erratum
  which causes System Management Mode to shut down if user-space
  has been granted access to the RDPMC instruction. The driver
  now avoids granting RDPMC access on the affected processors.
  The user-space library makes this change transparent.
- New CPU type code for Model 2 Pentium 4s, due to a few but
  significant changes between Model 0 and 1 and Model 2 CPUs.  
- The driver now supports Replay Tagging on the Pentium 4.
  The perfex program has been updated to allow users to specify
  values to store in PEBS_ENABLE and PEBS_MATRIX_VERT.
  For example, the following command could be use to count the
  number of L1 cache read misses on a Pentium 4:

  perfex -e 0x0003B000/0x12000204@0x8000000C --p4pe=0x01000001 --p4pmv=0x1 some_program

  Explanation: IQ_CCCR0 is bound to CRU_ESCR2, CRU_ESCR2 is set up
  for replay_event with non-bogus uops and CPL>0, and PEBS_ENABLE
  and PEBS_MATRIX_VERT are set up for the 1stL_cache_load_miss_retired
  metric. Note that bit 25 is NOT set in PEBS_ENABLE.

Version 2.3.9, 2002-06-27
- Pentium 4 bug fix: An error in older revisions of Intel's IA32
  Volume 3 manual caused the driver to program the wrong control
  register in a few cases, affecting uses of the uop_type event.
  Revision -007 of Intel document #245472 corrects the error,
  and the driver has been updated accordingly.

Version 2.3.8.1, 2002-06-27
- Regenerated the patch file for RedHat's 2.4.18-5 kernel. The
  patch file in 2.3.8 only contained an error message from 'diff'.

Version 2.3.8, 2002-06-26
- Added counter overflow interrupt support for Intel P4.
- New kernel support: standard kernels 2.2.21 and 2.4.19-rc1,
  and RedHat kernels 2.2.19-7.0.16, 2.4.9-34, and 2.4.18-5.
- API changes: Removed unused and obsolete fields from the vperfctr
  state and control objects. Added fields to perfctr_cpu_control
  to enable future support for P4 replay tagging events.
  Incremented the vperfctr mmap() binary layout magic number.
- Changed the "make" rule in INSTALL to build "vmlinux" before
  "modules". This change is needed for RedHat kernels.
- Added build of a shared (.so) version of the user-space library.
- When changing a process' vperfctr control data, the TSC sum
  is now preserved if the next control state includes the TSC.
  It used to be preserved only if both the previous and next states
  included the TSC. The difference matters when a running TSC is
  stopped and then restarted by a STOP;CONTROL command sequence.
- Driver cleanups. Merged P6 and K7 driver procedures.

Version 2.3.7, 2002-04-14
- Added Pentium 4 support to examples/perfex/. The full
  syntax of an event specifier is now "evntsel/aux@pmc".
  All three components are 32-bit processor-specific numbers,
  written in decimal or hexadecimal notation.

  "evntsel" is the primary processor-specific event selection
  code to use for this event. This field is mandatory.

  "/aux" is used when additional event selection data is
  needed. For the Pentium 4, "evntsel" is put in the counter's
  CCCR register, and "aux" is put in the associated ESCR
  register. No other processor currently needs this field.

  "@pmc" describes which CPU counter number to assign this
  event to. When omitted, the events are assigned in the
  order listed, starting from 0. Either all or none of the
  event specifiers should use the "@pmc" notation.
  Explicit counter assignment via "@pmc" is required on
  Pentium 4 and VIA C3 processors.

  As an example, the following command could be used to count
  the number of retired instructions on a Pentium 4:

  perfex -e 0x00039000/0x04000204@0x8000000C some_program

  Explanation: Program IQ_CCCR0 with required flags, ESCR select 4
  (== CRU_ESCR0), and Enable. Program CRU_ESCR0 with event 2
  (instr_retired), NBOGUSNTAG, CPL>0. Map this event to IQ_COUNTER0
  (0xC) with fast RDPMC enabled.
- The driver now permits cascading counters on the Pentium 4.
- Preliminary driver infrastructure to support ptrace(ATTACH)
  for a future remote-control interface to per-process counters.
- Driver and patch kit updated for the APIC interrupt entries
  changes in kernel 2.5.8-pre3.

Version 2.3.6, 2002-03-21
- Fixed a problem with caused "BUG! resuming non-suspended perfctr"
  warnings when running PAPI's test cases with a DEBUG-compiled
  perfctr driver. There was no actual error, only a mismatch between
  the debug code and the code for changing event selection data.
- Fixed a time-stamp counter accounting error when user-space
  resumed interrupt-mode perfctrs with the VPERFCTR_IRESUME ioctl.

Version 2.3.5, 2002-03-17
- Multiprocessor AMD K7 machines should work now. A bug in current
  2.2/2.4/2.5 kernels prevented correct CPU identification on these
  machines, causing crashes. The driver now works around this bug.
- Added support for the VIA C3 Ezra-T processor.
- Added some support for interrupt-mode counters to the library.
  Cleaned up examples/signal/.
- Added links in OTHER to John Reiser's tsprof and Troy Baer's
  lperfex tools.

Version 2.3.4, 2002-01-23
- More detailed installation instructions in INSTALL.
- Experimental support for at-retirement counting on Pentium 4.
  Updated examples/global/ to count FLOPS on Pentium 4.
- Fixed uses of __FUNCTION__ to comply with changes in GCC 3.0.3.

Version 2.3.3, 2001-12-31
- Added support for the 2.4.16 and 2.4.17 kernels.
- SMP bug fixed: if a process using interrupt-mode counters migrates
  from CPU1 to CPU2 and then back to CPU1, then it could incorrectly
  resume the stale state cached in CPU1.
- P6 bug fixed: when a process resumed, it could inadvertently activate
  a suspended interrupt-mode counter belonging to the previous process
  using the performance counters.
- Pentium 4 bug fixed: could fail to update the control registers on a
  context switch.
- Removed the "pmc_map[] must be the identity function" restriction
  from P6 and K7.
- Updated examples/global/global.c: added Pentium 4 support
  (preliminary, counting MIPS not FLOPS), corrected VIA C3 handling,
  and corrected 32-bit integer overflow problems affecting fast CPUs.
- Removed perfctr_evntsel_num_insns() from the library: the interface
  could not support the Pentium 4. examples/self/self.c now does the
  setup all by itself, with Pentium 4 support.

Version 2.3.2, 2001-11-19
- Corrected an error in the driver's mapping from counter number
  to control registers on the Pentium 4. Counter 17 didn't work,
  and attempts to use it could have disturbed other counters as well.
- Fixed a minor omission in the Pentium 4 initialisation code.

Version 2.3.1, 2001-11-06
- New patches for kernels 2.2.20, 2.4.9-13 (RedHat 7.2 update),
  2.4.13-ac5, and 2.4.14. Minor cleanup in the P4 driver code.

Version 2.3, 2001-10-24
- Added support for multiple interrupt-mode virtual perfctrs
  with automatic restart. Updated the signal delivery interface
  to pass a bitmask describing which counters overflowed; the
  siginfo si_code is now fixed as SI_PMC_OVF (fault-class).
- Added EXPORT_NO_SYMBOLS to init.c, for compatibility with
  announced changes in modutils 2.5.
- Patch set updated for recent kernels.

Version 2.2, 2001-10-09
- Added preliminary Pentium 4 support to the driver, but only for
  the simple basic features. The example applications have not
  been updated, since I don't yet have a Pentium 4 for testing.

Version 2.1.4, 2001-09-30
- Added -l/-L (--list/--long-list) options to examples/perfex
  to have it list the current CPU's available events.
- Added 'set of events' descriptors for each supported CPU type
  to the library, and changed it to be a standard archive file.
- Performance counter interrupts now work in standard kernels,
  starting with kernel 2.4.10. Updated README.

Version 2.1.3, 2001-09-13
- Fixed a problem which prevented compiling the driver as a module
  in kernels older than 2.2.20pre10 if CONFIG_KMOD was disabled.
- Cleaned up command-line option processing in perfex. It now
  uses the GNU getopt library and accepts long option names.
- Fixed a typo in perfctr-events.tab (P6's INST_DECODED was
  misspelled as INST_DECODER), and updated/corrected several
  unit mask descriptions.
- Replaced most occurrences of "VIA Cyrix III / C3" with "VIA C3".

Version 2.1.2, 2001-09-05
- Added MODULE_LICENSE() tag, for compatibility with the tainted/
  non-tainted kernel stuff being put into 2.4.9-ac and modutils.
- The VIA C3 should be supported properly now, thanks to tests run
  by Dave Jones @ SuSE which clarified some aspects of the C3.
- Minor bug fix in the perfctr interrupt assembly code.
  (Inherited from the 2.4 kernel. Fixed in 2.4.9-ac4.)

Version 2.1.1, 2001-08-28
- Fixed a bug in the finalise backpatching code, which could
  cause a kernel hang in some configurations.
- Updated for kernel 2.4.9-ac3, which required changes to
  avoid conflicts in the %cr4 access methods.
- Preliminary code to detect Pentium 4 processors with Performance
  Monitoring features available.
- Minor %cr4-related cleanups.
- Minor documentation updates.
- Added a link in OTHER to Curtis Janssen's vprof tool.

Version 2.1, 2001-08-19
- Fixed a call backpatching bug, caused by an incompatibility
  between the 2.4 and 2.2 kernels' xchg() macros.
- Fixed a bug where an attempt to use /proc/<pid>/perfctr on an
  unsupported processor would cause a (well-behaved) kernel oops.
- The WinChip configuration option has been removed, and WinChip
  users should instead pass "notsc" as a boot-time kernel parameter.
  This permitted a cleanup of the driver and the 2.4 kernel patches,
  at the expense of more code in the 2.2 kernel patches to implement
  "notsc" support.

Version 2.0.1, 2001-08-14
- The "redirect call" backpatching code in the low-lever driver
  has been changed again. The change in 2.0-pre6 was insufficient,
  due to a nasty SMP-related erratum in all Intel P6 processors.
- Added support for 2.4.8/2.4.8-ac1 kernels.
- Removed an obsolete check from the WinChip support code.

Version 2.0, 2001-08-08
- Resurrected partial support for interrupt-mode virtual perfctrs.
  virtual.c permits a single i-mode perfctr, in addition to TSC
  and a number of a-mode perfctrs. BUG: The i-mode PMC must be last,
  which constrains CPUs like the P6 where we currently restrict
  the pmc_map[] to be the identity mapping. (Not a problem for
  K7 since it is symmetric, or P4 since it is expected to use a
  non-identity pmc_map[].)
- Bug fix in perfctr_cpu_update_control(): start by clearing cstatus.
  Prevents a failed attempt to update the control from leaving the
  object in a state with old cstatus != 0 but new control.

Version 2.0-pre7, 2001-08-07
- Updated user-space library:
  * Coding tweaks to attempt to make gcc (various versions) generate
    better code. (Not entirely successful. May have to resort to
    hand-written assembly code.)
  * New vperfctr_read_ctrs() sampling procedure.
  * New perfctr_print_info() helper procedure.
- Updated example applications:
  * Use the library's perfctr_print_info() for consistent output.
  * Counts are now printed in decimal, not hex.
  * 'perfex' now checks for data layout mismatch when the child
    process' virtual perfctr is mmap:ed into user space.
  * 'self' uses the new vperfctr_read_ctrs() sampling procedure.
  * 'signal' compiles again.
- Cleaned up the driver's debugging code.
- Internal driver rearrangements. The low-level driver (x86) now handles
  sampling/suspending/resuming counters. Merged counter state (sums and
  start values) and CPU control data to a single "CPU state" object.
  This simplifies the high-level drivers, and permits some optimisations
  in the low-level driver by avoiding the need to buffer tsc/pmc samples
  in memory before updating the accumulated sums (not yet implemented).
- Removed WinChip "fake TSC" support. The user-space library can now
  sample with slightly less overhead on sane processors.

Version 2.0-pre6, 2001-07-27
- Sampling bug fix for SMP. Normally processes are suspended and
  resumed many times per second, but on SMP machines it is possible
  for a process to run for a long time without being suspended.
  Since sampling is performed at the suspend and resume actions,
  a performance counter may wrap around more than once between
  sampling points. When this occurs, the accumulated counts will
  be highly variable and much lower than expected.
  A software timer is now used to ensure that sampling deadlines
  aren't missed on SMP machines.
- Bug fix in the x86 "redirect call" backpatching routine.
- Bug fix in the internal debugging code (CONFIG_PERFCTR_DEBUG).
- Minor performance tweak for the P5/P5MMX read counters procedures.
- To avoid undetected data layout mismatches, the user-space library
  now checks the data layout version field in a virtual perfctr when
  it is being mmap:ed into the user's address space.

Version 2.0-pre5, 2001-06-11
- Structure layout changes to reduce sampling overheads.
  The ABI changed slightly, but I hope this is the last such
  change for some time.
- Fixed two bugs related to the interaction of interrupt-mode
  perfctrs and the lazy EVNTSEL MSR update cache in the low-level
  driver. (Interrupt-mode support is still disabled in the
  high-level drivers, however.)
- Fixed a bug in examples/perfex where it forgot to initialise
  the pmc_map[] control field. This caused the driver to refuse
  attempts to use more than one counter. The current fix is
  for P6/K7 only; a general "fixup" procedure will be added to
  the user-space library later.
- Added a CONFIG_PERFCTR_DEBUG option to enable some internal
  consistency checking in the driver. This is a temporary
  measure intended to help debug two open problem reports.

Version 2.0-pre4, 2001-04-30
- Some module usage accounting changes which should make automatic
  module loading and unloading more robust in 2.2 kernels.
- Internal cleanups and a few minor bug fixes.
- Some API naming changes, and O_CREAT can now be used to control
  whether opening /proc/self/perfctr should create and attach
  a vperfctr or not.
- The user-space library has been updated for the new API.
  pmc_map[] is used to map from "virtual counter i" to an actual
  PMC index to be used by RDPMC -- the VIA Cyrix III / C3 is now
  able to sample in user-space even though it has no PMC(0).
  The layout of pmc_map[] is CPU-specific; see x86.c for details.
  Since TSC sampling is specified explicitly now, perfctr_cpu_nrctrs()
  has been changed to return the number of performance counters
  _excluding_ the TSC.
- The example programs have been updated for the new API, with the
  exception of signal.c which is still non-functional.
- The perfex.c example works better now that the API has a consistent
  one-evntsel-per-counter model even for Intel P5-like CPUs.
- The global.c example has been fixed to not cause a division by zero
  on WinChip CPUs lacking a working TSC.

Version 2.0-pre3, 2001-04-17
- Preliminary implementation of the new data structures and API
  is in place. The user-space components have not yet been updated.
  Interrupt-mode virtual perfctrs have been disabled pending
  completion of necessary CPU driver support.
- Now uses "VIA_C3" as the family name for both the VIA C3 and
  the slightly older VIA Cyrix III processors. "VIA_CYRIX_III"
  was just too clumsy and confusing. (It's not a Cyrix at all.)
- Fixed etc/perfctr-events.tab to make Cyrix' event codes agree
  with reality rather than with the Cyrix manuals. The manuals
  ignore the fact that the 7-bit event codes are stored in two
  distinct bit fields in the CESR.

Version 2.0-pre2, 2001-04-07
- Removed automatic inheritance of per-process virtual perfctrs
  across fork(). Unless wait4() is modified, it's difficult to
  communicate the final values back to the parent: the now
  abandoned code did this in a way which made it impossible
  to distinguish one child's final counts from another's.
  Inheritance can be implemented in user-space anyway, so the
  loss is not great. The interface between the driver and the rest
  of the kernel is now smaller and simpler than before.
- Dropped support for kernels older than 2.2.16.
- Preliminary support for the VIA C3 processor.

Version 2.0-pre1, 2001-03-25
- First round of API and coding changes/cleanups for version 2.0.
  The driver version in struct perfctr_info is now a string instead
  of the previous major/minor/micro version number mess.
- Internal cleanups and minor fixes.
- Fixed an include file problem which made some C compilers (not gcc)
  fail when compiling user-space applications using the driver.

Version 1.9, 2001-02-13
- Fixed compilation problems for 2.2 and SMP kernels.
- Corrected VIA Cyrix III support. The "VIA Cyrix III" product
  has apparently used two distinct CPUs. Initial CPUs were a
  Cyrix design (Joshua) while current CPUs apparently are a
  Centaur design (Samuel). Added support for "Samuel" CPUs.
- Two corrections in the K7 perfctr event list.
- Small tweaks to vperfctr interrupt handling.
- Added preliminary interrupt-mode support for AMD K7.

Version 1.8, 2001-01-23
- Added preliminary interrupt-mode support to virtual perfctrs.
  Currently for P6 only, and the local APIC must have been enabled.
  Tested on 2.4.0-ac10 with CONFIG_X86_UP_APIC=y.
  When an i-mode vperfctr interrupts on overflow, the counters are
  suspended and a user-specified signal is sent to the process. The
  user's signal handler can read the trap pc from the mmap:ed vperfctr,
  and should then issue an IRESUME ioctl to restart the counters.

Version 1.7, 2001-01-01
- Updated patches for kernels 2.2.18 and 2.4.0-prerelease.
- Removed the need to ./configure the library before building it.
- /dev/perfctr is now only used for global-mode perfctrs.
- Library API changes to reflect new /dev/perfctr semantics.
- Backported /proc/self/perfctr to kernels 2.2.13-2.2.17.
- /proc/self/perfctr is now mandatory for virtual perfctrs.
- Fixed a VIA Cyrix III CPU detection bug.
- Fixed a minor problem in the 2.4 patch to drivers/Makefile.
- Changed examples/global/global.c to count MFLOP/s instead
  of branches and branch prediction hits/misses.

Version 1.6, 2000-11-21
- Updated for kernels 2.4.0-test11 and 2.2.18pre22.
- Preliminary implementation of /proc/self/perfctr as a more direct
  way of accessing one's virtual perfctrs. (If this works out,
  the /dev/perfctr interface to vperfctrs will be phased out.)
  The driver can still be built as an autoloadable module.
  (For now, only supported in 2.2.18pre22 and 2.4.0-test11.)
- Some user-space library API changes to accommodate /proc/self/perfctr.
- The per-process virtual TSC is no longer restarted from zero
  when the perfctrs are reprogrammed, which allows it to be used
  as a high-res per-process clock (i.e. gethrvtime()).
- Rewrote the `command' example application to use perfctr inheritance
  instead of the recently removed "remote control" facility.
- WinChip documentation updates and corrections.

Version 1.5, 2000-09-03
- The virtual perfctr "remote control" facility has been removed,
  resulting in major simplifications in the driver.
  Since version 1.3 of the driver, the most common application of
  the remote control facility (to record events from unmodified
  applications) can be more easily implemented using the perfctr
  inheritance facility (perfctr control setup is inherited from parent
  to child processes, and a child's event counts are propagated back
  to its parent). Removing the remote control facility simplified
  resource management and eliminated a number of concurrency issues.
- Code cleanups. Dropped support for intermediate 2.3 and early 2.4
  kernels. The code now supports kernels 2.2.xx and 2.4.0-test7 or
  later only (via a 2.4-on-2.2 simulation layer).
- A number of changes to the user-space library. The API is now thread-
  safe (the library has no internal state), and the naming scheme has
  been simplified due to the removal of the remote-control facility.
  The zero-syscall perfctr sampling code has been rewritten and should
  be faster and more robust. (It fixed a sampling problem one user had
  on a 4-way MP box.)

Version 1.4, 2000-08-11
- Updates to comply which changes in 2.4.0-test kernels, in particular
  concerning module owner and use count tracking, and the Virtual File
  System (VFS) infrastructure.
- A bug which prevented reclaiming VFS resources (dentries and
  inodes) allocated to virtual perfctrs has been fixed. This bug
  affected both 2.2.x and 2.4.0-test kernels.

Version 1.3, 2000-06-29
- Implemented inheritance for per-process virtual perfctrs.
  This means that a child's performance-monitoring counts are
  attributed to its parent, similarly to how time is handled.
  The parent must have active perfctrs before forking off the child,
  and neither parent nor child must have reprogrammed its perfctrs
  when the child exits, otherwise no propagation occurs.
  Threads created implicitly by the kernel via request_module()
  are protected from perfctr inheritance.
- Added an example program to illustrate inheritance.
- Fixed two small buglets in the driver.
- Preliminary changes to make the user-space library thread-safe.
- Updated driver for kernel 2.4.0-test2.
- The driver now exports the CPU clock frequency to user-space,
  to enable mapping of accumulated TSC counts to actual time.
- Clarified that this package is licensed under the GNU LGPL.

Version 1.2, 2000-05-24
- Added support for kernels 2.2.16pre4 and 2.3.99-pre9-5.
- Added support for generic x86 processors with a time-stamp counter
  but no performance-monitoring counters. By using the driver to
  virtualise the TSC, accurate cycle-count measurements are now
  possible on PMC-less processors like the AMD K6.
- Fixed a bug in the WinChip driver.
- Miscellaneous code cleanups.

Version 1.1, 2000-05-13
- Support for Linux kernels 2.2.14, 2.2.15 and 2.3.99-pre8.
- Changes to the driver and user-space library to reduce the
  number of getpid() calls. (Suggested by Ulrich Drepper.)
- Added support for the VIA Cyrix III processor.
- Performance improvements in the x86 driver interface.
- Some code cleanups.

Version 1.0, 2000-01-31
- Support for Linux kernels 2.3.41, 2.2.15pre5, and 2.2.14.
- Code cleanups in order to handle drivers for non-x86 processors.
- Changes to the x86 drivers to reduce cache footprint and
  sampling overhead. (Sample low 32 bits of counters, but
  maintain 64-bit sums.)

Version 0.11, 2000-01-30
- Support for Linux kernels 2.3.41 and 2.2.14.
- Minor code cleanups and fixes.
- The CR4.PCE flag is now globally enabled on x86, except for
  those processors which does not support it. This is done in part
  to reduce the overhead of virtualising the performance counters,
  but it is also necessary due to changes in kernel 2.3.40.

Version 0.10, 2000-01-23
- Support for Linux kernels 2.3.40 and 2.2.14.
- Global-mode performance counters are now implemented.
- Added hardware support for the WinChip 3 processor.
- More source code reorganisation.

Version 0.9, 2000-01-02
- Support for Linux kernels 2.3.35, 2.2.14pre18, and 2.2.13.
- The driver can now be built as a module.
- The driver now installs itself as the /dev/perfctr device instead
  of adding a system call.
- Significant source code reorganisation.

Version 0.8, 1999-11-14
- Support for Linux kernels 2.3.28 and 2.2.13.
- Major updates to reduce the overhead of maintaining virtual
  performance-monitoring counters:
    - The control registers are cached and updated lazily.
    - The counter registers are no longer written to.
    - Unused counters are no longer manipulated at all.
      (This matters especially for the AMD K7.)
- Reduced the process scheduling overhead for processes not
  using performance-monitoring counters.
- Minor code cleanups, bug fixes, and documentation updates.

Version 0.7, 1999-10-25
- Support for Linux kernels 2.3.22 and 2.2.13.
- Improved performance. (Uses RDPMC instead of RDMSR when possible.)
- The AMD K7 Athlon should now work properly.
- User-space now uses mmap() to read the kernel's accumulated
  counter state.
- The driver is now invoked via a new sys_perfctr() system call,
  instead of abusing prctl().
- The kernel patch has been cleaned up. The "#ifdef CONFIG_PERFCTR"
  mess has been eliminated.

Version 0.6, 1999-09-08
- Version 0.6 with support for Linux kernels 2.3.17 and 2.2.12.
- Preliminary support for AMD Athlon added.

Version 0.5, 1999-08-29
- Support for Linux kernel 2.3.15.
- The user-space buffer is updated whenever state.status changes,
  even when a remote command triggers the change.
- Reworked and simplified the high-level code. All accesses
  now require an attached file in order to implement proper
  accounting and syncronisation. The only exception is UNLINK:
  a process may always UNLINK its own PMCs.
- Fixed counting bug in sys_perfctr_read().
- Improved support for the Intel Pentium III.
- Another WinChip fix: fake TSC update at process resume.
- The code should now be safe for 'gcc -fstrict-aliasing'.

Version 0.4, 1999-07-31
- Support for Linux kernel 2.3.12.
- Implemented PERFCTR_ATTACH and PERFCTR_{READ,CONTROL,STOP,UNLINK}
  on attached perfctrs. An attached perfctr is represented as a file.
- Fixed an error in the WinChip-specific code.
- Perfctrs now survive exec().

Version 0.3, 1999-07-22
- Support for Linux kernel 2.3.11.
- Interface now via sys_prctl() instead of /dev/perfctr.
- Added NYI stubs for accessing other processes' perfctrs.
- Moved to dynamic allocation of a task's perfctr state.
- Minor code cleanups.

Version 0.2, 1999-06-07
- Support for Linux kernel 2.3.5.
- Added support for WinChip CPUs.
- Restart counters from zero, not their previous values. This
  corrected a problem for Intel P6 (WRMSR writes 32 bits to a PERFCTR
  MSR and then sign-extends to 40 bits), and also simplified the code.
- Added support for syncing the kernel's counter values to a user-
  provided buffer each time a process is resumed. This feature, and
  the fact that the driver enables RDPMC in processes using PMCs,
  allows user-level computation of a process' accumulated counter
  values without incurring the overhead of making a system call.

Version 0.1, 1999-05-30
- First public release for Linux kernel 2.3.3.
