using NCD and not, say, alignment, can cope with full genomes and other large data files and thus comes up with a single distance matrix. We perform various experiments on sets of mostly classical pieces given as MIDI (Musical Instrument Digital Interface) files. Note that the majority of distances bunch in the range [0.9, 1]. The placement of the pieces is closer to intuition on a small level (for example, most pairings of siblings correspond to musical similarity in the sense of the same composer) than on the larger level. For a finite series i 1 a c i, item (ii) is not applicable.
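As a minimal sketch of how a single NCD distance matrix can be produced from raw files, the following assumes the usual definition NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, which this passage does not restate, and uses zlib as a stand-in for the compressor C. The function names and the sample byte strings are hypothetical and chosen only for illustration; they are not the experimental setup described here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance under the assumed definition."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items):
    """Full matrix of pairwise NCD values for a list of byte strings."""
    n = len(items)
    return [[ncd(items[i], items[j]) for j in range(n)] for i in range(n)]


# Hypothetical sample data: in practice each item would be the raw
# bytes of a file, for example a MIDI file or a genome sequence.
samples = [
    b"the quick brown fox jumps over the lazy dog " * 50,
    b"the quick brown cat jumps over the lazy dog " * 50,
    bytes(range(256)) * 10,
]
for row in distance_matrix(samples):
    print(" ".join(f"{d:.3f}" for d in row))
```

With a real-world compressor the diagonal entries are small but generally not exactly zero, and the matrix is only approximately symmetric, so the values should be read as estimates.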

Notation: We typically use a Greek letter to represent a Turing machine T as a partially defined function. Commonly the goal in the quartet method is to find (or approximate as closely as possible) the tree that embeds the maximal number of consistent (possibly weighted) quartet topologies from a given set P ⊆ Q of quartet topologies (Figure ?). Next, there is near-perfect regional separation with clear delineation of Japan and the crucial Qinghai, Astrakhan, Mongolia, and Novosibirsk, as well as near-perfect separation of Vietnam and Thailand. In these settings, the direction from the root to the leaves represents an evolution in time, and the assumption is that there is a true tree we have to discover. Most computer users realize that there are freely available programs that can compress text files to about one quarter of their original size. In our context, these are called C and. Trees are not flexible enough to represent all realistic scenarios. The NCD investigations of the previous chapters focused on using data compressors to compress data in files. In this light we may consider NCD as a particularly convenient and sometimes highly intelligent feature-extraction (or dimensionality reduction) technique suitable for drop-in replacement in a number of larger automatic learning systems.
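To make the quartet criterion mentioned in this passage concrete, the sketch below counts how many quartet topologies from a given set are embedded in a candidate tree. It assumes the common convention that a topology uv|wx is embedded in a tree when the path from u to v shares no node with the path from w to x; the passage itself does not spell this out. The tree encoding (an adjacency dictionary), the function names, and the four-leaf example are hypothetical, and weights are omitted for brevity.

```python
def build_adjacency(edges):
    """Undirected adjacency dictionary from a list of (node, node) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def tree_path(adj, start, goal):
    """Set of nodes on the unique path between two nodes of a tree."""
    stack = [(start, None, [start])]
    while stack:
        node, parent, path = stack.pop()
        if node == goal:
            return set(path)
        for nxt in adj[node]:
            if nxt != parent:  # never walk back along the edge just used
                stack.append((nxt, node, path + [nxt]))
    raise ValueError("nodes are not connected")


def is_embedded(adj, quartet):
    """True if topology uv|wx, given as ((u, v), (w, x)), is embedded."""
    (u, v), (w, x) = quartet
    return tree_path(adj, u, v).isdisjoint(tree_path(adj, w, x))


def count_consistent(adj, quartets):
    """Number of the given quartet topologies embedded in the tree."""
    return sum(is_embedded(adj, q) for q in quartets)


# Hypothetical tree with leaves a, b, c, d and internal nodes n0, n1.
tree = build_adjacency(
    [("a", "n0"), ("b", "n0"), ("n0", "n1"), ("c", "n1"), ("d", "n1")]
)
topologies = [
    (("a", "b"), ("c", "d")),  # ab|cd
    (("a", "c"), ("b", "d")),  # ac|bd
    (("a", "d"), ("b", "c")),  # ad|bc
]
print(count_consistent(tree, topologies))
```

A search procedure for the quartet method would evaluate a score of this kind over many candidate trees; the sketch covers only the scoring of one tree.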

