Mathematics of Machine Learning: Data, Algorithm, Model, and Causal Learning

Azamat Abdoullaev 02/08/2021

There are 4 concepts that are at the core of machine learning (ML): data, algorithm, a model, and statistical learning.

Machine learning is about designing algorithms that automatically extract valuable information from statistical data. The emphasis here is on correlations and “automatism”, i.e., machine learning is concerned about general-purpose methodologies that can be applied to many datasets, while producing something that is meaningful.

Since machine learning is inherently data driven, data is at the core data of machine learning. The goal of machine learning is to design general-purpose methodologies to extract valuable patterns from data, ideally without much domain-specific expertise.

A “machine learning algorithm” means a system that makes predictions based on input data, named as predictors. Or, the “machine learning algorithm” means a system that adapts some internal parameters of the predictor to perform well on future unseen input data.

A model is said to learn from data if its performance on a given task improves after the data is taken into account. The goal is to find good models that generalize well to yet unseen data, which we may care about in the future.

Learning can be understood as a learning way to automatically find patterns and structure in data by optimizing the parameters of the model.

Mathematics for Machine Learning

Statistical Machine Learning involves mathematical fields, such as linear algebra, function analysis, or calculus, geometry, discrete mathematics, statistics, probability theory and mathematical programming.

But such a machine learning is missing the fundamentals of dataset theory, a foundational system for the whole of mathematics, as well as computer science and data-driven AI.

The Set Theory for Machine Learning

An AI dataset is not a set or multiset, list, bunch, bag, heap, sample, weighted set, collection, and suite, but a poset.

A set is merely gathering together into a whole of definite, distinct, well-defined objects, the set's elements or members, as data, numbers, people, letters of the alphabet, other sets, and so on.

Machine data/representation/concept/category learning/formation is the machine learning task of learning a function that maps an input to an output based on example input-output pairs, as the feature vector space X and its label space Y. And a learning algorithm seeks a function f, where X is the input space and Y is the output space, of some space of possible functions F, called the hypothesis space.

It infers a function from labeled training data consisting of a set of training examples, where each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

A partially ordered set (poset) formalizes and generalizes the concept of an ordering, sequencing, or arrangement of the elements of a set.

Formally, a partial order is any binary relation that is reflexive, antisymmetric, and transitive.

A poset consists of a set together with a binary relation indicating that, for certain pairs of elements in the set, one of the elements precedes the other in the ordering. Again, there may be pairs of elements for which neither element precedes the other in the poset. Partial orders thus generalize total orders, in which every pair is comparable.

A poset can be visualized through its Hasse diagram, which depicts the ordering relation.

The Hasse diagram of the set of all subsets of a three-element set {x, y, z}, ordered by inclusion.

Standard examples of data posets include:

The real numbers ordered by the standard less-than-or-equal relation ≤ (a totally ordered set as well).
The set of subsets of a given set (its power set) ordered by inclusion.
The set of sequences ordered by subsequence, and the set of strings ordered by substring.
The set of natural numbers equipped with the relation of divisibility.
The vertex set of a directed acyclic graph ordered by reachability.
The set of subspaces of a vector space ordered by inclusion.
The set of events in special relativity and general relativity, where for two events X and Y, X ≤ Y if and only if Y is in the future light cone of X. An event Y can only be causally affected by X if X ≤ Y.
A collection of people ordered by genealogical descendancy. Some pairs of people bear the descendant-ancestor relationship, but other pairs of people are incomparable, with neither being a descendant of the other.

Now, a data set (or dataset) is defined as a collection of data, listing values for each of the variables for each member of the data set. In the case of tabular data, a data set corresponds to database tables, where every column represents a particular variable or feature and attribute, and each row corresponds to a given record of the data set in question.

But data sets can also consist of a collection of documents or files. Data set is the unit to measure the information released in a public open data repository. The European Open Data portal mentions > half a million data sets.

So, no consensus about it, if not take a data poset as the common background.

Conclusion

Overall, there are two generations of machine learning: statistical machine learning (SML) and causal machine learning (CML).

There are 5 concepts behind causal machine learning: reality, data, causal algorithm, a causal model, and causal learning.

Science and Technology 21, New Physica by Azamat Abdoullaev - Read the e-book version here.

Share this article

Leave your comments

Post comment as a guest

Comments

Comments

No comments found

Azamat Abdoullaev

Tech Expert

Azamat Abdoullaev is a leading ontologist and theoretical physicist who introduced a universal world model as a standard ontology/semantics for human beings and computing machines. He holds a Ph.D. in mathematics and theoretical physics.

Mathematics of Machine Learning: Data, Algorithm, Model, and Causal Learning