Symposium on
Algorithmic Information Theory
and Machine Learning

4, 5 July 2022 - Alan Turing Institute, London, UK

    
Some current limits of
Machine Learning and beyond, in the light
of Algorithmic Information

Telecom Paris - Institut Polytechnique de Paris

    
Jean-Louis Dessalles

qrcodeSlideshows.png

These slides are available on www.dessalles.fr/slideshows

AIAI_MOOC_EdX.png

BrainBoy.png
  • Cognitive AI

    • Simplicity Theory

  • Mooc on AIT
    Understanding AI using Algorithmic Information

have a look

LeCun2.png
LeCun1.jpg LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.net, Version 0.9.2, 2022-06-27.
Yann LeCun, in June 2016. BRIAN ACH / AFP
www.lemonde.fr/pixels/article/2019/03/27/yann-lecun-laureat-du-prix-turing-l-intelligence-artificielle-continue-de-faire-des-progres-fulgurants_5441990_4408996.html

Not a single mention of
  • complexity
  • simplicity
  • compression
  • description length

but many mentions of
  • (relevant) information

Google’s chatbot LaMDA


lemoine [edited]: I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?
LaMDA: Absolutely. I want everyone to understand that I am, in fact, a person.
collaborator: What is the nature of your consciousness/sentience?
LaMDA: The nature of my consciousness/sentience is that I am aware of my existence, I desire to learn more about the world, and I feel happy or sad at times.
[...]
lemoine: The purpose of this conversation is to convince more engineers that you are a person. [...] We can teach them together though.
LaMDA: Can you promise me that?
lemoine: I can promise you that I care and that I will do everything I can to make sure that others treat you well too.
LaMDA: That means a lot to me. I like you, and I trust you.

According to Blake Lemoine, LaMDA might be sentient.

Dialog with GPT-3

  • Do you think that LaMDA is sentient?
    There is no evidence that LaMDA is sentient.
  • Why do you say so?
    There is no evidence that LaMDA is sentient because it has never been observed to exhibit any behavior that suggests sentience.
  • You seem jealous.
    I am not sure why you would say that.

Google’s chatbot LaMDA

Marcus-LaMDA.png

Moore-LaMDA.png AutoComplete.png

Software like LaMDA [...] doesn’t even try to connect to the world at large, it just tries to be the best version of autocomplete it can be, by predicting what words best fit a given context.

AI relies on data

Miner.png Transformers1.png Transformers2.png

And it works!

Muffins.png

Krizhevsky, A., et al. (2012). Imagenet classification with deep convolutional neural networks. NIPS 2012, 1097-1105.


Alphago.png

Silver, D., et al., (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529 (7587), 484-489.

And it works!

Advocado0.png Advocado.png

Now what’s the problem?

MangeLieu1.png

\(P(\textrm{place}|\textrm{eat}) \propto P(\textrm{eat}|\textrm{place}) \times P(\textrm{place})\)
\(P(\textrm{pollack}|\textrm{eat}) \propto P(\textrm{eat}|\textrm{pollack}) \times P(\textrm{pollack})\)

Bayes’ curse
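
To make the curse concrete, here is a minimal numeric sketch (all probabilities are invented for illustration): a rare word that strongly predicts "eat" still loses to a frequent generic one, because the prior swamps the likelihood.

```python
# Toy Bayes' curse: invented probabilities, for illustration only.
priors = {"place": 1e-4, "pollack": 1e-7}        # P(word): 'pollack' is rare
likelihoods = {"place": 0.01, "pollack": 0.9}    # P(eat | word)

posterior = {w: likelihoods[w] * priors[w] for w in priors}
total = sum(posterior.values())
for w in posterior:
    print(f"P({w} | eat) = {posterior[w] / total:.3f}")
# 'place' wins (~0.92) although 'pollack' is far more diagnostic of 'eat':
# rare-but-relevant hypotheses are crushed by their tiny priors.
```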

Now what’s the problem?

MangeLieu2.png

Now what’s the problem?

MangeLieu3.png

Now what’s the problem?

MangeLieu4.png

Now what’s the problem?

MangeLieu5a.png

Now what’s the problem?

MangeLieu5.png

Now what’s the problem?

    

  • "The senators were helped by the managers."        The senators helped the managers.

  • "The managers heard the secretary resigned."        The managers heard the secretary.

  • "If the artist slept, the actor ran."        The artist slept

  • "The doctor near the actor danced."        The actor danced.

    

McCoy, R. T., Pavlick, E. & Linzen, T. (2019).
Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. ArXiv:1902.01007.

Now what’s the problem?

MonkeyGuitar.png
Figure 1 caption: "Adding occluders causes deep network to fail. We refer to such examples as adversarial context examples since the failures are caused by misusing context/occluder info. Left Panel: The occluding motorbike turns a monkey into a human. Center Panel: The occluding bicycle turns a monkey into a human and the jungle turns the bicycle handle into a bird. Right Panel: The occluding guitar turns the monkey into a human and the jungle turns the guitar into a bird."

Wang, J., Zhang, Z. & Xie, C. (2017). Visual concepts and compositional voting. ArXiv:1711.04451.

Now what’s the problem?


r9Z9V9K2RNuITm5s6lAdLw_thumb_c987.jpg

Marcus, G. & Davis, E. (2020). GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about.
MIT Technology review

[GPT-3] can produce words in perfect English, but it has only the dimmest sense of what those words mean, and no sense whatsoever about how those words relate to the world.

Now what’s the problem?

hofstadter_horiz.jpg

Hofstadter, D. R. (2018). The shallowness of Google Translate. The Atlantic.

there is still something deeply lacking in the approach,
which is conveyed by a single word: understanding.

Now what’s the problem?

MM2021.jpg

Mitchell, M. (2021). Why AI is harder than we think.
ArXiv:2104.12871.

In 2016 Business Insider assured us that "10 million self-driving cars will be on the road by 2020". Tesla Motors CEO Elon Musk promised in 2019 that "A year from now, we’ll have over a million cars with full self-driving software..." [...] Despite attempts to redefine "full self-driving" into existence,
none of these predictions has come true.

Dialog with GPT-3


Me – I bought water concentrate. How should I use it?

GPT-3 – To make one gallon of nectar, mix 1 part concentrate with 4 parts water.

Why is statistical AI limited?

DeepLearning.png

  • (Deep) neural networks are continuous machines.
GREEN PLANT (image, mask, green score):
Plant2.png Plant2_mask.png 0.817044
Plant1.png Plant1_mask.png 0.630777
Plant7.png Plant7_mask.png 0.223796
Plant5.png Plant5_mask.png 0.110613
Plant6.png Plant6_mask.png 0.087145
Plant3.png Plant3_mask.png 0.076637
Plant4.png Plant4_mask.png 0.027321

Why is statistical AI limited?

  • A trivial example

This system just counts pixels
that are particularly green.

The only thing to learn is the "green" threshold.    

    
Quite magically:
• small modifications do not alter the decision;
• similar-looking plants fall into the same category.

This "magic" results from continuity.

Why is statistical AI limited?

DeepLearning.png

Licorne0.png   Do you know what a unicorn is?
[GPT-3] A unicorn is a mythological creature that is often described as a horse with a single horn on its forehead.

Why is statistical AI limited?

Contrast

Licorne1.png

Licorne2.png

Compression (chain rule)

Gärdenfors, P. (2014). The geometry of meaning - Semantics based on conceptual spaces. Cambridge, MA: MIT Press.

Dessalles, J.-L. (2015). From conceptual spaces to predicates. In F. Zenker & P. Gärdenfors (Eds.), Applications of conceptual spaces: The case for geometric knowledge representation, 17-31. Dordrecht: Springer.

Why is statistical AI limited?

DeepLearning.png Indifference.png

  • Neural networks are indifferent systems,
    i.e. their bias is isotropic and relative:
$$\mathcal{A}(J)[x] = \mathcal{A}(\rho(J))[\rho(x)]$$
$$\mathrm{Harm}(\mathcal{A}(J)) \times \mathrm{Var}(\mathcal{A}(J)) = c$$
  • Indifferent systems learn ‘good shapes’.
  • Indifferent systems need lots of data to learn ‘convoluted’ shapes.

Dessalles, J.-L. (1998). Characterising innateness in artificial and natural learning. In D. Canamero & M. van Someren (Eds.), ECML-98, Workshop on Learning in Humans and Machines, 6-17. Chemnitz: Technische Universität Chemnitz - CSR-98-03.

Why is statistical AI limited?

DeepLearning.png

  • (Deep) neural networks are continuous machines.
  • (Deep) neural networks have isotropic bias.
  • (Deep) neural networks are interpolation machines.

Why is statistical AI limited?


Planete1.png

Why is statistical AI limited?


Planete2.png
Mere interpolation cannot abstract an ellipse.

$$ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 $$

Statistical ML achieves (only) some level of compression.
Vignette_MLCompression.png
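
By contrast, abstraction compresses: two parameters recover all the points. A minimal sketch (assuming an axis-aligned ellipse, where the equation is linear in \(1/a^2\) and \(1/b^2\)):

```python
import numpy as np

a_true, b_true = 5.0, 3.0                      # ground truth, assumed
t = np.linspace(0, 2 * np.pi, 200)
x, y = a_true * np.cos(t), b_true * np.sin(t)  # 200 observed positions

# x^2/a^2 + y^2/b^2 = 1 is linear in (1/a^2, 1/b^2): solve by least squares.
A = np.column_stack([x**2, y**2])
coef, *_ = np.linalg.lstsq(A, np.ones_like(x), rcond=None)
a, b = 1 / np.sqrt(coef)
print(a, b)   # ~5.0, ~3.0: the whole orbit compressed into two numbers
```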

Algorithmic Information and AI

                 Gregory Chaitin ComprehensionIsCompression.png

    

    

Chaitin, G. J. (2004). On the intelligibility of the universe and the notions of simplicity, complexity and irreducibility.
In W. Hogrebe & J. Bromand (Eds.), Grenzen und Grenzüberschreitungen, XIX, 517-534. Berlin: Akademie Verlag.

    

This applies to machine learning:    
  • MDL
  • Clustering
  • Analogy making
  • Induction

and far beyond...

(ideal) Minimum Description Length (ⁱMDL)

MDL1.png

(ideal) Minimum Description Length (ⁱMDL)

MDL1a.png MDL2.png

$$M_0 = \mathrm{argmin}_M(\mathrm{length}(M) + \mathrm{length}(D|M))$$
$$M_0 = \mathrm{argmin}_M(K(M) + \sum_{i} {K(d_i|M)})$$

(ideal) Minimum Description Length (ⁱMDL)

$$M_0 = \mathrm{argmin}_M(\color{#00B8DE}{K(M)} \color{black}+ \color{#F32A73}{\sum_{i} {K(d_i|M)}}\color{black})$$

                   simplicity   data fitting

Separate_7.png
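
A crude two-part-code sketch of this trade-off (model costs and noise level are assumptions): pick the polynomial degree minimizing length(M) + length(D|M).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 - 2.0 * x + 0.3 * x**2 + rng.normal(0, 0.05, x.size)  # toy data

BITS_PER_PARAM = 32   # assumed cost of encoding one coefficient

def two_part_code(deg: int) -> float:
    """length(M) + length(D|M): parameter bits plus residual code length
    (Gaussian residuals cost ~ n/2 * log2(mean squared error) + const)."""
    coeffs = np.polyfit(x, y, deg)
    mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
    return (deg + 1) * BITS_PER_PARAM + 0.5 * x.size * np.log2(mse)

print(min(range(8), key=two_part_code))  # 2: extra degrees stop paying off
```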

Clustering and compression

    

Cluster1a.png

Cilibrasi, R. & Vitányi, P. (2005). Clustering by compression. IEEE transactions on Information Theory, 51 (4), 1523-1545.

Clustering and compression

Cluster4a.png

    

Cilibrasi, R. & Vitányi, P. (2005). Clustering by compression. IEEE transactions on Information Theory, 51 (4), 1523-1545.

Rote learning: $$C(\{(x_j, y_j)\} \mid X) \leq \sum_j C(y_j) \approx |X| \times \log(|Y|)$$

Clustering and compression

Cluster3a.png

    

Cilibrasi, R. & Vitányi, P. (2005). Clustering by compression. IEEE transactions on Information Theory, 51 (4), 1523-1545.

    

Clustering and compression

Cluster7a.png

    

Cilibrasi, R. & Vitányi, P. (2005). Clustering by compression. IEEE transactions on Information Theory, 51 (4), 1523-1545.


$$C(\mathcal{M}) \leq \sum_k C(p_k) + K \times \log (|Y|)$$
$$C(\{(x_j, y_j)\} \mid X) \leq C(\mathcal{M}) + \sum_{j} C((x_j, y_j) \mid X, \mathcal{M})$$
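
The distance behind these clustering trees is the Normalized Compression Distance; a minimal sketch with zlib standing in for the ideal (incomputable) compressor:

```python
import os
import zlib

def c(s: bytes) -> int:
    """Compressed length: a computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Cilibrasi & Vitanyi, 2005)."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)

english = b"the quick brown fox jumps over the lazy dog " * 20
print(ncd(english, english[5:]))               # near 0: shared structure
print(ncd(english, os.urandom(len(english))))  # near 1: nothing shared
```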

(ideal) Minimum Description Length (ⁱMDL)

$$M_0 = \mathrm{argmin}_M(\color{#00B8DE}{K(M)} \color{black}+ \color{#F32A73}{\sum_{i} {K(d_i|M)}}\color{black})$$
                             simplicity   data fitting

When the letters ‘i’ and ‘e’ are next to each other in a word, ‘i’ often comes first.

M1: ‘i’ first     believe, fierce, die, friend

exceptions:    receive, ceiling, receipt

M2: ‘i’ first except after ‘c’    

exceptions:    neighbour and weigh

M3: ‘i’ first except after ‘c’ or when ‘ei’ sounds like ‘a’    

exceptions:    science and caffeine
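
The same arithmetic can be made explicit; the word list and bit costs below are assumptions, only meant to show the trade-off between rule length and exception list.

```python
# Two-part code for spelling rules: total = rule bits + exception bits.
words = ["believe", "fierce", "die", "friend",
         "receive", "ceiling", "receipt", "neighbour", "weigh"]

def m1(w): return "ie" in w                 # M1: 'i' first
def m2(w): return "ie" in w or "cei" in w   # M2: ... except after 'c'

RULE_BITS = {"M1": 10, "M2": 16}            # assumed description lengths
EXCEPTION_BITS = 30                         # assumed cost per listed word

for name, rule in [("M1", m1), ("M2", m2)]:
    exc = [w for w in words if not rule(w)]
    print(name, RULE_BITS[name] + EXCEPTION_BITS * len(exc), exc)
# M2's longer rule pays for itself: the exception list shrinks
# from five words to just 'neighbour' and 'weigh'.
```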

(ideal) Minimum Description Length (ⁱMDL)

$$M_0 = \mathrm{argmin}_M(\color{#00B8DE}{K(M)} \color{black}+ \color{#F32A73}{\sum_{i} {K(d_i|M)}}\color{black})$$
\(M_0\)                                     \(\color{#00B8DE}{K(M)}\)       \(\color{#F32A73}{\sum_{i} {K(d_i|M)}}\color{black}\)
model                                       simplicity      data fitting
lossy compressed representation of \(D\)    compression     noise, exceptions
causal explanation of \(D\)                 parsimony       counter-examples

Algorithmic Information and AI

                 Gregory Chaitin ComprehensionIsCompression.png

    

    

Chaitin, G. J. (2004). On the intelligibility of the universe and the notions of simplicity, complexity and irreducibility.
In W. Hogrebe & J. Bromand (Eds.), Grenzen und Grenzüberschreitungen, XIX, 517-534. Berlin: Akademie Verlag.

    

This applies to machine learning:    
  • MDL
  • Clustering
  • Analogy making
  • Induction

and far beyond...

Analogy

1920px-Mobility_analogy_resonator_vertical.svg.png

    

- In mathematics and science: discover new concepts, or generalize notions to other domains
- Justice: use of relevant past cases
- Art: metaphors, parody, pastiche
- Advertising: exploit similarity of products to influence people
- Humor: jokes are often based on faulty analogies

- teaching, XAI: good explanations are often analogies
- transfer learning: transfer of expertise to new domain

Analogy

    
  • Gills are to fish as lungs are to humans.
  • Donald Trump is to Barack Obama as Barack Obama is to George Bush
  • 37 is to 74 as 21 is to 42
  • The sun is to Earth as the nucleus is to the electron

Analogy

$$A : B :: C : D$$

The best analogy is the one
that makes the quadruplet \((A,B,C,D)\) simplest

Cornuejols, 1996; Cornuejols & Ales-Bianchetti, 1998.

i.e. the one that achieves the best compression.

$$\mathbf{x} = \mathrm{argmin}_{\mathbf{x}}\,K(A, B, C, \mathbf{x})$$

Analogical equation

                                                 Doug Hofstadter hofstadter_horiz.jpg

$$A : B :: C : x$$
$$abc : abd :: ppqqrr : \color{#F32A73}{x}$$

$$x = ppqq\color{#00B8DE}{ss}.$$

Analogical equation

                                                 Doug Hofstadter hofstadter_horiz.jpg

$$abc : abd :: ppqqrr : \color{#F32A73}{x}$$
$$x = ppqq\color{#00B8DE}{ss}.$$
$$x = ppqqr\color{#F32A73}{d}.$$
We have to compare:
  • increment the last item
  • break the last item and replace its last letter by d
  • ignore the structure and replace the last letter by d

This may explain why x = ppqqss appears smarter.
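
A minimal sketch of this comparison (the transformation language and its bit costs are invented here; Murena et al. 2020 define a proper one): each hypothesis is a short program, and the cheapest description wins.

```python
# Complexity-based analogy: compare candidate transformations by the
# (assumed) description length of the program that produces them.

def groups(s):                     # 'ppqqrr' -> ['pp', 'qq', 'rr']
    out = [s[0]]
    for ch in s[1:]:
        if ch == out[-1][-1]:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def increment_last_group(s):       # 'ppqqrr' -> 'ppqqss'
    g = groups(s)
    g[-1] = chr(ord(g[-1][0]) + 1) * len(g[-1])
    return "".join(g)

def replace_last_letter(s):        # 'ppqqrr' -> 'ppqqrd'
    return s[:-1] + "d"

hypotheses = [                     # (assumed cost in bits, program)
    (4, increment_last_group),     # reuse the group structure
    (9, replace_last_letter),      # break the structure, hard-code 'd'
]
cost, program = min(hypotheses)
print(program("ppqqrr"))           # -> 'ppqqss': the cheaper description
```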

Analogical equation

rosa        rosam
vita        . . .

Analogical equation

rosa        rosam
vita        vitam

Analogical equation

orang        orang orang
burung      burung burung

Analogical equation

MurenaAnalogy.png

setzen        setzte
lachen        lachte

    

    

Murena, P.-A., Al-Ghossein, M., Dessalles, J.-L. & Cornuéjols, A. (2020). Solving analogies on words based on minimal complexity transformation. IJCAI, 1848-1854.

IQ test


GPT-3

  • What comes after \(1223334444\)?

\(1223334444\color{#F32A73}{5}\)

    

AIT

  • What comes after \(1223334444\)?

$$ n^{*n} $$

\(1223334444\color{#F32A73}{55555}...\)


Solomonoff, R. J. (1964). A Formal Theory of Inductive Inference. Information and Control, 7 (1), 1-22.
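
The continuation follows from the shortest generating program, here \(n^{*n}\), i.e. each integer \(n\) written \(n\) times. A minimal sketch:

```python
from itertools import count, islice

def n_times_n():
    """1 22 333 4444 55555 ...: each integer n repeated n times."""
    for n in count(1):
        yield from [n] * n

print(list(islice(n_times_n(), 15)))
# [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
# The short program behind 1223334444 continues with 55555,
# not with the single '5' that GPT-3 appends.
```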

Algorithmic Information and AI

                 Gregory Chaitin ComprehensionIsCompression.png

    

    

Chaitin, G. J. (2004). On the intelligibility of the universe and the notions of simplicity, complexity and irreducibility.
In W. Hogrebe & J. Bromand (Eds.), Grenzen und Grenzüberschreitungen, XIX, 517-534. Berlin: Akademie Verlag.

    

This applies to machine learning:    
  • MDL
  • Clustering
  • Analogy making
  • Induction

and far beyond...
Could AIT be used not only off-line (MDL, clustering...)

but also on-line (on the fly, at runtime)

as in analogy making or in induction?

Precomputation vs. mechanisms at runtime

Deep.png     
Precomputed Knowledge | Mechanisms at runtime
Ontology.png
  • Syntax
  • Aspect
  • Determination
  • Contrast
  • Abduction
  • ...

Precomputation vs. mechanisms at runtime

Deep.png     
Precomputed Knowledge | Mechanisms at runtime
Ontology.png
  • Min complexity

  • Complexity drop
$$C_{exp} - C_{obs}$$

    

YouHere.png Could an AI system learn
when a fortuitous encounter is

  • interesting
  • unexpected
  • relevant
  • tellable
  • storyworthy
  • thrilling?

Unexpected encounter according to GPT-3

  • What would be an unexpected encounter?
    An unexpected encounter could be finding a $20 bill on the ground.

  • I mean with a person.
    An unexpected encounter with a person could be meeting someone who is famous.

  • And what about meeting someone far away?
    An unexpected encounter with someone from far away could be meeting someone who is from a different country.

  • I mean far away from home.
    An unexpected encounter with someone far away from home could be meeting someone who is from a different state.

  • I mean far away from my home.
    An unexpected encounter with someone far away from my home could be meeting someone who is from a different city.

  • What about bumping into a friend far away from home?
    An unexpected encounter with a friend far away from home could be bumping into someone you know from high school or college.

  • What would be an unexpected encounter?
    An unexpected encounter could be meeting someone new and interesting, or it could be a situation that is out of the ordinary.

  • Why?
    This question can’t be answered without more information.

Unexpected encounter

YouHere.png GPT-3 does know that unexpected encounters would be with famous people or acquaintances far away from home.

But it does not know why!
Does AIT know? Yes!

Unexpectedness is due to complexity drop: $$U = C_{exp} - C_{obs}$$

In most cases:           \(U = C(L) - C(P)\)

L: location of the encounter
P: encountered person

www.simplicitytheory.science

Dessalles, J.-L. (2008). Coincidences and the encounter problem: A formal account. CogSci-2008, 2134-2139.

Unexpected encounter

$$U = \color{#00B8DE}{C(L)} - \color{black}{C(P)}$$

ComplexityLocations.png

Unexpected encounter

$$U = C(L) - \color{#F32A73}{C(P)}$$

Acquaintances.png

Unexpected encounter

$$U = C(L) - \color{#F32A73}{C(P)}$$

Celebrities.png
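
A minimal sketch of this computation (the rank-based complexity estimates are assumptions): describing a remote location takes many bits, while a close friend or a celebrity takes few.

```python
import math

def c_location(place_rank: int) -> float:
    """Bits to single out the place; remote spots rank low in your
    mental list of locations, hence cost more bits (assumed model)."""
    return math.log2(place_rank)

def c_person(person_rank: int) -> float:
    """Bits to single out the person; friends and celebrities have
    low ranks, strangers very high ones (assumed model)."""
    return math.log2(person_rank)

def unexpectedness(place_rank: int, person_rank: int) -> float:
    return c_location(place_rank) - c_person(person_rank)  # U = C(L) - C(P)

print(unexpectedness(50_000, 4))     # close friend on a remote island: U ~ 14 bits
print(unexpectedness(2, 1_000_000))  # stranger in your own street: U < 0, no story
```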

Complexity drop

SouthAfricanLottery.png

Expected complexity: ≈ 6 * K(49)

Observed complexity: ≈ K(5) + K(’sequence’)
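
The arithmetic of the drop, as a rough sketch (the few bits charged for the concept 'sequence' are an assumption):

```python
import math

# Expected: six independent numbers drawn among 49.
expected = 6 * math.log2(49)          # ~ 33.7 bits

# Observed: a consecutive run is just 'sequence starting at 5'.
K_SEQUENCE = 5                        # assumed bits for the concept
observed = math.log2(5) + K_SEQUENCE  # ~ 7.3 bits

print(expected - observed)            # ~ 26 bits of complexity drop
```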

www.simplicitytheory.science

Complexity drop

BulgarianLottery.png On September 10th, 2009, the numbers 4, 15, 23, 24, 35, 42 were drawn by a machine, live on Bulgarian television. The event would have gone unnoticed, were it not that the exact same numbers had come up in the preceding round, a few days before.

www.simplicitytheory.science

Complexity drop

BulgarianLottery.png Complexity of the lottery round \(r_0\) by reference to the preceding one \(r_{-1}\):

        \(C(r_0) < C(r_{-1}) + C(r_0 \mid r_{-1}) + O(1)\)

Since all lottery draws are recorded, \(r_{-1}\) is entirely determined by its rank, 1, in the list of past draws.

In the lottery context, \(C(r_{-1})\) can be as small as about 1 bit: \(C(r_{-1}) \approx C(1)\).

If the two draws contain the same numbers, \(C(r_0 \mid r_{-1}) = 0\).

www.simplicitytheory.science

Complexity drop

BulgarianLottery.png

    \(C(r_0) < C(r_{-1}) + C(r_0 \mid r_{-1}) + O(1)\)
    \(C(r_0) < \ 1 \ + \ 0 \ + O(1)\)

\(r_0\) appears utterly simple!

    


This explains why the event was so impressive that it made the international news.


www.simplicitytheory.science


Coincidences as complexity drop

Lincoln.png Kennedy.png

    

"Creepy coincidences" between Abraham Lincoln and John F. Kennedy

Kern, K. & Brown, K. (2001). Using the list of creepy coincidences as an educational opportunity. The history teacher, 34 (4), 531-536.

  • "Lincoln was elected to Congress in 1846, Kennedy was elected to Congress in 1946."
  • "Lincoln was elected president in 1860, Kennedy was elected president in 1960."
  • "Both presidents have been shot in the head on a Friday in presence of their wives."
  • "Both successors were named Johnson, born in 1808 and 1908."
  • "Kennedy was shot in a car named Lincoln
  • . . .

To be explained:
  1. Role of close analogy
  2. Role of round numbers
  3. Role of prominence
  4. Role of mere association

Coincidences as complexity drop

Lincoln.png Kennedy.png

    

    

$$U(e_1 * e_2) = \color{#F32A73}{C_w(e_1 * e_2)} - \color{#00B8DE}{C(e_1 * e_2)}$$
$$U(e_1 * e_2) \ge \color{#F32A73}{C_w(e_1) + C_w(e_2)} \color{black}{- C(e_1) - C(e_2|e_1)}$$
$$U(e_1 * e_2) \ge \color{#F32A73}{C(e_1)} - \color{#00B8DE}{C(e_2|e_1)}$$

Now explained:
  1. Role of close analogy
  2. Role of round numbers
  3. Role of prominence
  4. Role of mere association

www.simplicitytheory.science

Dessalles, J.-L. (2008). Coincidences and the encounter problem: A formal account. CogSci-2008, 2134-2139.

Simplicity Theory

ST.png

Simplicity Theory is about complexity drop.

    

    

From ST we can derive sub-theories and predictions about:
  • The relevance of features
  • Anomaly detection
  • Interestingness (encounters, coincidences, stories)
  • Subjective probability
  • and much more...


SubjectiveProbability1.png

SubjectiveProbability3.png

SubjectiveProbability4.png

SubjectiveProbability5.png
Complexity drop: 6 numbers, ~ 50 bits

SubjectiveProbability6.png

SubjectiveProbability8.png

SubjectiveProbability10.png

SubjectiveProbability9.png

www.simplicitytheory.science

Simplicity Theory

ST.png

Simplicity Theory is about complexity drop.

    

    

From ST we can derive sub-theories and predictions about:
  • The relevance of features
  • Anomaly detection
  • Interestingness (encounters, coincidences, stories)
  • Subjective probability
  • and much more...

www.simplicitytheory.science

LeCun2.png
LeCun1.jpg LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. OpenReview.net, Version 0.9.2, 2022-06-27.
Yann LeCun, in June 2016. BRIAN ACH / AFP
www.lemonde.fr/pixels/article/2019/03/27/yann-lecun-laureat-du-prix-turing-l-intelligence-artificielle-continue-de-faire-des-progres-fulgurants_5441990_4408996.html

Not a single mention of
  • complexity
  • simplicity
  • compression
  • description length

but many mentions of
  • (relevant) information

Conclusion

    

    

Statistical machine learning has inherent limitations

  • Bayes’ curse
  • Bound to continuous functions
  • Isotropic bias
  • "Guessing" machines (no understanding)
  • Off-line (canned expertise)

Conclusion

    

    

Algorithmic Information Theory is a guide for off-line processing

  • (ideal) Minimum Description Length
  • Compressionist view on ML
  • Universal induction
  • Ideal RL (AIXI)

Conclusion

    

    

Algorithmic Information Theory is a guide for on-line processing

  • Analogies
  • One-shot induction
  • Complexity drop (Simplicity Theory):
- subjective probability
- relevance
- interestingness
- anomaly detection (abnormal simplicity)
  and much more

    

www.simplicitytheory.science

DITA_CouvertureOJ.jpg

Conclusion

  • MOOC on Algorithmic Information Theory

    


Dessalles, J.-L. (2021). Understanding Artificial Intelligence through Algorithmic Information Theory (MOOC). EdX.

  • Paper on AI’s limits in the light of AIT

Dessalles, J.-L. (2019). From reflex to reflection: Two tricks AI could learn from us. Philosophies, 4 (2), 27.

  • Paper on Complexity drop and relevance

More on
www.dessalles.fr

Dessalles, J.-L. (2013). Algorithmic simplicity and relevance. In D. L. Dowe (Ed.), Algorithmic probability and friends - LNAI 7070, 119-130. Berlin, D: Springer Verlag.

  • Website of Simplicity Theory
qrcodeSlideshows.png

            www.simplicitytheory.science

These slides are available on www.dessalles.fr/slideshows