TransWikia.com

A global mathematics library

MathOverflow Asked on November 3, 2021

Just for personal interest, I am not (yet) professionally involved in it. My question is about the state of arts in digitalization of mathematics and to what extent it is possible and reasonable.

There are different levels of digitalizations:

  1. OCR scan all historical mathematical texts
  2. organize metadata of references and authors (e.g. as graphs)
  3. extract mathematical objects (like theorems, definitions, etc)
  4. extract proofs and ideas
  5. formalize mathematics such that it can be completely checked by theorem provers

The main effort I found is
https://imkt.org/

Steps 3/4 and 5 may be of independent interest and should be understood more parallel than chronological. Point 5 is more interesting in having (error-free) formalized math. It should also be allowed to choose different foundations of mathematics and the possibility to switch in between them. Point 3/4 is more interesting for a researcher that wants all references for a definition, a theorem, a keyword. It would be a wonderful source for doing data analysis of mathematical knowledge (historical, social, semantical, etc). In contrast to 5 it can contain errors and speculations. The main interest is in identifying and referencing mathematical objects over all the produced texts in history of math.

My question is:

The goal of https://imkt.org/ is huge but when looking at its first projects it looks (sry) a little bit disappointing. The main focus (also of other literature that I scanned through) lies on connecting the existing databases and languages, i.e. of theorem provers, computer algebra systems (and maybe wikis). I understand that different applications in math demand different systems (e.g. integer series http://oeis.org/ should also be part of it?) Can or shouldn’t there be one system that contains everything that can be accessed (and is stored not just referenced!) through the same system? Are my dreams of such a system over the top?

  • One of the largest issues is the copyright of the big publishers.
    More and more is going in direction open math. Until then it is
    unclear to what extent a library can be complete (gaps are somehow
    missing the point of this system).

  • The other issue is the efficiency in producing the content extraction
    and to advance it by advertising the advantages of such a library to
    mathematicians such that it will move itself at some point.

There have been many efforts in the past that were abandoned again or here for years (like Mizar) but far from being known and used in daily mathematics.

3 Answers

There's a lot of work lately on formalizing large parts of classical mathematics using proof assistants, see for example the Lean prover and the accompanying library mathlib. You can see an overview of what has been formalized here; it includes a lot of classical abstract algebra and real analysis, although there's also a lot of undergraduate math they haven't formalized yet which they list here. If you think that's all "trivial", there's ongoing work to formalize the proof of the independence of the continuum hypothesis, the proof that it's possible to evert the sphere, etc.

An interesting facet of this project is that all the proofs are kept under a version control system and you can see the entire history of every proof. It's common for the initial proof of some result to be very long and verbose, only for people to find ways to shorten it or make it more elegant later -- for example, von Neumann's proof of the Radon-Nikodym theorem. Where before you might have had to trace through the citations of loads of historical papers that your library might not even have, now you can see exactly how this process played out by just doing git blame.

Answered by korrok on November 3, 2021

On my opinion, 2-5 are non-realistic (and hardly needed), while 1 is indeed desirable and important. To be sure, a substantial progress was made in the recent years towards 1. But it still remains a remote goal.

David White mentioned the main obstacle: the existing shameful copyright system. But there are several other obstacles. The number of math periodicals is really enormous, many were owned by universities and various societies in the past, and there is no real will to digitalize all these obscure publications. There are many old publications of various societies in most European countries which are still not digitalized. Big companies probably see no perspective of profit in this (and they are probably right), and volunteers have more important things to do.

As I understand copyright does not apply to 18-19 centuries math journals. Still, there is little interest in their digitalization. Few big journals which still continue, digitalized the old issues (and claim copyright for this digitalization). But no one seems to be interested in those journals which do not continue.

In many cases, even when a digitalization is available, the quality is too poor to read or print, or search. The old volumes of Comptes rendus are available, but try to find anything in them. Sometimes they show you "digitalized" pages on the screen but do not allow to download the pdf file of the article. So you are supposed to print a large paper one page at a time, if allowed to print at all.

The result is easy to predict: with the current demise of the usual libraries, most this mathematics will be simply lost. And no realistic remedy is visible. Well, this is not the first time that such thing happens: by various estimates about 90% of Hellenistic mathematics and science were lost. (And practically 100% of Greek pre-Hellenistic).

Of course one can argue that "the most important" works are preserved. For example Euclid is preserved but all his predecessors are not. Perhaps there is some truth in this. (Imagine that of all 20 century mathematics only Bourbaki is preserved:-)

Answered by Alexandre Eremenko on November 3, 2021

None of your five levels of digitalization are at all within reach. Let's take just the first one. As you say, copyright is a huge obstacle. Consider what happened when the same idea was attempted by Google (resulting in Google Books), as explained wonderfully in this article in the Atlantic. Larry Page (as in, PageRank) set up a tremendous effort to get books from libraries and scan them en masse with OCR technology to make them searchable.

After doing this for some time, publishers filed suit against Google. Google hired a massive legal team because it cared so much about creating such a library. Google managed to bring the publishers onboard with the idea, pointing out that it would allow publishers to sell out-of-print works. Unfortunately, this raised another issue, like a sad game of whack-a-mole, and the Department of Justice got involved to prevent Google from having a de-facto monopoly on out-of-print books. In the end, the Judge prevented the settlement that Google and the publishing companies had agreed to.

The same things would happen in the math world. The point is: you have a bunch of authors who died a long time ago and can't give consent to update their old publishing contracts. You have publishers who won't let go of content unless they can profit on it. And you don't have any organization large enough to be the central clearing house to connect customers who want to read these works with the places they can view and buy them. The problem only gets harder every year, because we produce more and more mathematics all the time (and OCR for math is not great, throwing a wrench into your third idea). My opinion: if Google couldn't pull together a global library, despite all the effort they put into this problem, there's no hope for the rest of us.

Answered by David White on November 3, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP