AlphaFold: AI, Aesthetics and the power of shapes
Form and function go together in protein folding, as elsewhere
A version of this essay has been published by Open Magazine at https://openthemagazine.com/essay/the-shape-of-covid/; a few excerpts follow. Please click on the link above for the full essay.
2020 may well turn out to have been a watershed year in biochemistry and biology. Despite the establishment's failure to find a quick cure for the Covid pandemic, there was the Nobel Prize in Chemistry awarded to Emmanuelle Charpentier and Jennifer Doudna for CRISPR-Cas9, a gene-editing mechanism with immense possibilities.
Then there is the amazing news of the apparent success of DeepMind’s AlphaFold2 in predicting the shape of proteins with unprecedented accuracy. This has been a quest in biochemistry for at least fifty years, and the fact that a machine-learning algorithm has been able to do it with a median score above 90 on the 100-point scale used in the CASP14 assessment is truly impressive.
…
That means the number of possible proteins is enormous, purely from the mathematics. DNA has only four types of basic nucleotides, and that is enough for massive complexity, as seen in the Human Genome Project. What is truly remarkable is that it is the shape of the protein that matters, not necessarily the specific amino acid in the sequence. More on that later.
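To put rough numbers on "purely from the mathematics", here is a minimal sketch. It assumes the standard alphabet of 20 amino acids and a hypothetical chain length of 100 residues (both are illustrative assumptions, not figures from the essay), and compares the result with a DNA strand of the same length.

```python
# Counting possible sequences: alphabet_size ** length.
# Assumptions (not from the essay): 20 standard amino acids,
# and an illustrative chain length of 100.

AMINO_ACIDS = 20   # assumed size of the amino-acid alphabet
NUCLEOTIDES = 4    # from the essay: four basic nucleotides in DNA
CHAIN_LENGTH = 100 # hypothetical length, chosen for illustration


def sequence_count(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer (digit count minus one)."""
    return len(str(n)) - 1


protein_sequences = sequence_count(AMINO_ACIDS, CHAIN_LENGTH)
dna_sequences = sequence_count(NUCLEOTIDES, CHAIN_LENGTH)

print(f"Protein chains of length {CHAIN_LENGTH}: about 10^{order_of_magnitude(protein_sequences)}")
print(f"DNA strands of length {CHAIN_LENGTH}:    about 10^{order_of_magnitude(dna_sequences)}")
```

Even at this modest length, the count of possible chains is already far beyond anything that could be listed one by one.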
…..
The thing that we all need to keep in mind is that AI/ML is a performing monkey. It may go through the motions and execute what appear to be wondrous feats, but it simply has no understanding of what is going on: it is an idiot savant, if you want to be more charitable. It is about syntax, not about semantics.
….
Second, the question of aesthetics. There has always been a philosophical question as to whether beauty matters. Even though most engineers and scientists have been trained to think that that is a frivolous question best left to dreamy artists and philosophers, the fact is that elegance and, yes, beauty matter in almost everything.
…
Beauty, apparently, is not optional, but integral.
In the Indian tradition of aesthetics and rasas, the importance of structure in evoking certain emotions is well articulated; indeed, one of the cornerstones of Carnatic music is its mathematical precision (this is true of Western classical music as well).
….
Another example is the great lengths to which the traditionally oral rendition of scriptures went to preserve exactness. Pada paatha, using hand mudras as error-correcting codes for ensuring absolutely correct transmission of the Rg Veda, is a tradition from Kerala.
….
The fractional dimensions of these patterns may have a relationship to theoretical insights, such as Subhash Kak’s conjecture that gravity can be explained if the universe is e-dimensional, where e = 2.71828…, the irrational number known as Euler’s number.
What does all this mean in the context of protein folding? It turns out that a protein, which, as mentioned earlier, is a long chain of amino acids, can be folded into an astronomical number of possible shapes when it is created.
…
Apparently the number of ways in which a protein can be folded is of the order of 10^300, enormously greater than the number of atoms in the universe, which is supposed to be around 10^80. That makes the task computationally intractable: no brute-force method that checks every possible fold could ever finish. (Some formal versions of the folding problem have, in fact, been shown to be NP-hard.) You need certain heuristics or rules of thumb to reduce the universe of possibilities to a manageable number.
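A back-of-the-envelope sketch shows why brute force is hopeless. The 10^300 and 10^80 figures are the ones quoted above; the sampling rate of 10^13 folds per second and the age of the universe (about 13.8 billion years, or roughly 4.35 × 10^17 seconds) are assumptions added purely for illustration. The arithmetic is done on exponents to keep the numbers readable.

```python
import math

# Figures quoted in the essay (as powers of ten).
FOLDS_EXP = 300   # ~10^300 possible folds
ATOMS_EXP = 80    # ~10^80 atoms in the universe

# Assumptions for illustration only (not from the essay).
SAMPLES_PER_SECOND_EXP = 13        # hypothetical: 10^13 folds checked per second
AGE_OF_UNIVERSE_SECONDS = 4.35e17  # roughly 13.8 billion years


def brute_force_universe_ages_exp(folds_exp: int, rate_exp: int) -> int:
    """Power of ten of the number of universe lifetimes needed
    to check every fold once at the given rate."""
    seconds_exp = folds_exp - rate_exp
    return math.floor(seconds_exp - math.log10(AGE_OF_UNIVERSE_SECONDS))


ages_exp = brute_force_universe_ages_exp(FOLDS_EXP, SAMPLES_PER_SECOND_EXP)
folds_per_atom_exp = FOLDS_EXP - ATOMS_EXP

print(f"Universe lifetimes needed: about 10^{ages_exp}")
print(f"Folds per atom in the universe: about 10^{folds_per_atom_exp}")
```

Changing the assumed sampling rate by a few orders of magnitude makes no practical difference, which is why heuristics are needed to shrink the search.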
…
We are not quite there yet, but the excitement over AlphaFold2 is because it might be able to identify precise counter-weapons, based on the shape of the enemy’s own weapons, that can fend off the enemy. Instead of trial and error, if AlphaFold2 is able to narrow the field down to a handful of possible drugs and vaccines, that would be a major boon.
None of this may happen for years, but it is a promising way forward. There are further complications: it is also necessary to consider how protein molecules interact with each other and with other molecules, say water, in their vicinity. There are 180 million protein sequences known to scientists, but only some 170,000 of them (less than 0.1 percent) have had their structures determined so far. Automating the task will help enormously.
Drug discovery time could be reduced; in a future pandemic, researchers may find an antidote among known drugs and vaccines in days, instead of spending months inventing new things and rushing them to market barely tested. That would reduce the risks for humanity, and would be a great contribution to public health.
Please read the full essay at https://openthemagazine.com/essay/the-shape-of-covid/