It seems that quite a few people in the ML community - even those who ought to know better - think that transformers are some kind of “memorization” machines. I have no idea how this impression formed; it could have something to do with their use in LLMs. The idea behind transformers is simple, though it can be technically tricky to demonstrate: they are just the latest development in representation learning. Representation learning aims to find a better representation of the data, which can in turn lead to better ML models. Autoencoders and deep learning more broadly both rely on representation learning, and both have had enormous success in recent years. Transformers have been applied to all forms of data - text, images, tabular, and more. Their success has varied, but they have definitely been a valuable addition to the ML practitioner’s toolbox. If they were really all about “memorization”, they would perform much worse on new, unseen data than other approaches. That’s not what I’ve seen. In recent years transformers have been a regular component of the winning solutions in many Kaggle competitions, and I myself have used them in one gold medal solution.
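To make the representation-learning point concrete, here is a minimal numpy sketch of single-head self-attention, the core transformer operation: each token’s vector is replaced by a context-weighted mix of all the tokens in the sequence, i.e. a learned-style *re-representation* of the input rather than a lookup of memorized values. The function name and toy inputs are purely illustrative.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention (no learned projections, for clarity).

    Each output row is a softmax-weighted combination of all input rows,
    so every token's new representation depends on its context.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # contextual representations

# Three toy "token" vectors; after attention, each row mixes in the others.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = self_attention(X)
```

In a real transformer, learned query/key/value projections and multiple heads make these contextual representations trainable end to end, which is exactly what representation learning is about.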
Transformers work, they are a great ML tool, and they will continue to add value to ML research and applications. You may legitimately question how they are applied, and whether they are the optimal approach for a given problem, but there is nothing intrinsically wrong with them.