Google AI introduces AdaTape: A new AI approach with a transformer-based architecture that enables dynamic computation in neural networks through adaptive tape tokens

By Rachit Ranjan | August 11, 2023

Humans can adapt their thinking and responses to different situations and conditions, but neural networks, however potent and intricately designed, are limited by fixed functions and inputs: they perform the same computation regardless of the nature or complexity of the sample presented.

To address this limitation, researchers turn to adaptivity: the ability of a machine learning system to adjust its behavior in response to changes in the scenario or environment. Adaptivity is a powerful paradigm, as it not only gives practitioners flexibility in the downstream use of these models but can also serve as a strong inductive bias for solving certain challenging classes of problems.

While conventional neural networks have a fixed function and computational capacity, a model with adaptive and dynamic computation modulates the computational budget it dedicates to each input depending on that input's complexity. Adaptive computation in neural networks is appealing for two reasons. First, it provides an inductive bias that allows different numbers of computational steps for different inputs, which can be crucial for solving problems that require modeling hierarchies of different depths. Second, dynamic computation gives practitioners the flexibility to tune the cost of inference, since these models can be adjusted to spend more or fewer FLOPs when processing a new input.
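The core idea of input-dependent compute can be sketched in a few lines. The following toy example is not AdaTape itself: the `complexity` heuristic, the step count, and the `tanh` "layer" are all stand-ins chosen for illustration, but it shows how a per-input complexity estimate can modulate the number of computation steps.

```python
import numpy as np

def complexity(x):
    # Toy complexity estimate: the spread of the input values.
    return float(np.std(x))

def adaptive_forward(x, base_steps=1, max_steps=8):
    # Allocate more refinement steps to inputs judged more complex,
    # so the compute budget varies per input.
    steps = min(max_steps, base_steps + int(4 * complexity(x)))
    h = np.asarray(x, dtype=float)
    for _ in range(steps):
        h = np.tanh(h)  # stand-in for one layer of computation
    return h, steps

easy = np.zeros(16)                # uniform input: low complexity
hard = np.linspace(-3.0, 3.0, 16)  # spread-out input: higher complexity
_, steps_easy = adaptive_forward(easy)
_, steps_hard = adaptive_forward(hard)
```

A fixed-function network would run the same loop for both inputs; here the "hard" input receives more steps than the "easy" one.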

Google researchers have therefore introduced AdaTape, a new model that uses adaptive computation. AdaTape is simple to implement because it injects adaptivity directly into the input sequence rather than into the model depth, and it delivers strong accuracy. It uses an adaptive tape reading mechanism to determine which tape tokens to add to each input based on the input's complexity.

AdaTape is a transformer-based architecture that uses a dynamic set of tokens to create an elastic input sequence. Guided by an adaptive function, it computes a vector representation of each input and uses it to dynamically select a variable-sized sequence of tape tokens.


AdaTape uses a “tape bank” to store all candidate tape tokens; the model interacts with this bank through the adaptive tape reading mechanism to dynamically select a variable-sized sequence of tape tokens. The researchers explored two ways to create the tape bank: an input-driven bank, which extracts a bank of tokens from the input itself using a different approach than the model's original tokenizer to map the raw input to a sequence of tokens, and a learnable bank, a more general method that uses a set of trainable vectors as tape tokens.
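A minimal sketch of reading from a tape bank might look like the following. This is a simplification of the mechanism described above: the actual adaptive tape reading is iterative and learns how many tokens to select, whereas here `k`, the bank contents, and the dot-product scoring are illustrative assumptions.

```python
import numpy as np

def read_tape(query, bank, k):
    """Select the k candidate tape tokens whose embeddings are most
    aligned with the input's query vector.
    query: (d,), bank: (n_bank, d) -> returns (k, d) tape tokens."""
    scores = bank @ query               # similarity score per candidate
    top = np.argsort(scores)[::-1][:k]  # indices of the top-k candidates
    return bank[top]

rng = np.random.default_rng(0)
bank = rng.normal(size=(32, 8))      # hypothetical bank of 32 candidate tokens
query = rng.normal(size=8)           # per-input query vector
tape = read_tape(query, bank, k=3)   # 3 tape tokens chosen for this input
```

Because `k` can itself vary with the input, the resulting tape token sequence, and hence the full input sequence, is elastic.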

The tape tokens are then appended to the original input and fed to the transformer. Two separate feed-forward networks are used: one for the original input tokens and one for the tape tokens. The researchers observed slightly better quality when using separate feed-forward networks for input and tape tokens.

The researchers evaluated AdaTape on a range of tasks. They found that it outperforms all baselines by incorporating recurrence into its input selection mechanism, providing an inductive bias that enables the implicit maintenance of a counter, something that is not possible in standard transformers. The researchers also evaluated AdaTape on image classification: on ImageNet-1K, AdaTape far outperforms alternative adaptive transformer baselines in the quality-versus-cost trade-off.

Check out the Paper and the Google Blog. All credit for this research goes to the researchers on this project. Also, don’t forget to join our 28k+ ML SubReddit, 40k+ Facebook Community, Discord channel, and Email newsletter, where we share the latest AI research news, cool AI projects, and more.

Rachit Ranjan is a consultant trainee at MarktechPost. He is currently pursuing his B.Tech from the Indian Institute of Technology (IIT) Patna. He is actively shaping his career in artificial intelligence and data science and is passionate about and dedicated to exploring these fields.


