# Benchmark AI Code
Microsoft will release the 1.5-billion-parameter DeBERTa model and the source code to the public.
Microsoft recently updated the DeBERTa model by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass human performance on SuperGLUE for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE benchmark rankings, outperforming the human baseline by a decent margin (90.3 versus 89.8). The model also sits at the top of the GLUE benchmark rankings with a macro-average score of 90.8.
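The macro-average score reported here is the unweighted mean of the per-task scores, so every task counts equally regardless of its dataset size. A minimal sketch of that computation, assuming per-task scores keyed by task name (the task names are real SuperGLUE tasks, but the scores below are illustrative, not leaderboard numbers):

```python
# Macro-average: unweighted mean over per-task scores.
# Scores below are illustrative placeholders, not leaderboard values.

def macro_average(task_scores):
    """Unweighted mean over tasks, regardless of dataset size."""
    return sum(task_scores.values()) / len(task_scores)

scores = {"BoolQ": 90.4, "CB": 94.9, "COPA": 96.8, "RTE": 93.2}
print(round(macro_average(scores), 1))  # prints 93.8
```

Because the average is unweighted, a model cannot lift its macro-average by excelling only on the largest tasks; it must do well across the board.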
Natural language understanding (NLU) is one of the longest-running goals in AI, and SuperGLUE is currently among the most challenging benchmarks for evaluating NLU models. Since its release in 2019, top research teams around the world have been developing large-scale pretrained language models (PLMs) that have driven striking performance improvements on the benchmark. SuperGLUE consists of a wide range of NLU tasks, including question answering, natural language inference, co-reference resolution, word sense disambiguation, and others. Take the causal reasoning task (COPA in Figure 1) as an example. Given the premise “the child became immune to the disease” and the question “what’s the cause for this?,” the model is asked to choose an answer from two plausible candidates: 1) “he avoided exposure to the disease” and 2) “he received the vaccine for the disease.” While it is easy for a human to choose the right answer, it is challenging for an AI model. To get the right answer, the model needs to understand the causal relationship between the premise and the plausible options.
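The COPA setup above can be sketched as a small data structure plus a scorer that picks the more plausible alternative. This is a hedged illustration, not DeBERTa's actual code: `CopaInstance`, `predict`, and `overlap_score` are hypothetical names, and the toy lexical-overlap scorer stands in for a trained PLM.

```python
# Hedged sketch of the COPA format: a premise, two candidate
# alternatives, and whether we seek a "cause" or an "effect".
# The scorer is a stand-in for a real PLM such as DeBERTa.
from dataclasses import dataclass

@dataclass
class CopaInstance:
    premise: str
    question: str   # "cause" or "effect"
    choice1: str
    choice2: str
    label: int      # 0 -> choice1 is correct, 1 -> choice2

def predict(instance, score_fn):
    """Pick whichever alternative the scorer rates as more plausible."""
    s1 = score_fn(instance.premise, instance.question, instance.choice1)
    s2 = score_fn(instance.premise, instance.question, instance.choice2)
    return 0 if s1 >= s2 else 1

example = CopaInstance(
    premise="The child became immune to the disease.",
    question="cause",
    choice1="He avoided exposure to the disease.",
    choice2="He received the vaccine for the disease.",
    label=1,
)

def overlap_score(premise, question, choice):
    """Toy heuristic: count words shared with the premise."""
    return len(set(premise.lower().split()) & set(choice.lower().split()))

print(predict(example, overlap_score))  # prints 0 -- the wrong answer
```

Notably, the shallow word-overlap heuristic picks the wrong alternative here, since the incorrect choice shares more surface vocabulary with the premise; this is exactly why COPA requires causal reasoning rather than lexical matching.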