What is Grok and how different is it from ChatGPT?

Elon Musk on Saturday announced the launch of a new generative AI large language model called Grok, which is said to be modelled after "The Hitchhiker's Guide to the Galaxy" and is intended to "answer almost anything and, far harder, even suggest what questions to ask!"

Grok will be incorporated into X, previously known as Twitter, and has been designed to answer questions with a sense of humour. In fact, the company advises against using Grok if you hate humour.

Just released Grok https://t.co/e8xQp5xInk

— Elon Musk (@elonmusk) November 5, 2023

With real-time knowledge of the world through the X platform, Grok is even capable of answering "spicy questions" that most other AI models reject. The four-month-old model, which went through two months of training, is still in the beta phase, and the company says it will keep improving it in the coming days.

Purpose of Grok AI

According to xAI, Grok has been created to assist humanity in understanding and gaining knowledge. It is powered by the Grok-1 LLM, which was developed over a period of four months. The prototype, Grok-0, was trained with 33 billion parameters and is said to approach the capabilities of Meta's LLaMA 2, which has 70 billion parameters.

Grok capabilities

In terms of benchmarks, Grok-1 achieves 63.2% on the HumanEval coding task and 73% on MMLU. While it is still not as capable as something like GPT-4, xAI claims that, within a limited time, it has been able to improve the performance of Grok-1 considerably over Grok-0.


According to the benchmark numbers, on GSM8k (Cobbe et al. 2021), a benchmark built around middle school math word problems, Grok-1 achieved 62.9 per cent, which is higher than GPT-3.5 and LLaMA 2 but lower than Palm 2, Claude 2, and GPT-4.

The same goes for other benchmarks: MMLU (Hendrycks et al. 2021), a multiple-choice question benchmark; HumanEval (Chen et al. 2021), a Python code generation test; and MATH (Hendrycks et al. 2021), a set of middle school and high school mathematics problems written in LaTeX.

Benchmark (setting)   Grok-0 (33B)   LLaMA 2 70B   Inflection-1   GPT-3.5   Grok-1   Palm 2   Claude 2        GPT-4
GSM8k (8-shot)        56.8%          56.8%         62.9%          57.1%     62.9%    80.7%    88.0%           92.0%
MMLU (5-shot)         65.7%          68.9%         72.7%          70.0%     73.0%    78.0%    75.0% (+ CoT)   86.4%
HumanEval (0-shot)    39.7%          29.9%         35.4%          48.1%     63.2%    n/a      70%             67%
MATH (4-shot)         15.7%          13.5%         16.0%          23.5%     23.9%    34.6%    n/a             42.5%

Similarly, xAI also hand-graded Grok-1 on the 2023 Hungarian national high school finals in mathematics, where it cleared the exam with a C grade (59 per cent), surpassing Claude 2 (55 per cent), while GPT-4 scored a B grade with 68 per cent.

These numbers indicate that Grok-1 is already more capable than OpenAI's GPT-3.5, though not as capable as the latest model, GPT-4. The company also claims that Grok-1, despite being trained on less data, can surpass models that were trained on larger amounts of data and require far more computing power.
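For readers unfamiliar with the "8-shot", "5-shot" and "0-shot" labels in the table above: an n-shot evaluation shows the model n worked examples before the test question, and for GSM8k the score is exact-match accuracy on the final answer. The Python sketch below is only meant to illustrate that general recipe; the single exemplar, the model_generate callable and the extract_answer helper are hypothetical placeholders, not xAI's (or anyone's) actual evaluation harness.

import re

# One worked example; a real GSM8k 8-shot prompt would include eight of these.
# The exemplar below is made up purely for illustration.
EXEMPLARS = [
    ("If a pen costs 3 and a notebook costs 7, what do 2 pens and 1 notebook cost?",
     "2 * 3 + 7 = 13. The answer is 13."),
]

def build_prompt(question, shots):
    # Prepend the worked examples (the "shots") before the test question.
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def extract_answer(completion):
    # Take the last number in the completion as the model's final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def evaluate(model_generate, dataset):
    # dataset: list of (question, gold_answer) pairs; score is exact-match accuracy.
    correct = 0
    for question, gold in dataset:
        completion = model_generate(build_prompt(question, EXEMPLARS))
        if extract_answer(completion) == gold:
            correct += 1
    return correct / len(dataset)

Exact match on the final number is the usual scoring rule for GSM8k; a coding benchmark such as HumanEval instead runs the generated Python against unit tests.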

Example of Grok vs typical GPT, where Grok has current information, but other doesn’t pic.twitter.com/hBRXmQ8KFi

— Elon Musk (@elonmusk) November 5, 2023

Grok-1 has been trained using a custom training and inference stack built on Kubernetes, Rust, and JAX. Even though Grok has real-time access to the latest information via the internet, the company acknowledges that, like any LLM, it can still "generate false or contradictory information."
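xAI has not published the details of that stack, but as a rough illustration, JAX-based training code is typically organised around a pure, jitted update step like the Python sketch below. Everything here (the toy linear model, the plain SGD update, the made-up batch) is a hypothetical stand-in and says nothing about Grok-1's actual architecture or scale.

import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Toy linear model with a mean-squared-error loss.
    preds = batch["x"] @ params["w"] + params["b"]
    return jnp.mean((preds - batch["y"]) ** 2)

@jax.jit  # compiles the whole update into a single fused XLA computation
def train_step(params, batch, lr=0.01):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # Plain SGD; production stacks use optimizers like Adam plus sharding across
    # many accelerators (cluster orchestration is the kind of job Kubernetes handles).
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

# Toy usage with made-up data
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (4, 1)), "b": jnp.zeros((1,))}
batch = {"x": jnp.ones((8, 4)), "y": jnp.ones((8, 1))}
params, loss = train_step(params, batch)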

To mitigate these issues in future models, xAI says it is working on areas such as human feedback, contextual understanding, multimodal capabilities, and adversarial robustness.

The beta version of Grok is currently available to a limited number of users in the US. In the coming days, it will be made available to X Premium+ subscribers, which costs Rs 1,300 per month when subscribed from a desktop.




