
Report card for your LLMs

This blog post focuses on new features and improvements. For a comprehensive list, including bug fixes, please see the release notes.
Introduced a module for evaluating large language models (LLMs) [Developer Preview]
Fine-tuning large language models (LLMs) is a powerful strategy that lets you take a pre-trained model and further train it on a specific dataset or task, adapting it to that particular domain or application.
After specializing the model for a specific task, it’s important to evaluate its performance and assess its effectiveness in real-world scenarios. By running an LLM evaluation, you can gauge how well the model has adapted to the target task or domain.
After fine-tuning your LLMs on the Clarifai Platform, you can use the LLM Evaluation module to measure their performance against standardized benchmarks and custom criteria, gaining deep insight into their strengths and weaknesses.
Follow this documentation for a step-by-step guide to fine-tuning and evaluating your LLMs.

Here are some key features of the module:

Evaluate across 100+ tasks covering diverse use cases like RAG, classification, casual chat, content summarization, and more. Each use case provides the flexibility to choose from relevant evaluation classes like Helpfulness, Relevance, Accuracy, Depth, and Creativity. You can further enhance the customization by assigning user-defined weights to each class.
Define weights on each evaluation class to create custom weighted scoring functions. This lets you measure business-specific metrics and store them for consistent use. For example, for RAG-related evaluation, you may want to give zero weight to Creativity and higher weights to Accuracy, Helpfulness, and Relevance (see the sketch after this list).
Save the best-performing prompt-model combinations as a workflow with a single click for future reference.
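For illustration, here is a minimal, self-contained sketch of how such a weighted score could be computed. The class names, scores, and weights below are hypothetical; the module computes this for you on the platform:

```python
# Hypothetical per-class scores (0-1 scale) from an evaluation run.
class_scores = {
    "Accuracy": 0.82,
    "Helpfulness": 0.76,
    "Relevance": 0.88,
    "Creativity": 0.40,
}

# Custom weights for a RAG-style evaluation: Creativity is zeroed out,
# while Accuracy, Helpfulness, and Relevance carry the score.
weights = {
    "Accuracy": 0.4,
    "Helpfulness": 0.3,
    "Relevance": 0.3,
    "Creativity": 0.0,
}

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-class scores; weights are normalized to sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

print(f"RAG-weighted score: {weighted_score(class_scores, weights):.3f}")
```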

Published new models

Wrapped Claude 3 Opus, a state-of-the-art multimodal large language model (LLM) with superior performance in reasoning, math, coding, and multilingual understanding (see the usage sketch after this list for how to call these models).
Wrapped Claude 3 Sonnet, a multimodal LLM balancing skills and speed, excelling in reasoning, multilingual tasks, and visual interpretation.
Clarifai-hosted Gemma-2b-it, part of Google DeepMind’s lightweight Gemma family of LLMs, offering exceptional AI performance on diverse tasks by leveraging a training dataset of 6 trillion tokens, with a focus on safety and responsible output.
Clarifai-hosted Gemma-7b-it, an instruction fine-tuned, lightweight, open LLM from Google DeepMind that offers state-of-the-art performance for natural language processing tasks, trained on a diverse dataset with rigorous safety and bias mitigation measures.
Wrapped Google Gemini Pro Vision, which was created from the ground up to be multimodal (text, images, videos) and scale across a wide range of tasks.
Wrapped Qwen1.5-72B-Chat, which leads in language understanding, generation, and alignment, setting new standards in conversational AI and multilingual capabilities, outperforming GPT-4, GPT-3.5, Mixtral-8x7B, and Llama2-70B on many benchmarks.
Wrapped DeepSeek-Coder-33B-Instruct, a SOTA 33 billion parameter code generation model, fine-tuned on 2 billion tokens of instruction data, offering superior performance in code completion and infilling tasks across more than 80 programming languages.
Clarifai-hosted DeciLM-7B-Instruct, a state-of-the-art, efficient, and highly accurate 7 billion parameter LLM, setting new standards in AI text generation.
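
Each of these models can be called through the Clarifai Python SDK. Here is a minimal sketch, assuming the Claude 3 Opus model URL shown below (check the model’s page on the platform for its exact URL) and a valid personal access token:

```python
from clarifai.client.model import Model

# Illustrative model URL; look up the exact URL on the model's page.
model = Model(
    url="https://clarifai.com/anthropic/completion/models/claude-3-opus",
    pat="YOUR_PAT",  # personal access token
)

# Send a text prompt and read back the generated completion.
prediction = model.predict_by_bytes(
    b"Summarize the benefits of fine-tuning LLMs in two sentences.",
    input_type="text",
)
print(prediction.outputs[0].data.text.raw)
```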

Added a notification for remaining time for free deep training

Added a notification at the upper-right corner of the Select a model type page showing the number of free deep training hours remaining for your models.

Made enhancements to the Python SDK

Updated and cleaned the requirements.txt file for the SDK.
Fixed an issue in the Clarifai-Python client library where a failed training job caused an error when loading a model, and where concepts were duplicated when their IDs did not match.

Made enhancements to the RAG (Retrieval Augmented Generation) feature

Enhanced the RAG SDK’s upload() function to accept a dataset_id parameter (see the sketch after this list).
Enabled custom workflow names to be specified in the RAG SDK’s setup() function.
Fixed scope errors related to the user and now_ts variables in the RAG SDK by moving their definitions out of an if statement.
Added support for chunk sequence numbers in the metadata when uploading chunked documents via the RAG SDK.
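
Putting these enhancements together, a minimal setup-and-upload flow might look like the sketch below. The user ID, workflow_id and dataset_id values, and folder path are placeholders, and the parameter names follow the notes above:

```python
from clarifai.rag import RAG

# Set up a RAG agent with a custom workflow name (per the setup() enhancement).
rag_agent = RAG.setup(
    user_id="YOUR_USER_ID",
    workflow_id="my-rag-workflow",  # custom workflow name, hypothetical value
)

# Upload documents into a specific dataset (per the upload() enhancement).
# Chunk sequence numbers are recorded in each chunk's metadata.
rag_agent.upload(
    folder_path="docs/",
    dataset_id="my-dataset",
    chunk_size=1024,
)

# Ask a question against the uploaded documents.
result = rag_agent.chat(messages=[{"role": "human", "content": "What is in these docs?"}])
print(result)
```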

Added feedback form

Added feedback form links to the header and listings pages of models, workflows, and modules. This enables registered users to provide general feedback or request a specific model.

Added a display of inference pricing per request

The model and workflow pages now display the price per request for both logged-in and non-logged-in users.

Implemented progressive image loading for images

Progressive image loading initially displays low-resolution versions of images and gradually replaces them with higher-resolution versions as they become available. This improves perceived page-load time while preserving image sharpness.

Replaced spaces with dashes in IDs

When you update User, App, or any other resource IDs, spaces are now replaced with dashes.
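
As a plain-Python illustration of the expected behavior (not platform code):

```python
def normalize_resource_id(resource_id: str) -> str:
    """Replace spaces with dashes, matching the platform's ID handling."""
    return resource_id.replace(" ", "-")

assert normalize_resource_id("my cool app") == "my-cool-app"
```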

Updated links

Updated the text and link for the Slack community in the navbar’s info popover to ‘Join our Discord Channel.’ Similarly, updated the corresponding link at the bottom of the landing page to direct to Discord.
Removed the “Where’s Legacy Portal?” text.

Displayed the PAT name in the toast notification

We’ve updated the account security page to display a PAT’s name instead of its token characters in the toast notification.

Improved the mobile onboarding flow

Made minor updates to mobile onboarding.

Improved sidebar appearance

Enhanced sidebar appearance when folded in mobile view.

Added an option to edit the scopes of a collaborator

You can now edit and customize the scopes associated with a collaborator’s role on the App Settings page.

Enabled deletion of associated model assets when removing a model annotation

Now, when deleting a model annotation, the associated model assets are also marked as deleted.

Improved model selection

Made improvements to the model selection drop-down list on the workflow builder.
