Show the value of an importable object
Like `typing._eval_type`, but lets older Python versions use newer typing features.
HuggingFace community-driven open-source library for evaluation
Safely evaluate AST nodes without side effects
A simple, safe single expression evaluator library.
Safe, minimalistic evaluator of Python expressions using the ast module
Validation and secure evaluation of untrusted Python expressions
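Several of the packages above share one core technique: parse the untrusted string with the stdlib `ast` module and walk the tree, allowing only a whitelist of node types so the input can never reach names, calls, or attributes. A minimal sketch of that idea (the `safe_eval` helper and its whitelist are illustrative, not any particular package's API):

```python
import ast
import operator

# Whitelisted operators: untrusted input may combine literals with these,
# and nothing else -- no Name, Call, Attribute, or Subscript nodes.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expression: str):
    """Evaluate a single arithmetic expression without side effects."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Anything outside the whitelist (function calls, names, etc.) is rejected.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))
```

Because the walker rejects unknown node types instead of falling back to `eval`, an input like `__import__('os')` fails with `ValueError` rather than executing.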
Microsoft Azure Evaluation Library for Python
Framework for evaluating stochastic code execution, especially code making use of LLMs
Universal library for evaluating AI models
The LLM Evaluation Framework
A framework for evaluating language models
Evalica, your favourite evaluation toolkit.
Testing framework for sequence labeling
A getattr and setattr that works on nested objects, lists, dicts, and any combination thereof without resorting to eval
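The idea behind such nested accessors is to split a dotted path into segments and resolve each one with `getattr` or item lookup, so no `eval` of the path string is needed. A hypothetical sketch (names `deep_get`/`deep_set` are illustrative, not the package's API):

```python
from functools import reduce

def _step(obj, key):
    # Resolve one path segment: index into lists, key into dicts,
    # attribute access for everything else.
    if isinstance(obj, list):
        return obj[int(key)]
    if isinstance(obj, dict):
        return obj[key]
    return getattr(obj, key)

def deep_get(obj, path: str):
    """deep_get(cfg, "model.layers.0.units") -> nested value."""
    return reduce(_step, path.split("."), obj)

def deep_set(obj, path: str, value):
    *head, last = path.split(".")
    target = reduce(_step, head, obj)
    if isinstance(target, list):
        target[int(last)] = value
    elif isinstance(target, dict):
        target[last] = value
    else:
        setattr(target, last, value)
```

The same dotted path works across any mix of objects, dicts, and lists because each segment is dispatched on the container's type.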
MS-COCO Caption Evaluation for Python 3
LLM Evaluations
EvalScope: Lightweight LLMs Evaluation Framework
Provides Python bindings for popular Information Retrieval measures implemented within trec_eval.
Limited evaluator
Open-source evaluators for LLM applications
EvalPlus for rigorous evaluation of LLM-synthesized code
Python Mathematical Expression Evaluator
A library providing a simple interface for creating new metrics and an easy-to-use toolkit for metric computation and checkpointing.
The WxO evaluation framework
An AutoML library that builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions
A poker hand evaluation and equity calculation library
Faster implementation of the original COCOEval
Use EvalAI through command line interface
Evaluation tools for the SIGSEP MUS database
AlpacaEval: An Automatic Evaluator of Instruction-following Models
Common metrics for common audio/music processing tasks.
evalutils helps users create extensions for grand-challenge.org
A custom Streamlit component to evaluate arbitrary JavaScript expressions.
A framework for evaluating language models - packaged by NVIDIA
Evaluating and scoring financial data
Open-source evaluators for LLM agents
In-loop evaluation tasks for language modeling
Backwards-compatibility package exposing the trulens_eval<1.0.0 API on top of the trulens-*>=1.0.0 API.
eval-mm is a tool for evaluating Multi-Modal Large Language Models.
Package for fast computation of BSS Eval metrics for source separation
A lightweight and configurable evaluation package
Serialization based on ast.literal_eval
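Serialization on top of `ast.literal_eval` works because `repr()` of plain containers of literals is valid Python source, and `ast.literal_eval` parses it back without executing arbitrary code the way `eval` would. A minimal sketch of that round trip (the `dumps`/`loads` helpers are illustrative, not the package's API):

```python
import ast

def dumps(obj) -> str:
    """Serialize literal-only data (dicts, lists, tuples, strings, numbers)."""
    text = repr(obj)
    # Fail fast if obj contains something repr() cannot round-trip
    # (custom objects, functions, ...): literal_eval will reject it.
    ast.literal_eval(text)
    return text

def loads(text: str):
    """Deserialize without evaluating arbitrary code."""
    return ast.literal_eval(text)
```

Unlike `eval`, `ast.literal_eval` accepts only literal structures, so a payload like `"__import__('os').system('...')"` raises an exception instead of running.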
clusteval is a Python package for unsupervised cluster validation.
EM algorithms for integrated spatial and spectral models.