Task planners play an important role in automating complex tasks, such as scheduling, resource allocation, and decision-making. However, evaluating a task planner is challenging, because it requires both a large, diverse set of problems and an accurate way to measure the planner's results on them.
The Need for an Automated Benchmark
Manual benchmarking of task planners is time-consuming and error-prone, and it can be difficult to compare the performance of different planners on a fair basis. An automated benchmark addresses these challenges by providing a platform for testing planners on a large number of problems and measuring their performance using a standardized set of metrics.
The English-Language Task Planner Benchmark
The English-language Task Planner Benchmark (ELTPB) is an automated benchmark for English-language task planners. It consists of a collection of 100 diverse task planning problems, each of which is represented in natural language. The problems cover a wide range of domains, including scheduling, resource allocation, and decision-making.
Problem Structure
Each problem in the ELTPB is represented as a tuple of three elements:
* **Task definition:** A natural language description of the task to be performed.
* **State description:** A natural language description of the initial state of the world.
* **Goal description:** A natural language description of the goal state that the planner should achieve.
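The three-element structure above can be sketched as a small Python type. This is only an illustration: the class name `PlanningProblem`, its field names, and the sample problem are assumptions made for this sketch and are not taken from the ELTPB itself.

```python
from typing import NamedTuple


class PlanningProblem(NamedTuple):
    """One benchmark problem as a (task, state, goal) tuple of English text.

    Hypothetical type: the names here are illustrative, not part of the ELTPB.
    """
    task_definition: str
    state_description: str
    goal_description: str


# Illustrative problem invented for this sketch; it is not an ELTPB problem.
example = PlanningProblem(
    task_definition="Schedule a one-hour meeting for three people.",
    state_description="Alice is free 9-11, Bob is free 10-12, Carol is free 10-11.",
    goal_description="All three people attend the same one-hour meeting.",
)

# A NamedTuple unpacks like an ordinary 3-tuple.
task, state, goal = example
```

A `NamedTuple` keeps the tuple shape described above while letting code refer to each element by name.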
Performance Metrics
The performance of a task planner on the ELTPB is measured using the following metrics:
* **Success rate:** The percentage of problems that the planner solves.
* **Solution quality:** The average quality of the solutions produced by the planner, as measured by the number of steps in the solution and the amount of resources consumed.
* **Planning time:** The average amount of time taken by the planner to solve a problem.
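As a rough sketch, the three metrics could be computed from per-problem results as follows. The `RunResult` record and the `summarize` function are hypothetical names introduced here. The sketch also assumes that solution quality is averaged over solved problems only and that planning time is averaged over every attempt; the text above does not settle either choice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class RunResult:
    """Outcome of one planner run (hypothetical record format)."""
    solved: bool
    steps: int        # number of steps in the returned plan
    resources: float  # resources consumed, in whatever unit the problem defines
    seconds: float    # wall-clock planning time


def summarize(results: List[RunResult]) -> Dict[str, Optional[float]]:
    """Aggregate per-problem results into the three metrics."""
    if not results:
        raise ValueError("no results to summarize")
    solved = [r for r in results if r.solved]
    return {
        "success_rate_pct": 100.0 * len(solved) / len(results),
        # Assumption: quality is averaged over solved problems only.
        "mean_steps": mean([r.steps for r in solved]) if solved else None,
        "mean_resources": mean([r.resources for r in solved]) if solved else None,
        # Assumption: planning time is averaged over all attempts.
        "mean_seconds": mean([r.seconds for r in results]),
    }


# Made-up results for illustration only.
sample = [
    RunResult(solved=True, steps=4, resources=2.0, seconds=0.5),
    RunResult(solved=False, steps=0, resources=0.0, seconds=1.5),
    RunResult(solved=True, steps=6, resources=4.0, seconds=1.0),
]
summary = summarize(sample)
```

Reporting step count and resource use as separate averages avoids having to pick a weighting between them, which the metric definition leaves open.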
Benefits of the ELTPB
The ELTPB provides several benefits over manual benchmarking of task planners:
* **Automated:** The ELTPB automates the process of testing planners on a large number of problems, reducing the time and effort required for benchmarking.
* **Objective:** The ELTPB provides a standardized set of metrics for measuring planner performance, so that every planner is evaluated on the same problems and scored in the same way.
* **Comprehensive:** The ELTPB includes a wide range of problems, covering a variety of domains and task types. This breadth makes it harder for a planner to score well by specializing in a single domain.
* **Extensible to other languages:** Although the ELTPB problems are written in English, the problem structure itself does not depend on English, so the problems could in principle be translated to evaluate planners that process tasks in other natural languages.
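The automation described in this list could look roughly like the loop below. It is a minimal sketch under stated assumptions: the planner interface (a callable that takes the three English descriptions and returns a list of steps, or `None` on failure) and the function `run_benchmark` are hypothetical. "Solved" here only means that the planner returned a plan; how the ELTPB actually checks a plan against the goal is not specified in this text.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A problem is a (task, state, goal) tuple of English text.
Problem = Tuple[str, str, str]
# Hypothetical planner interface: returns plan steps, or None on failure.
Planner = Callable[[str, str, str], Optional[List[str]]]


def run_benchmark(planner: Planner, problems: Iterable[Problem]) -> List[Dict]:
    """Run one planner on every problem and record the outcome and timing."""
    records = []
    for task, state, goal in problems:
        start = time.perf_counter()
        try:
            plan = planner(task, state, goal)
        except Exception:
            plan = None  # a crash is counted as an unsolved problem
        elapsed = time.perf_counter() - start
        records.append({
            "plan": plan,
            # Assumption: any returned plan counts as solved (no validation).
            "solved": plan is not None,
            "seconds": elapsed,
        })
    return records


def toy_planner(task: str, state: str, goal: str) -> Optional[List[str]]:
    """Stand-in planner for illustration: gives up when the state is empty."""
    if not state:
        return None
    return ["read the state", "act toward: " + goal]


# Made-up problems for illustration only.
records = run_benchmark(toy_planner, [
    ("Book a room.", "Room A is free at 10.", "A room is booked."),
    ("Book a room.", "", "A room is booked."),
])
```

Because every planner goes through the same loop and the same timer, the recorded outcomes are directly comparable across planners.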
Conclusion
The English-Language Task Planner Benchmark is a valuable resource for researchers and practitioners in the field of task planning. The benchmark provides a standardized and automated way to evaluate the performance of task planners, making it easier to identify the strengths and weaknesses of different planners and to compare their performance on a fair basis.
Kind regards B. Guzman