Configurations let you control your Gateway programmatically, covering elements such as fallbacks, retries, and caching. Each configuration is a JSON object that defines routing rules for incoming requests to your gateway. You can set up multiple configurations and use them in your requests as needed.

Supported Configurations


Caching returns responses for previously answered queries directly from the SimpliML cache, so they do not have to be sent to your model endpoint again. The cache can match exact strings or semantically similar ones, and cached requests are fulfilled up to 20 times faster and at lower cost.
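To make the difference between exact and semantic matching concrete, here is a minimal sketch of a threshold-based cache lookup. It is not SimpliML's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SketchCache`, `put`, and `get` are hypothetical names chosen for illustration. The default threshold of 0.90 mirrors the `threshold` value in the request example on this page.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SketchCache:
    """Hypothetical cache that supports exact and semantic lookups."""

    def __init__(self, threshold: float = 0.90):
        self.threshold = threshold
        self.entries = []  # list of (query, embedding, response)

    def put(self, query: str, response: str) -> None:
        self.entries.append((query, embed(query), response))

    def get(self, query: str, semantic: bool = True) -> Optional[str]:
        # Exact match first: the cheapest check.
        for stored, _, response in self.entries:
            if stored == query:
                return response
        if not semantic:
            return None
        # Semantic match: take the most similar entry if it clears the threshold.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[1]), default=None)
        if best is not None and cosine(q, best[1]) >= self.threshold:
            return best[2]
        return None


cache = SketchCache(threshold=0.90)
cache.put("what is gravity", "cached answer")
print(cache.get("What is gravity"))                  # semantic hit despite the capital W
print(cache.get("What is gravity", semantic=False))  # exact-only lookup misses
```

A higher threshold makes the cache stricter (fewer hits, less risk of returning an answer to a different question), while a lower one trades accuracy for hit rate.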


The Fallback feature lets you define a prioritized list of LLMs. If the primary LLM returns an error or fails to respond, SimpliML switches to the next LLM in the list. This keeps your application resilient and dependable, and gives users a reliable experience.
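The fallback behavior described above can be sketched as a loop over targets in priority order. This is an illustration of the idea only, not the gateway's code: `call_with_fallback` and `fake_send` are hypothetical helpers, and the model names `primary-model` and `backup-model` are placeholders.

```python
from typing import Callable, Sequence


def call_with_fallback(targets: Sequence[dict], send: Callable[[dict], str]) -> str:
    """Try each target in priority order and return the first successful response."""
    errors = []
    for target in targets:
        try:
            return send(target)
        except Exception as exc:  # any failure moves on to the next target
            errors.append((target["model"], repr(exc)))
    raise RuntimeError(f"all {len(targets)} targets failed: {errors}")


def fake_send(target: dict) -> str:
    """Stand-in for a real request: the primary model always times out."""
    if target["model"] == "primary-model":
        raise TimeoutError("primary did not respond")
    return f"answer from {target['model']}"


targets = [
    {"provider": "simpliml", "model": "primary-model"},  # tried first
    {"provider": "simpliml", "model": "backup-model"},   # used if the first fails
]
result = call_with_fallback(targets, fake_send)
print(result)
```

The order of the list is the priority order, so the most preferred model goes first.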


Failed API requests are retried automatically, with up to 5 retry attempts. Retries follow an exponential backoff strategy, which spaces out the attempts to avoid overloading the network and to improve reliability. Use the "retry" parameter of the API to enable this feature.
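As a rough illustration of exponential backoff, the sketch below doubles the wait before each retry and caps the number of retries at 5. The base delay of 0.5 seconds and the doubling factor are assumptions, since this page does not state the gateway's actual timings; `backoff_delays` and `call_with_retry` are hypothetical helper names.

```python
import time

MAX_ATTEMPTS = 5  # documented upper bound on retry attempts


def backoff_delays(attempts: int, base: float = 0.5, factor: float = 2.0) -> list:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    attempts = min(attempts, MAX_ATTEMPTS)
    return [base * factor ** i for i in range(attempts)]


def call_with_retry(send, attempts=1, retry_on=(503, 429), sleep=time.sleep):
    """Call send() once, then retry while the status code is in retry_on.

    send must return a (status_code, body) tuple.
    """
    status, body = send()
    for delay in backoff_delays(attempts):
        if status not in retry_on:
            break  # success or a non-retryable error: stop retrying
        sleep(delay)
        status, body = send()
    return status, body


# Simulated endpoint: two retryable failures, then success.
responses = iter([(503, ""), (429, ""), (200, "ok")])
waits = []
final = call_with_retry(lambda: next(responses), attempts=5, sleep=waits.append)
print(final, waits)
```

Passing `sleep=waits.append` records the delays instead of actually waiting, which keeps the demonstration instant.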

Adding Config to Request

Create the curl Request

To ask the model "what is gravity" through the gateway, issue the following curl command. The configuration is passed as a JSON object in the X-Config header, and the request payload is passed as JSON in -d.

# Replace <YOUR_GATEWAY_ENDPOINT> with the URL of your gateway endpoint.
curl "<YOUR_GATEWAY_ENDPOINT>" \
     -H 'Content-Type: application/json' \
     -H "Authorization: Bearer $SimpliML_API_KEY" \
     -H 'X-Config: {
        "cache": {
            "enable": true,
            "type": "semantic",
            "mode": "all_model",
            "threshold": 0.90
        },
        "fallback": [
            {
                "provider": "simpliml",
                "model": "s7e2ca956beb6e87d7dae"
            }
        ],
        "retry": {
            "attempts": 1,
            "status_code": [503,429]
        }
     }' \
     -d '{"messages": [{"role": "system","content": "You are an AI assistant that answers questions in an accurate and polite manner"},{"role": "user","content": "what is gravity"}],"repetetion_penalty": 1.0,"model": "s7e2ca956beb6e87d7dae","max_tokens": 200,"top_p": 1,"top_k": 50,"temperature": 0.4,"stream": false}'
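If you build the request from code rather than curl, one option is to serialize the config into a single-line header value. The sketch below only constructs the headers and sends nothing, because the endpoint URL is not given on this page. It assumes the `X-Config` header accepts compact JSON and that `fallback` is a list of provider/model objects, as the keys in the curl example suggest; `build_headers` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
import json

# Same settings as the curl example, expressed as a Python dict.
config = {
    "cache": {"enable": True, "type": "semantic", "mode": "all_model", "threshold": 0.90},
    "fallback": [{"provider": "simpliml", "model": "s7e2ca956beb6e87d7dae"}],
    "retry": {"attempts": 1, "status_code": [503, 429]},
}


def build_headers(api_key: str, config: dict) -> dict:
    """Return request headers with the config serialized as compact one-line JSON."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        # Compact separators keep the value on one line with no extra whitespace.
        "X-Config": json.dumps(config, separators=(",", ":")),
    }


headers = build_headers("YOUR_API_KEY", config)
print(headers["X-Config"])
```

Serializing with `json.dumps` avoids hand-written brace mismatches and keeps the header free of raw line breaks, which HTTP header values cannot contain.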