# Example of how to use Llama-3.1-8B-Instruct API in Python

## With a simple HTTP client (requests)

First install the *requests* library:

```bash
pip install requests
```

Next, export your access token to the *OVH_AI_ENDPOINTS_ACCESS_TOKEN* environment variable:

```bash
export OVH_AI_ENDPOINTS_ACCESS_TOKEN=
```

*If you do not have an access token key yet, follow the instructions in the [AI Endpoints – Getting Started](https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&sysparm_article=KB0065401) guide.*

Finally, run the following Python code:

```python
import os

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

# With /chat/completions
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"
payload = {
    "max_tokens": 512,
    "messages": [
        {
            "content": "Explain gravity to a 6-year-old",
            "role": "user"
        }
    ],
    "model": "Llama-3.1-8B-Instruct",
    "temperature": 0,
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    # Parse the JSON response
    response_data = response.json()
    choices = response_data["choices"]
    for choice in choices:
        text = choice["message"]["content"]
        # Process text and finish_reason
        print(text)
else:
    print("Error:", response.status_code, response.text)

# With /responses
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/responses"
payload = {
    "max_output_tokens": 512,
    "input": [
        {
            "content": "Explain gravity to a 6-year-old",
            "role": "user"
        }
    ],
    "store": False,
    "model": "Llama-3.1-8B-Instruct",
    "temperature": 0,
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    response_data = response.json()
    text = response_data["output"][0]["content"][0]["text"]
    print(text)
else:
    print("Error:", response.status_code, response.text)
```

## With the Python OpenAI library

The Llama-3.1-8B-Instruct API is
compatible with the OpenAI specification.

First install the *openai* library:

```bash
pip install openai
```

Next, export your access token to the *OVH_AI_ENDPOINTS_ACCESS_TOKEN* environment variable:

```bash
export OVH_AI_ENDPOINTS_ACCESS_TOKEN=
```

Finally, run the following Python code:

```python
import os

from openai import OpenAI

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

def chat_completion(new_message: str) -> str:
    history_openai_format = [{"role": "user", "content": new_message}]
    return client.chat.completions.create(
        model="Llama-3.1-8B-Instruct",
        messages=history_openai_format,
        temperature=0,
        max_tokens=1024
    ).choices[0].message.content

def responses(new_message: str) -> str:
    response = client.responses.create(
        model="Llama-3.1-8B-Instruct",
        input=new_message,
        temperature=0,
        max_output_tokens=1024,
        store=False
    )
    return response.output[0].content[0].text

if __name__ == '__main__':
    # With the chat completions endpoint
    print(chat_completion("Explain gravity to a 6-year-old"))
    # With the responses endpoint
    print(responses("Explain gravity to a 12-year-old"))
```

## Model rate limit

When using AI Endpoints, the **following rate limits apply**:

- **Anonymous**: 2 requests per minute, per IP and per model.
- **Authenticated with an API access key**: 400 requests per minute, per Public Cloud project and per model.

If you exceed this limit, a **429 error code** will be returned. If you require higher usage, please **[get in touch with us](https://help.ovhcloud.com/csm?id=csm_get_help)** to discuss increasing your rate limits.

## Going Further

Want to explore the full capabilities of the LLM API?
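One capability worth trying is token streaming. Assuming the endpoint honors the OpenAI-style `stream=True` parameter (an assumption based on its OpenAI compatibility, not confirmed above), a minimal sketch looks like this; the `join_deltas` and `stream_chat` helpers are illustrative names, not part of any official API:

```python
import os


def join_deltas(deltas):
    """Concatenate streamed text deltas, skipping empty keep-alive chunks."""
    return "".join(d for d in deltas if d)


def stream_chat(prompt: str) -> str:
    """Stream a chat completion, printing each chunk as it arrives."""
    # Imported here so join_deltas stays usable without the openai package.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
        api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    )
    stream = client.chat.completions.create(
        model="Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=512,
        stream=True,  # assumption: the endpoint supports OpenAI-style streaming
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return join_deltas(parts)


if __name__ == "__main__" and os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"):
    stream_chat("Explain gravity to a 6-year-old")
```

Streaming is mostly a UX improvement: the total latency is similar, but the first tokens appear almost immediately instead of after the full generation.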
Dive into our dedicated [Structured Output](https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-structured-output?id=kb_article_view&sysparm_article=KB0071891) and [Function Calling](https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-function-calling?id=kb_article_view&sysparm_article=KB0071907) guides. For a broader overview of AI Endpoints, explore the full [AI Endpoints Documentation](https://help.ovhcloud.com/csm/en-gb-documentation-public-cloud-ai-and-machine-learning-ai-endpoints?id=kb_browse_cat&kb_id=574a8325551974502d4c6e78b7421938&kb_category=ea1d6daa918a1a541e11d3d71f8624aa).

Reach out to our support team or join the [OVHcloud Discord](https://discord.gg/ovhcloud) #ai-endpoints channel to share your questions, feedback, and suggestions for improving the service with the team and the community.
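Finally, the **429 error code** mentioned in the rate-limit section above is easy to handle with exponential backoff. Here is a minimal sketch reusing the *requests* setup from the first section; the `backoff_delay` helper and the retry counts are illustrative choices, not values mandated by the service:

```python
import os
import time

URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff schedule: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def post_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST to the endpoint, sleeping and retrying whenever HTTP 429 is returned."""
    # Imported here so backoff_delay stays usable without the requests package.
    import requests

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
    }
    for attempt in range(max_retries):
        response = requests.post(URL, json=payload, headers=headers)
        if response.status_code != 429:
            response.raise_for_status()  # surface non-rate-limit errors
            return response.json()
        time.sleep(backoff_delay(attempt))
    raise RuntimeError("Still rate limited after retries")
```

The `payload` argument is the same dict shown in the `/chat/completions` example above, e.g. `post_with_retry(payload)["choices"][0]["message"]["content"]`.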