LangChain Llama: Number of tokens exceeded maximum context length (512)

Hi everyone, please take a look. This is how I instantiate the model:

from langchain.llms import CTransformers

def load_llm():
    llm = CTransformers(
        model="TheBloke/Llama-2-13B-Chat-GGUF",
        model_type="llama",
        max_new_tokens=512 - (MAX_QUERY_TOKENS + 1),  # MAX_QUERY_TOKENS is defined elsewhere in my code
        temperature=0.3,
    )

    return llm

and here is what I get when I post a query:

Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 27.05it/s]
Tokenized Query: ['should', 'we', 'per', '##cie', '##ve', 'fear', 'as', 'a', 'threat', '?']
Query Token Count: 10
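
(Side note: the '##' pieces above look like output from the embedding model's WordPiece tokenizer used for retrieval, not from the Llama tokenizer, so this count of 10 is not what the LLM actually receives. A rough sketch of measuring the real prompt size on the ctransformers side; it assumes the LangChain wrapper exposes the underlying ctransformers model as llm.client, and final_prompt is a placeholder for whatever the chain sends to the model:)

def count_llm_tokens(llm, text: str) -> int:
    # Assumption: the LangChain CTransformers wrapper stores the underlying
    # ctransformers model on `client`, and its tokenize() returns the token
    # ids the Llama model will actually see.
    return len(llm.client.tokenize(text))

# `final_prompt` stands in for the fully assembled prompt
# (prompt template + retrieved chunks + question) passed to the LLM:
# print("Prompt token count:", count_llm_tokens(llm, final_prompt))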

and here is the response:

2023-09-30 08:46:57 - Number of tokens (546) exceeded maximum context length (512).
2023-09-30 08:46:57 - Number of tokens (547) exceeded maximum context length (512).
2023-09-30 08:46:58 - Number of tokens (548) exceeded maximum context length (512).
Tokenized Response: ['"', 'fear', 'is', 'a', 'low', '-', ...]
Response Token Count: 228
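
The 512 in those log lines appears to be the default context window the ctransformers backend uses for Llama models, and what overflows it is the whole assembled prompt (prompt template + retrieved chunks + question) plus the tokens generated so far, not the 10-token query itself. Here is a minimal sketch of raising the window through the wrapper's config dict; context_length and the other keys are ctransformers settings, and 2048 is just an example value (Llama 2 natively supports up to 4096):

from langchain.llms import CTransformers

def load_llm():
    llm = CTransformers(
        model="TheBloke/Llama-2-13B-Chat-GGUF",
        model_type="llama",
        config={
            "context_length": 2048,  # enlarge the 512-token default window
            "max_new_tokens": 512,   # tokens reserved for the generated answer
            "temperature": 0.3,
        },
    )

    return llm

If memory is tight, the other lever is retrieving fewer or shorter chunks (smaller chunk size, lower k in the retriever) so the assembled prompt stays under the window.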

Hi Praxis,

I am also facing this issue. Were you able to resolve it?

Thanks,
Vignesh
