Question in testing latency  #3

@alkane7

Hi! Thank you for your great work!
We are trying to measure the latency of MagicPIG. We also set the hyperparameters to K=10 and L=100 to try to get better latency.
However, both TTFT (time to first token) and TPOT (time per output token) seem high on InfiniteBench.
Our partial results are as follows:

            code_debug    code_run
TTFT (s)    630.37        140
TPOT (s)    1.246         0.84

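import time

# Decode loop: time each generated token to compute TPOT.
for k in range(GEN_LEN):
    st = time.time()
    # Greedy decoding: pick the most likely next token.
    input_ids = logits.argmax(dim=-1)
    logits = llm.inference(input_ids=input_ids, position_ids=position_ids[:, PREFIX_LEN + k:PREFIX_LEN + k + 1])
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en - st)
    # Stop as soon as an end-of-sequence token is produced.
    if input_ids.item() in config["eos"]:
        break
# Mean wall-clock time per generated token, in seconds.
TPOT = sum(total_decode_time) / len(total_decode_time)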
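
For reference, the per-token timing above can be written as a self-contained sketch that runs without the model. The names `measure_tpot` and `fake_step` are hypothetical and not part of MagicPIG; `fake_step` stands in for the `llm.inference` call plus the argmax. The optional `sync` hook is an assumption for backends that execute asynchronously (for example, passing `torch.cuda.synchronize` when the work runs on a GPU), since a wall-clock timer may otherwise stop before the queued work has finished.

```python
import time
from typing import Callable, Collection, List, Optional, Tuple


def measure_tpot(
    step: Callable[[], int],
    gen_len: int,
    eos_ids: Collection[int],
    sync: Optional[Callable[[], None]] = None,
) -> Tuple[List[int], List[float], float]:
    """Run up to gen_len decode steps and time each one.

    step    -- hypothetical callable that produces the next token id
    eos_ids -- token ids that end generation
    sync    -- optional barrier called before and after each step
    """
    tokens: List[int] = []
    latencies: List[float] = []
    for _ in range(gen_len):
        if sync is not None:
            sync()  # make sure earlier work is not billed to this step
        start = time.perf_counter()
        token = step()
        if sync is not None:
            sync()  # wait for this step's work before stopping the timer
        latencies.append(time.perf_counter() - start)
        tokens.append(token)
        if token in eos_ids:
            break
    # Guard against an empty run (gen_len == 0).
    tpot = sum(latencies) / len(latencies) if latencies else 0.0
    return tokens, latencies, tpot


# Stand-in decoder that emits a fixed token stream; 2 plays the role of EOS.
_stream = iter([5, 7, 2, 9])


def fake_step() -> int:
    return next(_stream)


tokens, latencies, tpot = measure_tpot(fake_step, gen_len=10, eos_ids={2})
print(tokens, f"TPOT = {tpot:.6f} s")
```

The sketch uses `time.perf_counter()` rather than `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.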

We would like to know the reason for the high latency and whether there is an error in our implementation.
