Hi! Thank you for your great work!
We are trying to measure the latency of MagicPIG. We also changed the hyperparameters to K=10 and L=100 to look for lower latency.
However, we find that both TTFT and TPOT seem high on InfiniteBench.
Our partial results are as follows:
| | code_debug | code_run |
|---|---|---|
| TTFT (s) | 630.37 | 140 |
| TPOT (s) | 1.246 | 0.84 |
We measure TPOT with the following decode loop:

```python
for k in range(GEN_LEN):
    st = time.time()
    input_ids = logits.argmax(dim=-1)
    logits = llm.inference(input_ids=input_ids, position_ids=position_ids[:,PREFIX_LEN + k:PREFIX_LEN + k + 1])
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en-st)
    if input_ids.item() in config["eos"]:
        break
TPOT = sum(total_decode_time) / len(total_decode_time)
```
We would like to know the reason for the high latency and whether there is an error in our implementation.
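To help rule out the timing code itself, here is a minimal sketch of the same per-token measurement with the model call factored out. The names `measure_decode`, `step`, `sync`, and `tpot` are hypothetical helpers introduced for illustration and are not part of MagicPIG; `step` stands in for one decode step (argmax plus `llm.inference`), and `sync` is an optional callable that waits for pending asynchronous work.

```python
import time
from statistics import mean


def measure_decode(step, first_token, max_new_tokens, eos_ids=(), sync=None):
    """Run up to max_new_tokens decode steps and time each one.

    step: hypothetical callable mapping the current token to the next token.
    sync: optional callable that blocks until queued device work is done.
    Returns (generated tokens, per-step latencies in seconds).
    """
    tokens, latencies = [], []
    token = first_token
    for _ in range(max_new_tokens):
        if sync is not None:
            sync()  # make sure earlier work is not billed to this step
        start = time.perf_counter()
        next_token = step(token)
        if sync is not None:
            sync()  # wait for this step to finish before stopping the clock
        latencies.append(time.perf_counter() - start)
        tokens.append(next_token)
        if next_token in eos_ids:
            break
        token = next_token
    return tokens, latencies


def tpot(latencies):
    """Mean time per output token; NaN if nothing was generated."""
    return mean(latencies) if latencies else float("nan")
```

With PyTorch on a GPU, `sync` could be `torch.cuda.synchronize`. In our loop above, the `.item()` call most likely already forces a host synchronization, so we do not expect this to change the numbers much; the sketch is mainly meant to make the measured region explicit.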