!!!! Within a single song, compare STS between songforms as a square matrix
tree-search probability = 0.2 / 0.1 → ?
input_text is a summary of the lyric
PPL is computed only over the part that is actually generated.
For scratch-back, add a table of paired t-tests against the other models
songform-related metrics: bertscore / STS correlation matrix
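A minimal sketch of the per-song songform STS square matrix mentioned in these notes. It assumes each songform section already has a sentence embedding and that STS is taken as cosine similarity; the embedding model, the section labels, and the function names `cosine` and `sts_matrix` are hypothetical and not part of the original notes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def sts_matrix(section_embeddings):
    """Build a square similarity matrix over the songform sections of one song.

    section_embeddings: dict mapping a songform label (hypothetical labels such as
    "verse1", "chorus") to its embedding vector. Returns (labels, matrix), where
    matrix[i][j] is the similarity between labels[i] and labels[j].
    """
    labels = list(section_embeddings)
    matrix = [
        [cosine(section_embeddings[a], section_embeddings[b]) for b in labels]
        for a in labels
    ]
    return labels, matrix

# Toy embeddings, for illustration only (not real data).
toy_song = {"verse1": [1.0, 0.0], "chorus": [0.0, 1.0], "verse2": [1.0, 0.0]}
labels, matrix = sts_matrix(toy_song)
for label, row in zip(labels, matrix):
    print(label, [round(x, 3) for x in row])
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be reported per song.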
Excluded:
⇒ 18906→18701
Output sample
Is it right to split by Full/Phrase/line/ngrams units?
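A minimal sketch of the earlier note that PPL is computed only over the generated part. It assumes per-token natural-log probabilities are already available from the model, together with a boolean mask marking which tokens were generated (as opposed to given as context); the function name `generated_ppl` and both inputs are hypothetical.

```python
import math

def generated_ppl(token_logprobs, is_generated):
    """Perplexity restricted to generated tokens.

    token_logprobs: natural-log probability of each token in the sequence.
    is_generated:   same-length list of booleans; True where the token was
                    produced by the model rather than given as context.
    """
    if len(token_logprobs) != len(is_generated):
        raise ValueError("logprobs and mask must have the same length")
    scored = [lp for lp, gen in zip(token_logprobs, is_generated) if gen]
    if not scored:
        raise ValueError("no generated tokens to score")
    # PPL = exp of the mean negative log-likelihood over the scored tokens.
    return math.exp(-sum(scored) / len(scored))

# Toy example: two context tokens (ignored) and two generated tokens with p = 0.5 each.
logprobs = [-5.0, -5.0, math.log(0.5), math.log(0.5)]
mask = [False, False, True, True]
print(generated_ppl(logprobs, mask))
```

Using a mask rather than a prompt length keeps the sketch usable when the given context sits before, after, or on both sides of the generated span.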
Main table (sent_emb = summary of full_lyric)
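Group | Model | PPL (Full) | PPL (Phrase) | PPL (line) | PPL (ngrams+word) | syl-distance (Full) | syl-distance (Phrase) | syl-distance (line) | syl-distance (ngrams+word) | syl-acc (Full) | syl-acc (Phrase) | syl-acc (line) | syl-acc (ngrams+word) | sent-emb dist. with input text (Full) | sent-emb dist. with input text (Phrase) | sent-emb dist. with input text (line) | sent-emb dist. with input text (ngrams+word) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baselines | ChatGPT | x | x | x | x | | | | | | | | | | | | |
Baselines | LLaMa | x | x | x | x | | | | | | | | | | | | |
Baselines | Claude 3 | x | x | x | x | | | | | | | | | | | | |
Proposed | pre-both | | | | | | | | | | | | | | | | |
Proposed | scratch-both | | | | | | | | | | | | | | | | |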
Main table (sent_emb = full_lyric itself)
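Group | Model | PPL (Full) | PPL (Phrase) | PPL (line) | PPL (ngrams+word) | syl-distance (Full) | syl-distance (Phrase) | syl-distance (line) | syl-distance (ngrams+word) | syl-acc (Full) | syl-acc (Phrase) | syl-acc (line) | syl-acc (ngrams+word) | sent-emb dist. with input text (Full) | sent-emb dist. with input text (Phrase) | sent-emb dist. with input text (line) | sent-emb dist. with input text (ngrams+word) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baselines | ChatGPT | x | x | x | x | | | | | | | | | | | | |
Baselines | LLaMa | x | x | x | x | | | | | | | | | | | | |
Baselines | Claude 3 | x | x | x | x | | | | | | | | | | | | |
Proposed | pre-both | | | | | | | | | | | | | | | | |
Proposed | scratch-both | | | | | | | | | | | | | | | | |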
Probability of Success within 10 Attempts
sent_emb = summary

Model | PPL (Full) | PPL (Phrase) | PPL (line) | PPL (ngrams+word) | prob. of success (in 1st attempt) | prob. of success (within 10 attempts) | avg. regen num (in success case) | syl-distance (Full) | syl-distance (Phrase) | syl-distance (line) | syl-distance (ngrams+word) | syl-acc (Full) | syl-acc (Phrase) | syl-acc (line) | syl-acc (ngrams+word) | sent-emb dist. (sum vs lyr) | bertscore |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pre-both | 18.273 | 44.719 | 42.242 | 15.490 | 0.975 | 0.998 | 1.040 | 0.031 | 0.067 | 0.050 | 0.028 | 0.863 | 0.115 | 0.651 | 0.913 | 0.682 | 0.735 |
pre-back | 18.05 | 39.00 | 37.01 | 15.26 | 0.968 | 0.998 | 1.042 | 0.030 | 0.060 | 0.051 | 0.027 | 0.869 | 0.132 | 0.650 | 0.916 | 0.697/0.087 | 0.733/0.028 |
pre-front | 17.336 | 43.704 | 33.466 | 14.677 | 0.365 | 0.904 | 2.743 | 0.040 | 0.098 | 0.051 | 0.035 | 0.846 | 0.084 | 0.638 | 0.900 | 0.676/0.090 | 0.731/0.028 |
scratch-both | 20.970 | 40.140 | 38.829 | 17.426 | 0.991 | 0.999 | 1.016 | 0.019 | 0.072 | 0.042 | 0.015 | 0.890 | 0.118 | 0.692 | 0.940 | 0.646 | 0.735 |
scratch-back | 20.401 | 42.434 | 37.626 | 17.222 | 0.990 | 0.999 | 1.016 | 0.018 | 0.060 | 0.043 | 0.015 | 0.894 | 0.141 | 0.695 | 0.941 | 0.660 | 0.735 |
scratch-front | 19.559 | 39.504 | 35.117 | 16.299 | 0.898 | 0.991 | 1.163 | 0.019 | 0.062 | 0.043 | 0.015 | 0.893 | 0.134 | 0.686 | 0.941 | 0.641 | 0.731 |
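A minimal sketch of how the "prob. of success" and "avg. regen num" columns could be computed from per-sample logs. It assumes each sample records the 1-based attempt index at which generation first succeeded, or `None` if it never succeeded within the budget; the 1-based convention is an assumption (it appears consistent with averages slightly above 1), and the function name `success_stats` and the toy log are hypothetical.

```python
def success_stats(first_success_attempt, max_attempts=10):
    """Summarize regeneration outcomes over a set of samples.

    first_success_attempt: list with one entry per sample; the 1-based attempt
    number of the first success, or None if no attempt succeeded.
    Returns (p_first, p_within, avg_attempts_given_success).
    """
    n = len(first_success_attempt)
    if n == 0:
        raise ValueError("need at least one sample")
    successes = [
        a for a in first_success_attempt if a is not None and a <= max_attempts
    ]
    p_first = sum(1 for a in successes if a == 1) / n
    p_within = len(successes) / n
    # The average is taken over successful samples only ("in success case").
    avg_attempts = sum(successes) / len(successes) if successes else float("nan")
    return p_first, p_within, avg_attempts

# Toy log, for illustration only: two first-try successes, one success on the
# third attempt, one failure within the budget.
print(success_stats([1, 1, 3, None]))
```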
The differences may look minor in the table, but they are significant under a paired t-test (comparing samples generated under the same conditions).
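A minimal sketch of the paired t-test on samples generated under the same conditions, assuming two equal-length lists of per-sample metric values in which index i refers to the same generation condition for both models. The function name `paired_t` and the toy numbers are hypothetical. The sketch returns only the t statistic and degrees of freedom; a p-value can be obtained with `scipy.stats.ttest_rel(a, b)` if SciPy is available.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic for two matched samples.

    a, b: per-sample metric values for two models, aligned so that a[i] and b[i]
    come from the same generation condition. Returns (t, degrees_of_freedom).
    """
    if len(a) != len(b):
        raise ValueError("samples must be paired one-to-one")
    if len(a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Toy values, for illustration only (not taken from the results table).
model_a = [2.0, 4.0, 6.0, 8.0]
model_b = [1.0, 2.0, 4.0, 7.0]
print(paired_t(model_a, model_b))
```

Because the test works on per-pair differences, a small but consistent gap between two models can be significant even when the table averages look close.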