!!!! Within a single song, compare STS between songforms as a square matrix
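A minimal sketch of that within-song comparison, assuming each songform section has already been encoded into a fixed-size sentence embedding and that STS is taken as cosine similarity. The section names, the toy vectors, and the helpers `cosine` and `sts_matrix` are hypothetical placeholders, not part of the actual pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sts_matrix(section_embeddings):
    """Build the square section-by-section similarity matrix for one song.

    section_embeddings: dict mapping a songform label to its embedding.
    Returns (labels, matrix) where matrix[i][j] compares labels[i] and labels[j].
    """
    labels = list(section_embeddings)
    matrix = [
        [cosine(section_embeddings[r], section_embeddings[c]) for c in labels]
        for r in labels
    ]
    return labels, matrix


# Toy placeholder embeddings (assumption: real ones come from a sentence encoder).
toy_song = {
    "verse1": [1.0, 0.0, 1.0],
    "chorus": [0.0, 1.0, 1.0],
    "verse2": [1.0, 0.0, 1.0],
}

labels, matrix = sts_matrix(toy_song)
for label, row in zip(labels, matrix):
    print(label, [round(x, 3) for x in row])
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be inspected when comparing sections.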

Things to change at test time

1. Generation

Metrics: PPL, syl-distance, syl-acc, sent-emb dist. (with input text)
Sub-columns under each metric: Full, Phrase, line, ngrams+word

Baselines
    ChatGPT: x x x x
    LLaMA: x x x x
    Claude 3: x x x x
Proposed
    pre-both
    scratch-both

Probability of Success within 10 Attempts

sent_emb = summary

| Model | PPL (Full) | PPL (Phrase) | PPL (line) | PPL (ngrams+word) | prob. of success (in 1st attempt) | prob. of success (in 10th attempt) | avg. regen num (in success case) | syl-distance (Full) | syl-distance (Phrase) | syl-distance (line) | syl-distance (ngrams+word) | syl-acc (Full) | syl-acc (Phrase) | syl-acc (line) | syl-acc (ngrams+word) | sent-emb dist. (sum vs lyr) | bertscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pre-both | 18.273 | 44.719 | 42.242 | 15.490 | 0.975 | 0.998 | 1.040 | 0.031 | 0.067 | 0.050 | 0.028 | 0.863 | 0.115 | 0.651 | 0.913 | 0.682 | 0.735 |
| pre-back | 18.05 | 39.00 | 37.01 | 15.26 | 0.968 | 0.998 | 1.042 | 0.030 | 0.060 | 0.051 | 0.027 | 0.869 | 0.132 | 0.650 | 0.916 | 0.697/0.087 | 0.733/0.028 |
| pre-front | 17.336 | 43.704 | 33.466 | 14.677 | 0.365 | 0.904 | 2.743 | 0.040 | 0.098 | 0.051 | 0.035 | 0.846 | 0.084 | 0.638 | 0.900 | 0.676/0.090 | 0.731/0.028 |
| scratch-both | 20.970 | 40.140 | 38.829 | 17.426 | 0.991 | 0.999 | 1.016 | 0.019 | 0.072 | 0.042 | 0.015 | 0.890 | 0.118 | 0.692 | 0.940 | 0.646 | 0.735 |
| scratch-back | 20.401 | 42.434 | 37.626 | 17.222 | 0.990 | 0.999 | 1.016 | 0.018 | 0.060 | 0.043 | 0.015 | 0.894 | 0.141 | 0.695 | 0.941 | 0.660 | 0.735 |
| scratch-front | 19.559 | 39.504 | 35.117 | 16.299 | 0.898 | 0.991 | 1.163 | 0.019 | 0.062 | 0.043 | 0.015 | 0.893 | 0.134 | 0.686 | 0.941 | 0.641 | 0.731 |
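A rough sketch of how the three success columns could be derived from per-sample generation logs. It assumes each sample records the 1-based index of its first successful attempt (or `None` if it never succeeded within the budget), and that "avg. regen num" counts attempts up to and including the first success. Both assumptions, the function name `success_metrics`, and the toy log are hypothetical.

```python
def success_metrics(first_success_attempts, max_attempts=10):
    """Summarise retry logs.

    first_success_attempts: one entry per sample, the 1-based attempt index
    of the first success, or None if no attempt succeeded.
    Returns (p_first, p_within, avg_attempts_in_success).
    """
    n = len(first_success_attempts)
    if n == 0:
        raise ValueError("need at least one sample")

    # Keep only samples that succeeded within the attempt budget.
    successes = [
        a for a in first_success_attempts
        if a is not None and a <= max_attempts
    ]

    p_first = sum(1 for a in successes if a == 1) / n
    p_within = len(successes) / n
    # Average is taken over successful samples only.
    avg_attempts = sum(successes) / len(successes) if successes else float("nan")
    return p_first, p_within, avg_attempts


# Toy log, purely illustrative: four samples succeed, one never does.
toy_log = [1, 1, 2, None, 1]
p_first, p_within, avg_attempts = success_metrics(toy_log)
print(p_first, p_within, avg_attempts)
```

Averaging over successful samples only keeps the retry count from being distorted by samples that exhaust the attempt budget.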

The differences may look minor in the table, but they are statistically significant under a paired t-test (comparing samples generated under the same conditions).
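A minimal sketch of the paired test, assuming two models are scored on the same set of generation conditions so that scores can be paired sample by sample. The helper `paired_t_statistic` and the toy scores are hypothetical and are not taken from the results above.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-test statistic for two score lists aligned by sample.

    Returns (t, degrees_of_freedom). The pairing matters: element i of both
    lists must come from the same generation condition.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired one-to-one")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy per-sample scores, purely illustrative.
model_a = [0.70, 0.68, 0.72, 0.69, 0.71]
model_b = [0.66, 0.65, 0.69, 0.66, 0.67]

t, df = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f}, df = {df}")
```

Because the test works on per-sample differences, a small but consistent gap can be significant even when the table means look close. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with a p-value.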