Command Palette

Search for a command to run...

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Researchclopedia