SortBench: Benchmarking LLMs based on their ability to sort lists (arxiv.org)
2 points by wslh a day ago