- Jonathan Gratch - OpenReview
ACII 2021 CaSiNo: A Corpus of Campsite Negotiation Dialogues for Automatic Negotiation Systems Kushal Chawla, Jaysa Ramirez, Rene Clever, Gale M Lucas, Jonathan May, Jonathan Gratch 2021 (modified: 04 Jan 2022) NAACL-HLT 2021 Towards Emotion-Aware Agents For Negotiation Dialogues Kushal Chawla, Rene Clever, Jaysa Ramirez, Gale M Lucas
- Greg Durrett - OpenReview
CLEVER: A Curated Benchmark for Formally Verified Code Generation Amitayush Thakur, Jasper Lee, George Tsoukalas, Meghana Sistla, Matthew Zhao, Stefan Zetzsche, Greg Durrett, Yisong Yue, Swarat Chaudhuri NeurIPS 2025 Datasets and Benchmarks Track poster
- Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning. . .
Upon mitigating the Clever Hans effect, our task requires the LLM to not only achieve the correct answer on its own, but also be able to hold and defend its belief instead of blindly believing or getting misled by the user's (invalid) arguments and critiques, thus testing in greater depth whether the LLM grasps the essence of the reasoning
- EVALUATING THE ROBUSTNESS OF NEURAL NETWORKS: AN EXTREME VALUE THEORY APPROACH
4 THE CLEVER ROBUSTNESS METRIC VIA EXTREME VALUE THEORY ... attack-agnostic score ... [footnote 2: proof deferred to Appendix B] [footnote 3: proof deferred to Appendix C] ... t of a classifier and $L^{j}_{q,x_0}$ is defined as $\max_{x \in B_p(x_0, R)} \|\nabla g(x)\|_q$. Although $\nabla g(x)$ can be calculated easily via back propagation, computing $L^{j}_{q,x_0}$ is more involved be
- Counterfactual Debiasing for Fact Verification
In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates biases at the inference stage. In CLEVER, the claim-evidence fusion model and the claim-only model are independently trained to capture the corresponding information
- A Protocol-Driven Platform for Agent-Agnostic Evaluation of LLM Agents
Hook it up with TaskConfig (our handy layer for crafting clever input templates and grabbing outputs steadily via JMESPath) and switching agents turns effortless, no extra fiddling needed. Our benchmark structure ensures reproducibility by locking in versions
- Anchor Frame Bridging for Coherent First-Last Frame Video Generation
First-last frame video generation has recently gained significant attention. It enables coherent motion generation between specified first and last frames. However, this approach suffers from
- The Pitfalls of Next-Token Prediction - OpenReview
This verifies our hypothesis that the Clever Hans cheat absorbs away supervision that is critical to learn the first token. At the end of this section, we provide more intuition for how the absence of the Clever Hans cheat allows the teacherless models to solve this task ... that language has enough redundancy to be conducive for next-token prediction