{"id":51,"date":"2025-07-30T19:29:55","date_gmt":"2025-07-30T19:29:55","guid":{"rendered":"https:\/\/cs.tarabitab.com\/ing\/?p=51"},"modified":"2025-07-30T19:34:27","modified_gmt":"2025-07-30T19:34:27","slug":"51","status":"publish","type":"post","link":"https:\/\/cs.tarabitab.com\/ing\/2025\/07\/30\/51\/","title":{"rendered":"Why it makes sense not to solely rely on ChatGPT"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I gave this&nbsp;<strong>simple reasoning prompt<\/strong>&nbsp;to 30+ AI models. Here&#8217;s what happened&nbsp;<img loading=\"lazy\" decoding=\"async\" height=\"16\" width=\"16\" alt=\"\ud83d\udc47\" src=\"https:\/\/static.xx.fbcdn.net\/images\/emoji.php\/v9\/tee\/2\/16\/1f447.png\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Prompt:<\/strong><br><em>Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Let&#8217;s think step by step.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img loading=\"lazy\" decoding=\"async\" height=\"16\" width=\"16\" alt=\"\ud83c\udfaf\" src=\"https:\/\/static.xx.fbcdn.net\/images\/emoji.php\/v9\/t4f\/2\/16\/1f3af.png\">&nbsp;<strong>Correct answer: 1<\/strong><br>(Each brother has&nbsp;<em>2 sisters total<\/em>: Sally and one other girl. So Sally has just 1 sister.)<br><br>But when I ran this through&nbsp;<strong>all the top models<\/strong>&nbsp;using Admix.software, the results were wild. 
Most models failed a basic logic test.<br><br><img loading=\"lazy\" decoding=\"async\" height=\"16\" width=\"16\" alt=\"\u2705\" src=\"https:\/\/static.xx.fbcdn.net\/images\/emoji.php\/v9\/tb4\/2\/16\/2705.png\">&nbsp;Models That Got It&nbsp;<strong>Right<\/strong>&nbsp;(Only 2 of 30+):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>GPT-4 (OpenAI)<\/li>\n\n\n\n<li>ReMM SLERP L2 (13B)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">These models reasoned properly and concluded Sally has just&nbsp;<strong>1 sister<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>&nbsp;Models That Said&nbsp;6 Sisters&nbsp;(Majority fail group):<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These models multiplied 3 brothers by 2 sisters and got 6, reading &#8220;each brother has 2 sisters&#8221; as 6 different sisters. But all three brothers share the same 2 sisters, and Sally is one of them!<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>GPT-3.5 (all variants)<\/li>\n\n\n\n<li>Claude Instant v1<\/li>\n\n\n\n<li>Claude v1, v1.2, v2<\/li>\n\n\n\n<li>Code Llama (7B, 13B, 34B)<\/li>\n\n\n\n<li>Falcon (7B, 40B)<\/li>\n\n\n\n<li>PaLM 2 Bison (all)<\/li>\n\n\n\n<li>MPT-Chat (7B, 30B)<\/li>\n\n\n\n<li>Guanaco (33B, 65B)<\/li>\n\n\n\n<li>Qwen Chat<\/li>\n\n\n\n<li>Platypus 2<\/li>\n\n\n\n<li>Luminous Base \/ Extended \/ Supreme<\/li>\n\n\n\n<li>Koala (13B)<\/li>\n\n\n\n<li>Command series (light, nightly)<\/li>\n\n\n\n<li>Vicuna (all variants)<\/li>\n\n\n\n<li>Alpaca 7B<\/li>\n\n\n\n<li>Airoboros L2 70B<\/li>\n\n\n\n<li>Chronos Hermes 13B<\/li>\n\n\n\n<li>RedPajama-INCITE<\/li>\n\n\n\n<li>Pythia<\/li>\n\n\n\n<li>MythoMax<\/li>\n\n\n\n<li>Dolly<\/li>\n\n\n\n<li>Jurassic 2 (Lite\/Mid)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><img loading=\"lazy\" decoding=\"async\" height=\"16\" width=\"16\" alt=\"\ud83e\uddc2\" src=\"https:\/\/static.xx.fbcdn.net\/images\/emoji.php\/v9\/tdf\/2\/16\/1f9c2.png\">&nbsp;Some even went as far as saying:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>&#8220;Sally has 24 sisters&#8221;<\/strong>&nbsp;(Jurassic 2 Light)<br><strong>&#8220;Sally has 12 sisters&#8221;<\/strong><br><strong>&#8220;Sally has 0 sisters&#8221;<\/strong>&nbsp;(Guanaco 13B)<br><strong>&#8220;Sally has 9 sisters&#8221;<\/strong>&nbsp;(Luminous Extended Control)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><br><strong>Why This Matters:<\/strong><\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">We&#8217;re entering a world where AI helps us write code, interpret data, and make decisions.<br>But if a model can\u2019t pass a basic riddle\u2026 are you really trusting the right one?<br><br><br>Use a platform like&nbsp;<strong>Admix.software<\/strong><br>\u2014 it lets you compare&nbsp;<strong>up to SIX AI models side-by-side<\/strong>&nbsp;on&nbsp;<em>any<\/em>&nbsp;prompt.<br>You\u2019ll quickly see which models actually&nbsp;<em>reason<\/em>&#8230; and which just&nbsp;<strong>make things up<\/strong>.<br><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"631\" src=\"https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults-1024x631.jpg\" alt=\"\" class=\"wp-image-52\" srcset=\"https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults-1024x631.jpg 1024w, https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults-300x185.jpg 300w, https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults-768x473.jpg 768w, https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults-1536x946.jpg 1536w, https:\/\/cs.tarabitab.com\/ing\/wp-content\/uploads\/2025\/07\/CompareAIResults.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I gave this&nbsp;simple reasoning prompt&nbsp;to 30+ AI models. Here&#8217;s what happened&nbsp; Prompt:Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have? Let&#8217;s think step by step. &nbsp;Correct answer: 1(Sally is 1 sister. Each brother has&nbsp;2 sisters total&nbsp;\u2014 Sally + one other.) 
But when I ran this through&nbsp;all the &#8230; <a title=\"Why it makes sense not to solely rely on ChatGPT\" class=\"read-more\" href=\"https:\/\/cs.tarabitab.com\/ing\/2025\/07\/30\/51\/\" aria-label=\"Read more about Why it makes sense not to solely rely on ChatGPT\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-51","post","type-post","status-publish","format-standard","hentry","category-chatgpt"],"_links":{"self":[{"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/posts\/51","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/comments?post=51"}],"version-history":[{"count":2,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/posts\/51\/revisions"}],"predecessor-version":[{"id":58,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/posts\/51\/revisions\/58"}],"wp:attachment":[{"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/media?parent=51"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/categories?post=51"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cs.tarabitab.com\/ing\/wp-json\/wp\/v2\/tags?post=51"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}