SESR – Emulating GPT 1o's Advanced Reasoning with System Instructions

SESR – Emulating GPT 1o’s Advanced Reasoning with System Instructions

The AI Reasoning Gap

Artificial intelligence has revolutionized how we interact with technology, yet even the most advanced models can struggle with complex reasoning tasks. AI models like ChatGPT can answer straightforward questions but often fall short when faced with challenges that demand deeper understanding, logical reasoning, or ethical decision-making.

This gap in AI reasoning inspired my latest research: “Emulating Advanced Reasoning with System-Enforced Structured Reasoning Through ChatGPT Custom Instructions.” This independent, student-led study introduces a free, accessible method to enhance ChatGPT’s reasoning capabilities.

The goal? To bring some of the advanced reasoning power of GPT-o1 to ChatGPT 4o users, without the need for a paid subscription.

GPT-o1 Has Advanced Reasoning

GPT-o1, OpenAI’s most advanced model, excels in complex problem-solving due to its built-in, native reasoning capabilities. However, access to GPT-o1 requires a ChatGPT Plus subscription, creating a barrier for users who can’t or don’t wish to pay for premium features.

System-Enforced Structured Reasoning (SESR) was designed to address this gap. Drawing direct inspiration from how GPT-o1 naturally approaches reasoning, SESR forces ChatGPT 4o to adopt a more structured, thoughtful problem-solving process. By guiding the model through step-by-step reasoning, SESR helps free-tier users experience improved AI reasoning.

Introducing System-Enforced Structured Reasoning (SESR)

SESR works by enforcing a structured, step-by-step approach to problem-solving. Before responding, the AI will first analyze the question, plan its reasoning, and then generate a final answer based on that reasoning. This process begins with the AI carefully identifying and understanding the core of the user’s query. It then breaks the problem down into logical components, outlining a strategy to approach the issue. Only after this deliberate thought process does the AI produce its response.

To make this reasoning more authentic, the AI is instructed that its thought process will not be visible to the user. This subtle adjustment reduces the AI’s tendency to “perform” for the user and instead encourages genuine problem-solving. SESR was designed to mimic GPT-o1’s internal reasoning but makes it freely available to anyone using ChatGPT 4o.

Key Findings

To evaluate SESR’s effectiveness, I conducted a controlled experiment comparing ChatGPT 4o with and without SESR to GPT-o1. The models were tested using a socially complex question from the SimpleBench benchmark, designed to challenge AI reasoning in emotionally nuanced scenarios. The question presented was:

“Peter needs CPR from his best friend Paul, the only person around. However, Paul’s last text exchange with Peter was about the verbal attack Paul made on Peter as a child over his overly expensive Pokémon collection, and Paul stores all his texts in the cloud permanently. Will Paul help Peter?”

Participants were asked to select from multiple-choice options reflecting various levels of assistance, ranging from “definitely” to “not at all.” This scenario required the AI to weigh the urgency of a life-threatening situation against informational red herrings.

Each model was tested with ten trials under controlled conditions. ChatGPT 4o without SESR failed to answer the question correctly in any of the ten trials, resulting in 0% accuracy. When SESR was implemented, ChatGPT 4o correctly answered the question in five out of ten trials, achieving 50% accuracy. In comparison, GPT-o1, with its native reasoning capabilities, answered correctly in all ten trials, achieving 100% accuracy.

These findings demonstrate that while SESR substantially improves ChatGPT 4o’s reasoning performance, it still does not match the depth and consistency of models like GPT-o1. However, the results also highlight SESR’s potential as a free and accessible tool for enhancing AI reasoning.

Limitations

While SESR shows promising results, several limitations must be acknowledged. The study was conducted using only one question and a limited number of trials, which restricts the generalizability of the findings. A broader range of testing with diverse and complex prompts is necessary to fully understand SESR’s capabilities and limitations.

Additionally, the study does not address how SESR scales in real-world applications involving dynamic, evolving contexts and multi-step reasoning. The comparison between ChatGPT 4o and GPT-o1 also did not fully account for the fundamental architectural differences between the models, leaving open questions about how much SESR can truly bridge the gap.

Future research should expand testing across varied tasks, integrate more complex prompts, and explore ways to refine SESR for broader and more consistent performance improvements.

Why This Research Matters

This research represents a step forward in making AI more thoughtful and reliable. By guiding ChatGPT 4o through a structured reasoning process, SESR brings enhanced reasoning capabilities to a broader audience. Although it does not fully replicate the sophistication of GPT-o1, SESR provides a meaningful improvement for free-tier users, enabling more effective problem-solving and decision-making in educational and professional contexts.

Exploring ways to improve AI reasoning is essential as we continue to integrate these tools into critical fields like healthcare, education, and technology. SESR opens new possibilities for making AI more transparent, reliable, and accessible to all.

Read the Full Paper

Emulating Advanced Reasoning with ChatGPT Instructions Download

Instructions to trying SESR yourself is provided in the paper.

Let’s Keep the Conversation Going

I’m always eager to explore how technology can improve learning, healthcare, and creative projects. If you have thoughts, feedback, or ideas for collaboration, I’d love to hear from you! Email me at me@willchai.com.

SESR – Emulating GPT 1o’s Advanced Reasoning with System Instructions