FBI raids home of prominent computer scientist who has gone incommunicado
827 by JaimeThompson | 334 comments on Hacker News.
Monday, March 31, 2025
Sunday, March 30, 2025
Saturday, March 29, 2025
Friday, March 28, 2025
Wednesday, March 26, 2025
Tuesday, March 25, 2025
Monday, March 24, 2025
Sunday, March 23, 2025
Saturday, March 22, 2025
Tuesday, March 18, 2025
Monday, March 17, 2025
Sunday, March 16, 2025
Thursday, March 13, 2025
Wednesday, March 12, 2025
Show HN: Factorio Learning Environment – Agents Build Factories
707 by noddybear | 204 comments on Hacker News.
I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).

FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.

A critical advantage of Factorio as a benchmark is its unbounded nature. Unlike many evals that are quickly saturated by newer models, Factorio's geometric complexity scaling means it won't be "solved" in the next 6 months (or possibly even years). This allows us to meaningfully compare models by the order of magnitude of resources they can produce, creating a benchmark with longevity.

The project began 18 months ago, after years of playing Factorio, when I recognised its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.

Two technical innovations drove this project forward. First, we discovered that piping Lua into the Factorio console over TCP enables running (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs to provide a clean, type-hinted interface for AI agents to interact with Factorio through familiar programming paradigms.

Agents interact with FLE through a REPL pattern:
1. They observe the world (seeing the output of their last action)
2. Generate Python code to perform their next action
3. Receive detailed feedback (including exceptions and stdout)

We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: An unbounded task of building the largest possible factory on a procedurally generated map

We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude Sonnet 3.5 is currently the best model (by a significant margin).

The code is available at https://ift.tt/U8JDsRe .

You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+

The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents. We would love to hear your thoughts and see what others can do with this framework!
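For readers who want a feel for the REPL pattern described in the post, here is a minimal sketch of an observe / generate / feedback loop. It is not FLE's actual API: `run_step`, `scripted_agent` and `run_episode` are hypothetical names, the "agent" is a fixed list of programs standing in for an LLM call, and the code runs in a plain Python namespace rather than against a Factorio server.

```python
import contextlib
import io


def run_step(code: str, namespace: dict) -> str:
    """Execute one agent-written program and return its feedback text.

    Feedback is whatever the program printed, plus a one-line summary of
    any exception it raised (hypothetical format, not FLE's).
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # The namespace persists across steps, as in a REPL session.
            exec(code, namespace)
    except Exception as exc:
        buffer.write(f"{type(exc).__name__}: {exc}\n")
    return buffer.getvalue()


def scripted_agent(observation: str, step: int) -> str:
    """Stand-in for an LLM: returns a fixed program for each step."""
    programs = [
        "ore = 5\nprint('mined', ore)",
        "print('total', ore * 2)",
        "print(missing_name)",  # deliberately fails to show error feedback
    ]
    return programs[step]


def run_episode(steps: int = 3) -> list[str]:
    """Run the loop: observe last output, generate code, receive feedback."""
    namespace: dict = {}
    observation = ""
    history = []
    for step in range(steps):
        code = scripted_agent(observation, step)
        observation = run_step(code, namespace)
        history.append(observation)
    return history


if __name__ == "__main__":
    for feedback in run_episode():
        print(feedback, end="")
```

Returning exceptions as text instead of letting them stop the loop is the point of the design: the failed step becomes the next observation, so the agent can try to correct itself.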
Tuesday, March 11, 2025
Monday, March 10, 2025
Sunday, March 9, 2025
Show HN: Bayleaf – Building a low-profile wireless split keyboard
725 by sgraz | 245 comments on Hacker News.
Hey HN, I built a wireless, split, ultra-low-profile keyboard from scratch called Bayleaf. As a beginner, I learned all things electronics, PCB-building, designing for manufacturing, and many other hardware-related skills to put this together. This case study dives into the build process and, of course, the final result. Hope you enjoy!
Thursday, March 6, 2025
Wednesday, March 5, 2025
Monday, March 3, 2025