Openrsync: An implementation of rsync, by the OpenBSD team
448 by sph | 171 comments on Hacker News.
Sunday, May 31, 2026
Saturday, May 30, 2026
Blue Origin's New Glenn blows up during static fire test
488 by enraged_camel | 538 comments on Hacker News.
https://twitter.com/nasaspaceflight/status/20601649284728548... https://ift.tt/egdfY43... https://twitter.com/SawyerMerritt/status/2060174287563116696... https://ift.tt/WU2y6CV... https://ift.tt/6o9qkAt...
Friday, May 29, 2026
Thursday, May 28, 2026
Wednesday, May 27, 2026
Tuesday, May 26, 2026
Monday, May 25, 2026
Sunday, May 24, 2026
SpaceX launches Starship v3 rocket
410 by busymom0 | 283 comments on Hacker News.
https://ift.tt/JYrw9dO... [video]
Saturday, May 23, 2026
DeepSeek makes the V4 Pro price discount permanent
413 by Tiberium | 238 comments on Hacker News.
> (3) The deepseek-v4-pro model API pricing will be officially adjusted to 1/4 of the original price after the 75% discount promotion ends on 2026/05/31 15:59 UTC. https://ift.tt/NXR6lWC
Friday, May 22, 2026
Thursday, May 21, 2026
GitHub confirms breach of 3,800 repos via malicious VSCode extension
685 by Timofeibu | 239 comments on Hacker News.
Previous thread in sequence: GitHub is investigating unauthorized access to their internal repositories - https://ift.tt/Rr0zYCO - May 2026 (321 comments)
Wednesday, May 20, 2026
GitHub is investigating unauthorized access to their internal repositories
567 by splenditer | 306 comments on Hacker News.
https://ift.tt/bE8fRuo
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks
567 by zambelli | 203 comments on Hacker News.
Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments. I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.
What it does:
- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware
- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it
- Ships with an eval harness and interactive dashboard so you can reproduce every number
I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's roughly a 40% failure rate. No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier.
Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)
The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97 model/backend configurations, 18 scenarios, 50 runs each.
Key numbers:
- Ministral 8B with Forge: 99.3%. Claude Sonnet with Forge: 100%. The gap between a free local 8B model on a $600 GPU and a frontier API is less than 1 point.
- The same 8B local model with Forge (99.3%) outperforms Claude Sonnet without guardrails (87.2%) - an 8B model with framework support beats the best result you can get through frontier API alone.
- Error recovery scores 0% for every model tested - local and frontier - without the retry mechanism. Not a capability gap, an architectural absence.
I'm currently using this for my home assistant running on Ministral 14B-Reasoning, and for my locally hosted agentic coding harness (8B managed to contribute to the codebase!).
The guardrail stack has five layers, each independently toggleable. The two that carry the most weight (per ablation study with McNemar's test): retry nudges (24-49 point drops when disabled) and error recovery (~10 point drops, significant for every model tested). Step enforcement is situational - it only fires for models with weaker sequencing discipline. Rescue parsing and context compaction showed no significance in the eval but are retained for production workloads where they activate once in a while.
One thing I really didn't expect: the serving backend matters. The same Mistral-Nemo 12B weights produce 7% accuracy on llama-server with native function calling and 83% on Llamafile in prompt mode. A 75-point swing from infrastructure alone. I don't think anyone's published this because standard benchmarks don't control for serving backend.
Another surprise: there's no distinction in current LLM tool-calling between "the tool ran successfully and returned data" and "the tool ran successfully but found nothing." Both return a value, the orchestrator marks the step complete, and bad data cascades downstream. It's the equivalent of HTTP having 200 but no 404. Forge adds this as a new exception class (ToolResolutionError) - the model sees the error and can retry instead of silently passing garbage forward.
The biggest technical challenge was context compaction for memory-constrained hardware. Both Ollama and Llamafile silently fall back to CPU when the model exceeds VRAM - no warning, no error, just 10-100x slower inference. Forge queries nvidia-smi at startup and derives a token budget to prevent this.
How to try it:
- Clone the repo, run the eval harness on a model I haven't tested. If you get interesting results I'll add them to the dashboard.
- Try the proxy server mode - point any OpenAI-compatible client at Forge and it handles guardrails transparently. It's the newest mode and I'd love more eyes on it.
- Dogfooding led me to optimize model parameters in v0.6.0. The harder eval suite (26 scenarios) is designed to raise the ceiling so no one sits at 100%. Several that did on the original suite can't sweep it - including Opus 4.6. Curious if anyone finds scenarios that expose gaps I haven't thought of.
Paper numbers are based on pre-v0.6.0 code.
Background: prior ML publication in unsupervised learning (83 citations). This paper was accepted to ACM CAIS '26 - presenting May 26-29.
Repo: https://ift.tt/Zn8wGXI
Paper: https://ift.tt/WMFldDX... https://ift.tt/cLTM8wl...
Dashboard: https://ift.tt/RTPv07L...
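Two ideas from the Forge post lend themselves to a quick sketch: the compounding arithmetic (0.9 per step over 5 steps), and the "200 but no 404" point about empty tool results. The snippet below is only an illustration and is not taken from the Forge codebase. The exception name `ToolResolutionError` comes from the post, but `workflow_success_rate`, `checked_call`, `run_step`, the `revise` callback (a stand-in for the model reacting to the error), and the `find_contact` tool are all hypothetical names, and the steps are assumed to fail independently.

```python
from typing import Callable, List


def workflow_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming steps fail independently."""
    return per_step ** steps


class ToolResolutionError(Exception):
    """The tool executed without crashing but resolved nothing (the '404' case)."""


def checked_call(tool: Callable[[str], List[str]], query: str) -> List[str]:
    """Run a tool and turn an empty result into an explicit error."""
    result = tool(query)
    if not result:
        name = getattr(tool, "__name__", "tool")
        raise ToolResolutionError(f"{name}({query!r}) returned nothing")
    return result


def run_step(tool, query, revise, max_attempts: int = 3):
    """Retry a step; `revise` is a stand-in for the model adjusting its call."""
    error = None
    for _ in range(max_attempts):
        try:
            return checked_call(tool, query)
        except ToolResolutionError as err:
            error = err
            query = revise(query, err)  # the "model" sees the error and tries again
    raise ToolResolutionError(f"gave up after {max_attempts} attempts: {error}")


# Hypothetical tool: a case-sensitive lookup that returns [] on a miss.
CONTACTS = {"alice": ["alice@example.com"]}


def find_contact(name: str) -> List[str]:
    return CONTACTS.get(name, [])


# Failure rate of a 5-step workflow at 90% per-step accuracy: about 0.41.
print(round(1 - workflow_success_rate(0.9, 5), 3))

# The first call ("Alice") finds nothing; the revised call ("alice") succeeds.
print(run_step(find_contact, "Alice", lambda q, err: q.lower()))
```

Without the empty-result check, the first `find_contact("Alice")` call would return `[]`, the step would be marked complete, and the empty list would flow into later steps.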
Google changes its search box
542 by berkeleyjunk | 718 comments on Hacker News.
https://ift.tt/3P2tyOh... , https://ift.tt/8cnNRwh https://ift.tt/sIq31x5... https://ift.tt/4QLS2cZ...
Tuesday, May 19, 2026
Monday, May 18, 2026
Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep
379 by Bibabomas | 127 comments on Hacker News.
Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.
Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
Main features:
- Token-efficient: 98% fewer tokens than grep+read
- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)
- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested
- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode
- Zero config: no API keys, no GPU, no external services
Install in Claude Code with:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
Or check our README for other installation instructions, benchmarks, and methodology:
Semble: https://ift.tt/vzhEuYR
Benchmarks: https://ift.tt/F7wIGAb
Model: https://ift.tt/WHqJoeL
Let us know if you have any feedback or questions!
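The post says the embedding and BM25 results are "fused via RRF". As a rough illustration of reciprocal rank fusion in general (not Semble's actual code), each document scores the sum of 1/(k + rank) over every ranking it appears in. The function name, the file names, and the constant k = 60 (a value commonly used with RRF) are assumptions for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. Ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
embedding_hits = ["auth.py", "session.py", "db.py"]
bm25_hits = ["session.py", "tokens.py", "auth.py"]

print(reciprocal_rank_fusion([embedding_hits, bm25_hits]))
# → ['session.py', 'auth.py', 'tokens.py', 'db.py']
```

RRF uses only rank positions, so the embedding similarities and BM25 scores never need to be put on a common scale.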
Sunday, May 17, 2026
Saturday, May 16, 2026
Friday, May 15, 2026
UK government replaces Palantir software with internally-built refugee system
426 by cdrnsf | 162 comments on Hacker News.
https://shkspr.mobi/blog/2026/05/uk-government-kicks-out-pal...
Thursday, May 14, 2026
Wednesday, May 13, 2026
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
515 by HenryNdubuaku | 156 comments on Hacker News.
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.
Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).
Training:
- Pretrained on 200B tokens across 16 TPU v6e (27 hours)
- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)
- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)
You can test it right now and finetune on your Mac/PC: https://ift.tt/0c5GCwT
The full writeup on the architecture is here: https://ift.tt/wbJehyr...
We found that the "no FFN" finding generalizes beyond function calling to any task where the model has access to external structured knowledge (RAG, tool use). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to be published.
While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.
This is part of our broader work on Cactus ( https://ift.tt/4wu1phQ ), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: https://ift.tt/5SKu9ae
Everything is MIT licensed.
Weights: https://ift.tt/UNXwlW7
GitHub: https://ift.tt/0c5GCwT
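To make the "retrieval-and-assembly" framing in the Needle post concrete (match query to tool name, extract argument values, emit JSON), here is a deliberately naive, hand-written toy. It is not how Needle works internally, since Needle is a learned attention model. The tool registry, keywords, regex patterns, and the output shape with `name` and `arguments` keys are all assumptions made for illustration.

```python
import json
import re

# Hypothetical tool registry: keywords for matching, a regex for the argument.
TOOLS = {
    "set_timer": {
        "keywords": {"timer", "countdown", "remind"},
        "pattern": r"(?P<minutes>\d+)\s*min",
    },
    "set_thermostat": {
        "keywords": {"thermostat", "temperature", "heat"},
        "pattern": r"(?P<degrees>\d+)\s*degrees",
    },
}


def call_tool(query: str) -> str:
    """Turn a natural-language request into a JSON function call."""
    text = query.lower()
    words = set(re.findall(r"[a-z]+", text))

    # 1. Retrieval: pick the tool whose keywords overlap the query the most.
    name = max(TOOLS, key=lambda tool: len(TOOLS[tool]["keywords"] & words))

    # 2. Extraction: pull argument values out of the query text.
    match = re.search(TOOLS[name]["pattern"], text)
    arguments = {key: int(value) for key, value in match.groupdict().items()} if match else {}

    # 3. Assembly: emit the call as JSON.
    return json.dumps({"name": name, "arguments": arguments})


print(call_tool("Set a timer for 15 minutes"))
print(call_tool("Turn the heat up to 21 degrees"))
```

All three stages only look things up in the input and copy values out of it, which is the property the post argues makes large feed-forward layers unnecessary for this task.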
Tuesday, May 12, 2026
Monday, May 11, 2026
Show HN: Building a web server in assembly to give my life (a lack of) meaning
400 by imtomt | 214 comments on Hacker News.
This is ymawky, a static file web server for macOS written entirely in ARM64 assembly. It supports GET, PUT, DELETE, HEAD, and OPTIONS requests, and supports Range: bytes=X-Y headers (which allows scrubbing for video streaming). It decodes percent-encoded URLs, strictly enforces docroot, serves custom error pages for any HTTP error response, supports directory listing, and has (some) mitigations against slowloris-like attacks. I’ve also written a more detailed writeup here: https://imtomt.github.io/ymawky/
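Two of the features listed for ymawky, Range handling and docroot enforcement after percent-decoding, can be sketched in a few lines of Python. This is not a translation of ymawky's assembly, and the function names are hypothetical. The sketch assumes a single byte range in the standard HTTP forms (`bytes=X-Y`, `bytes=X-`, `bytes=-N`) and a POSIX-style filesystem.

```python
import os
from typing import Optional, Tuple
from urllib.parse import unquote


def _is_number(text: str) -> bool:
    return text.isascii() and text.isdigit()


def parse_byte_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for one byte range, or None if the
    header is malformed, lists several ranges, or cannot be satisfied."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    if first == "":  # suffix form: the last N bytes of the file
        if not _is_number(last) or int(last) == 0:
            return None
        start, end = max(file_size - int(last), 0), file_size - 1
    else:
        if not _is_number(first) or (last != "" and not _is_number(last)):
            return None
        start = int(first)
        end = file_size - 1 if last == "" else min(int(last), file_size - 1)
    if start > end or start >= file_size:
        return None
    return start, end


def resolve_under_docroot(docroot: str, url_path: str) -> Optional[str]:
    """Percent-decode a URL path and refuse anything that escapes docroot."""
    decoded = unquote(url_path)
    if "\x00" in decoded:
        return None
    root = os.path.realpath(docroot)
    candidate = os.path.realpath(os.path.join(root, decoded.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


print(parse_byte_range("bytes=0-499", 1000))    # → (0, 499)
print(parse_byte_range("bytes=-200", 1000))     # → (800, 999)
print(resolve_under_docroot("/srv/www", "/%2e%2e/etc/passwd"))  # → None
```

The path is decoded before the containment check, so an encoded traversal such as `%2e%2e` is caught the same way as a literal `..`.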
Sunday, May 10, 2026
Saturday, May 9, 2026
Friday, May 8, 2026
Cloudflare to cut about 20% workforce
563 by PriorityLeft | 333 comments on Hacker News.
https://ift.tt/3pkfj9a
Thursday, May 7, 2026
Zuckerberg 'Personally Authorized and Encouraged' Meta's Copyright Infringement
481 by spankibalt | 433 comments on Hacker News.
https://ift.tt/LUybz6k...
Wednesday, May 6, 2026
Tuesday, May 5, 2026
Monday, May 4, 2026
Sunday, May 3, 2026
Saturday, May 2, 2026
Show HN: WhatCable, a tiny menu bar app for inspecting USB-C cables
458 by sleepingNomad | 133 comments on Hacker News.
USB-C cables can be a mess. One cable charges at 5W, another does 100W and Thunderbolt 4, and they look identical in the drawer. WhatCable sits in your menu bar and reads the cable data your Mac already has access to. Plug in a cable and it tells you in plain English what it can actually do: charging wattage, data speed, display support, Thunderbolt, etc. Built in Swift/SwiftUI. Open source, free, no tracking. GitHub: https://ift.tt/RL5nTAX
Friday, May 1, 2026
For Linux kernel vulnerabilities, there is no heads-up to distributions
574 by ori_b | 497 comments on Hacker News.
Recent: Copy Fail - https://ift.tt/qm3kwjC - April 2026 (466 comments)