Saturday, May 25, 2024

Tom Waits vs. Frito-Lay, Inc (2003)

378 by Borrible | 238 comments on Hacker News.


Show HN: We open sourced our entire text-to-SQL product

415 by aazo11 | 136 comments on Hacker News.
Long story short: We (Dataherald) just open-sourced our entire codebase, including the core engine, the clients that interact with it, and the backend application layer for authentication and RBAC. You can now use the full solution to build text-to-SQL into your product.

The Problem: Modern LLMs write syntactically correct SQL, but they struggle with real-world relational data. This is because real-world data and schemas are messy, natural language is often ambiguous, and LLMs are not trained on your specific dataset.

Solution: The core NL-to-SQL engine in Dataherald is an LLM-based agent that uses Chain of Thought (CoT) reasoning and a number of different tools to generate high-accuracy SQL from a given user prompt. The engine achieves this by:
- Collecting context at configuration time from the database and from sources such as data dictionaries and unstructured documents, which are stored in a data store or a vector DB and injected if relevant
- Allowing users to upload sample NL <> SQL pairs (golden SQL), which can be used in few-shot prompting or to fine-tune an NL-to-SQL LLM for that specific dataset
- Executing the SQL against the DB to get a few sample rows and recover from errors
- Using an evaluator to assign a confidence score to the generated SQL

The repo ( https://ift.tt/3WDzmnG ) includes four services:
1- Engine: The core service, which includes the LLM agent, vector stores and DB connectors.
2- Admin Console: A NextJS front-end for configuring the engine and for observability.
3- Enterprise Backend: Wraps the core engine, adding authentication, caching, and APIs for the frontend.
4- Slackbot: Integrates Dataherald directly into your Slack workflow for on-the-fly data exploration.

Would love to hear from the community on building natural language interfaces to relational data. Is anyone live in production without a human in the loop? Any thoughts on how to improve performance without spending weeks on model training?
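To make the "golden SQL" and "execute and recover from errors" steps concrete, here is a minimal sketch in Python. It is not Dataherald's implementation: `build_prompt`, `generate_sql`, and the `llm` callable are hypothetical names chosen for illustration, and SQLite stands in for whatever database the engine would actually connect to.

```python
import sqlite3
from typing import Callable, List, Tuple


def build_prompt(question: str, schema: str,
                 golden: List[Tuple[str, str]], error: str = "") -> str:
    """Assemble a few-shot prompt from schema text and golden NL/SQL pairs."""
    shots = "\n".join(f"Q: {q}\nSQL: {s}" for q, s in golden)
    # On a retry, show the model the database error from the last attempt.
    retry = f"\nThe previous attempt failed with: {error}\n" if error else ""
    return f"Schema:\n{schema}\n\n{shots}\n{retry}\nQ: {question}\nSQL:"


def generate_sql(conn: sqlite3.Connection, question: str, schema: str,
                 golden: List[Tuple[str, str]], llm: Callable[[str], str],
                 max_attempts: int = 3, sample_rows: int = 3):
    """Ask the (hypothetical) llm for SQL, run it, and retry on DB errors.

    Returns the first statement that executes, plus a few sample rows.
    """
    error = ""
    for _ in range(max_attempts):
        sql = llm(build_prompt(question, schema, golden, error)).strip()
        try:
            rows = conn.execute(sql).fetchmany(sample_rows)
            return sql, rows
        except sqlite3.Error as exc:
            # Keep the error text so the next prompt can include it.
            error = str(exc)
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")
```

Any function that maps a prompt string to a SQL string can be passed as `llm`, which also makes the loop easy to exercise with a stub. A real system would additionally need to restrict the generated SQL to read-only statements before executing it.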

Financial Statement Analysis with Large Language Models

423 by mellosouls | 173 comments on Hacker News.


WinDirStat – Windows Directory Statistics

408 by whereistimbo | 189 comments on Hacker News.


Tuesday, May 21, 2024

Ask HN: Video streaming is expensive yet YouTube "seems" to do it for free. How?

399 by pinakinathc | 356 comments on Hacker News.
Can anyone help me understand the economics of video streaming platforms? Streaming, encoding, and storage incur enormous costs, especially at scale (e.g., on average each 4K video with close to 1 million views). Yet YouTube seems to charge no money for it. I know advertisements are a thing for YT, but are they enough? If tomorrow I wanted to start a platform supported by advertising revenue, I know I would likely fail. However, maybe at YT scale (or more specifically Google Ads scale) the economics work? PS: I would like this discussion to focus on the absolutely necessary elements (e.g., storing, encoding, streaming) and not on other factors contributing to latency/cost, like running view count algorithms.

ICC prosecutor seeks arrest warrants against Sinwar and Netanyahu for war crimes

607 by spzx | 1030 comments on Hacker News.


Saturday, May 11, 2024

Show HN: A web debugger an ex-Cloudflare team has been working on for 4 years

746 by thedg | 182 comments on Hacker News.
Hey HN, I wanted to show you a product a small team and I have been working on for 4 years. https://jam.dev

It’s called Jam and it prevents product managers (like I used to be) from being able to create vague and un-reproducible bug tickets (like I used to create). It’s actually really hard as a non-engineer to file useful bug tickets for engineers. Like, sometimes I thought I included a screenshot, but the important information the engineer needed was right outside the boundary of the screenshot I took. Or I'd write that something "didn't work" but the engineer wasn't sure if I meant that it returned an error or that it was unresponsive. So the engineer would be frustrated, I would be frustrated, and fixing stuff would slow to a halt while we went back and forth over async Jira comments to clarify how to repro the issue.

It’s actually pretty crazy that while so much has changed in how we develop software (heck, we have types in JavaScript now*), the way we capture and report bugs is just as manual and lossy as it was in the 1990s. We can run assembly in the browser but there’s still no tooling to help a non-engineer show a bug to an engineer productively.

So that’s what Jam is. Dev tools + video in a link. It’s like a shareable HAR file synced to a video recording of the session. And besides video, you can use it to share an instant replay of a bug that just happened: basically a 30-second playback of the DOM as a video.

We’ve spent a lot of time adding a ton of niceties. Jam writes automatic repro steps for you, and Jam’s dev tools use the same keyboard shortcuts you’re used to in Chrome dev tools. Our team’s personal favorite: Jam parses GraphQL responses and pulls out mutation names and errors (which is important because GraphQL uses one endpoint for all requests and typically returns a 200, meaning you usually have to sift through every GraphQL request when debugging to find the one you’re looking for).

We’re now 2 years into the product being live and people have used Jam to fix more than 2 million bugs, which makes me so happy, but there’s still a ton to do. I wanted to open this up for discussion here and get your feedback and opinions on how we can make it even more valuable for your debugging. The worst part of the engineering job is debugging and not even being able to repro the issue. It’s not even really engineering, it’s just a communication gap, one that we should be able to solve with tools. So yeah, excited to get your feedback and hear your thoughts on how we can make debugging just a little less frustrating.

(Jam is free to use forever. There is a paid tier for features real companies would need, but we’re keeping a large free plan forever. We learned to build products at Cloudflare and a free tier is in our ethos. My co-founder, about half the team, and I are ex-Cloudflare, and what we loved there is how much great feedback we’d get because the product was mostly free to use. We definitely want to keep that going at Jam.)

By the way, we’re hiring engineers, and if this is a problem that excites you, we’d love to chat: jam.dev/careers
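For readers who have not debugged GraphQL traffic before, the point about a single endpoint can be illustrated with a small Python sketch. This is not Jam's code: `summarize_graphql` is a hypothetical helper, and it assumes the common JSON-over-HTTP shape where the request body carries `query` and optionally `operationName`, and the response body carries an optional `errors` list.

```python
import json
import re
from typing import Any, Dict

# Matches a leading operation keyword and an optional operation name.
_OPERATION = re.compile(
    r"^\s*(query|mutation|subscription)\b\s*([A-Za-z_][A-Za-z0-9_]*)?"
)


def summarize_graphql(request_body: str, response_body: str) -> Dict[str, Any]:
    """Pull the operation type, name, and error messages out of one exchange."""
    request = json.loads(request_body)
    response = json.loads(response_body)

    match = _OPERATION.match(request.get("query", ""))
    # The shorthand form `{ field }` has no keyword and is a query.
    op_type = match.group(1) if match else "query"
    name = request.get("operationName") or (match.group(2) if match else None)

    # Errors arrive in the body, so the HTTP status alone does not reveal them.
    errors = [e.get("message", "") for e in response.get("errors") or []]
    return {"type": op_type, "name": name, "errors": errors}
```

Running a helper like this over every request to the GraphQL endpoint in a HAR file would give one line per operation, which is roughly the sifting that otherwise has to be done by hand.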

Most of Europe is glowing pink under the aurora

764 by luispa | 194 comments on Hacker News.


Tuesday, May 7, 2024

Show HN: Dillo 3.1.0 released after 9 years

421 by rodarima | 104 comments on Hacker News.
As commented before[1], I've been working over the past months to bring Dillo back to life, and today I'm happy to release version 3.1.0, almost 9 years after the last one. [1]: https://ift.tt/mBKZb0h

During this time:
- A new mailing list was created[2], which is beginning to get some messages and patches. It is available in gmane via NNTP at gmane.comp.web.dillo.devel. [2]: https://ift.tt/nhrdxcD...
- A LiberaPay page[3] was set up and has received its first donations (thanks!). [3]: https://ift.tt/5afzwLS
- Some more bugs were fixed and new features were added (details in the release page and/or changelog).

Thanks to all the people who contributed patches and tests. Now let's see if we can make it land in some distros!