jerednel · 2025-12-09T01:00:16

These save me about $10-$15 on tips regularly. Thankful for Coco.

jerednel · 2025-11-01T01:39:43

Just ban advertising and let me gamble.

jerednel · 2025-11-01T01:37:13

It’s not that hard to look at lines at Pinnacle and Circa and make estimates about the fair value of a wager. Open accounts at every book and line shop and maximize expected value.

Also, you are ignoring platforms like Novig, which are like the Polymarket of sports betting.

csomar · 2025-11-01T07:26:47

There is no comparison. A quick search about Pinnacle: https://www.reddit.com/r/sportsbook/comments/kek5t8/warning_...

My experience (though I have never bet on these platforms) is that Pinnacle-like platforms almost never let you withdraw your "earnings". They are essentially a bookie.

Polymarket, on the other hand, is just an exchange. And they use DeFi to make sure you can always withdraw your bounty even if you get "front-end" banned from their platform.

So to affirm the previous poster: These companies are not in the same business.

jerednel · 2025-11-01T19:01:19

They absolutely let you withdraw earnings. I place around 20 wagers a day. Pinnacle and Circa are used as measures of what the market is actually pricing in. You can devig lines to arrive at event likelihoods. From there you can line shop such that your expected value is positive over the long run.
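A rough sketch of the devig-then-line-shop arithmetic, for anyone following along. The odds below are made-up numbers, not real Pinnacle or Circa lines, and the multiplicative (normalize-to-one) method is just one common way to devig; the function names are mine.

```python
def american_to_implied(odds: int) -> float:
    """Convert American odds to the implied probability (vig still included)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def devig(odds_list: list[int]) -> list[float]:
    """Multiplicative devig: scale implied probabilities so they sum to 1."""
    implied = [american_to_implied(o) for o in odds_list]
    total = sum(implied)
    return [p / total for p in implied]


def expected_value(fair_prob: float, odds: int, stake: float = 1.0) -> float:
    """Expected profit of a stake at the offered odds, given a fair win probability."""
    profit_if_win = stake * (odds / 100 if odds > 0 else 100 / -odds)
    return fair_prob * profit_if_win - (1 - fair_prob) * stake


# Hypothetical two-way market at a sharp book: both sides priced at -110.
sharp_line = [-110, -110]
fair = devig(sharp_line)

# Hypothetical price found elsewhere on the first side: +105.
ev_shopped = expected_value(fair[0], +105)
ev_at_sharp = expected_value(fair[0], -110)

print(f"fair probabilities: {fair}")
print(f"EV per unit at +105: {ev_shopped:+.4f}")
print(f"EV per unit at -110: {ev_at_sharp:+.4f}")
```

The bet is only worth taking when the EV at the price you found is positive against the devigged probability; the same side at the sharp book's own price comes out negative because the vig is still in it.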

omcnoe · 2025-11-01T03:08:21

Betting platforms specifically work to identify customers who act in such ways and ban them from the platform. Developing accurate odds costs money; it's cheaper to just identify "advantage bettors" and ban them.

noitpmeder · 2025-11-01T05:06:08

I'm not sure this applies to these prediction markets. Normally when gambling you're at a casino playing e.g. blackjack, where if you're winning more often than expected you're taking the house's money.

But this is more like playing poker, where overall the casino couldn't care less if you're continuously crushing the other players, as long as people keep turning up to play and they keep getting a rake.

jerednel · on Nov 4, 2024

Cool! Does this assume the unstructured data already has a corresponding metadata file?

My most common use cases involve getting PDFs or HTML files and I have to parse the metadata to store along with the embedding.

Would I have to run a process to extract file metadata into JSONs for every embedding/chunk? Would the keys created from each document be title+chunk_no?

Very interested in this because documents from clients are subject to random changes and I don’t have very robust systems in place.

dmpetrov · on Nov 4, 2024

DataChain has no assumptions about metadata format. However, some formats are supported out of the box: WebDataset, json-pair, openimage, etc.

Extract metadata as usual, then return the result as JSON or a Pydantic object. DataChain will automatically serialize it to its internal dataset structure (SQLite), which can be exported to CSV/Parquet.

In the case of PDF/HTML, you will likely produce multiple documents per file, which is also supported: just `yield my_result` multiple times from map().
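To make the "multiple results per file" idea concrete, here is a plain-Python sketch of a generator that yields one record per chunk. It deliberately makes no DataChain calls, so how it gets wired into the library is not shown. `ChunkRecord`, `split_into_chunks`, the fixed-size splitting, and the `title-chunk_no` key are all assumptions for illustration (the key scheme is the one asked about above), and a dataclass stands in for a Pydantic model to keep it dependency-free.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterator


@dataclass
class ChunkRecord:
    """Hypothetical per-chunk record: one of these is yielded for each snippet."""
    key: str
    title: str
    chunk_no: int
    text: str


def split_into_chunks(title: str, text: str, size: int = 40) -> Iterator[ChunkRecord]:
    """Yield one record per fixed-size chunk of the document text."""
    for chunk_no, start in enumerate(range(0, len(text), size)):
        yield ChunkRecord(
            key=f"{title}-{chunk_no}",  # assumed key scheme: title + chunk number
            title=title,
            chunk_no=chunk_no,
            text=text[start:start + size],
        )


# A 100-character stand-in for text already extracted from a PDF or HTML file.
records = list(split_into_chunks("report", "a" * 100, size=40))
for record in records:
    print(json.dumps(asdict(record)))
```

Real chunking would split on pages, headings, or sentences instead of a character count, but the shape is the same: one input file in, several serializable records out.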

Check out video: https://www.youtube.com/watch?v=yjzcPCSYKEo Blog post: https://datachain.ai/blog/datachain-unstructured-pdf-process...

nbbaier · on Nov 4, 2024

> However, some formats are supported out of the box: WebDataset, json-pair, openimage, etc.

Forgive my ignorance, but what is "json-pair"?

dmpetrov · on Nov 4, 2024

It's not a format :)

It's simply about linking metadata from a JSON file to a corresponding image or video file, like pairing data003.png & data003.json into a single, virtual record. Some formats use this approach: open-image or laion datasets.
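A minimal sketch of that pairing idea in plain Python, matching files by their shared stem. This is not DataChain's implementation; `pair_by_stem` and the list of media extensions are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def pair_by_stem(paths, media_exts=(".png", ".jpg", ".mp4")):
    """Group paths that share a stem and return (media_file, json_file) pairs."""
    groups = defaultdict(dict)
    for path in map(PurePosixPath, paths):
        stem_key = str(path.with_suffix(""))  # e.g. "data003"
        ext = path.suffix.lower()
        if ext == ".json":
            groups[stem_key]["meta"] = str(path)
        elif ext in media_exts:
            groups[stem_key]["file"] = str(path)
    # Keep only complete pairs; unmatched files are dropped in this sketch.
    return [
        (group["file"], group["meta"])
        for _, group in sorted(groups.items())
        if "file" in group and "meta" in group
    ]


pairs = pair_by_stem(["data003.png", "data003.json", "data004.png", "notes.json"])
print(pairs)
```

Here data004.png and notes.json have no partner, so only the data003 pair becomes a record.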

nbbaier · on Nov 5, 2024

Thanks for the explanation!

spott · on Nov 4, 2024

> DataChain has no assumptions about metadata format.

Could your metadata come from something like a Postgres sql statement? Or an iceberg view?

dmpetrov · on Nov 4, 2024

Absolutely, that's a common scenario!

Just connect from your Python code (like the lambda in the example) to the DB and extract the necessary data.

Kiro · on Nov 4, 2024

What relevant metadata is there in an HTML file?

dmpetrov · on Nov 4, 2024

I guess, it involves splitting a file into smaller document snippets, getting page numbers and such, and calculating embeddings for each snippet—that’s the usual approach. Specific signals vary by use case.

Hopefully, @jerednel can add more details.

jerednel · on Nov 4, 2024

For HTML it's markup tags...h1's, page title, meta keywords, meta descriptions.

My retriever functions will typically use metadata in combination with the similarity search to impart some sort of influence or for reranking.
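Roughly what that looks like, as a standard-library sketch: pull the title, h1's, and meta keywords/description out of the HTML, then nudge similarity scores when a query term shows up in the metadata. The class and function names, the sample HTML, the similarity scores, and the flat 0.1 boost are all made up for illustration; a real reranker would be tuned per use case.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the page title, h1 text, and meta keywords/description."""

    def __init__(self):
        super().__init__()
        self.meta = {"title": "", "h1": [], "keywords": "", "description": ""}
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.meta["h1"].append("")
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.meta["title"] += data
        elif self._current == "h1":
            self.meta["h1"][-1] += data


def extract_metadata(html: str) -> dict:
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


def rerank(hits, query_terms, boost=0.1):
    """hits: (similarity, metadata) tuples. Add a flat boost per matching query term."""
    def score(hit):
        similarity, meta = hit
        haystack = f"{meta['title']} {meta['keywords']}".lower()
        return similarity + boost * sum(t.lower() in haystack for t in query_terms)
    return sorted(hits, key=score, reverse=True)


page = (
    '<html><head><title>Refund Policy</title>'
    '<meta name="keywords" content="refund, returns">'
    '<meta name="description" content="How refunds work"></head>'
    '<body><h1>Refunds</h1><p>Body text.</p></body></html>'
)
refund_meta = extract_metadata(page)
other_meta = {"title": "Shipping", "h1": ["Shipping"], "keywords": "", "description": ""}

# Hypothetical similarity scores from a vector search.
ranked = rerank([(0.80, other_meta), (0.75, refund_meta)], ["refund"])
print([meta["title"] for _, meta in ranked])
```

The refund page starts with the lower similarity score but ends up first because the query term appears in its title and keywords.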

jerednel · on Oct 19, 2024

It's not super clear to me how this interacts with data. If I am using ADLS to store Delta tables and cannot pull prod to my local machine, can I still use this? Is there a point if I can just look at the Delta log to switch between past versions?

riedel · on Oct 19, 2024

DVC is (at least as I use it) pretty much just git LFS with multiple backends (I guess actually a simpler git annex). It also has some rather MLOps-specific stuff. It's handy if you do versioned model training with changing data on S3.

haensi · on Oct 19, 2024

There’s another thread from October 2022 on that topic.

https://news.ycombinator.com/item?id=33047634

What makes DVC especially useful for MLOps? Aren’t MLFlow or W&B solving that in a way that’s open source (the former) or that just increases the speed and scale massively (the latter)?

Disclaimer: I work at W&B.

riedel · on Oct 19, 2024

DVC is much more basic (feels more unix style), integrates really well with any simple CI/CD scripting with git versioning without the need to set up any additional servers.

And it is not either/or. People actually combine MLFlow and DVC [0]

[0] https://data-ai.theodo.com/blog-technique/dvc-pipeline-runs-...

matrss · on Oct 19, 2024

Speaking of git-annex, there is another project called DataLad (https://www.datalad.org/), which has some overlap with DVC. It uses git-annex under the hood and is domain-agnostic, compared to the ML focus that DVC has.

starkparker · on Oct 19, 2024

I've used it for storing rasters alongside georeferencing data in small GIS projects, as an alternative to git LFS. It not only works like git but can integrate with git repos through commit and push/pull hooks, storing DVC pointers and managing .gitignore files while retaining directory structure of the DVC-managed files. It's neat, even if the initial learning curve was a little steep.

We used Google Drive as a storage backend and had to grow out of it to a WebDAV backend, and it was nearly trivial to swap them out and migrate.

jerednel · on Jan 18, 2024

Same. I just bought a t480 off eBay a week ago while I already have a m2 pro MBP with 32gb RAM. I don’t know what compelled me but it’s just so cozy. By all measures the Mac is superior. Thinkpads are like the vinyl records of laptops.

wkjagt · on Jan 18, 2024

I like that. I feel similar. I picked it up locally, and while I was driving over, I thought to myself: why am I even doing this, I have a perfect MacBook. But I'm happy I did it. It feels better in a way. Somehow like vinyl maybe, but not as fragile ;-) And my excuse is: can't run Arch on an M1 MacBook.

jerednel · on Nov 1, 2023

The temptation is strong to drop 3.75k on bl.ing

jerednel · on May 23, 2023

Existence is a funny term in Buddhism. The end goal of development stage practice is the recognition that the deity is inseparable from one’s self because the nature of any given deity is equal to our own in an absolute sense. After all, all things are empty of inherent existence and are self-arisen manifestations from a primordial ground of being, hence no you and no I as separate.

So some people do see them as external things to ask things of but others see them as mind created and used as supports for meditation to realize our own true nature. Depends on lineage and teacher.

jerednel · on April 16, 2023

Science consists of predictions based on measurements of our perception as mediated by our senses or tools that augment them. Dualism (or idealism for that matter) doesn't require religious beliefs. You can arrive at them analytically. Science is great for explaining the conventional reality of things as they appear though. But until I read a convincing explanation for consciousness and subjective experience that doesn't reduce to "neural correlates" I can't get on board with physicalism.

The fact that we're not all unconscious automatons that just evolved from one long cause-effect chain stemming from the big bang to now continues to bewilder me. "Mind" may play an "of the gaps" game here but it's the only reasonable explanation that currently makes some semblance of sense to me.

So currently I agree with the religious on your last point. Machines or AI cannot be conscious. Maybe if they were infused with some sort of biological material though I would say maybe. If our subjective experience is a localization of a thing called "consciousness" and certain forms (humans, animals) localize it to produce a sense of self then why not.

jerednel · on Dec 5, 2022

Funnily enough I actually had a conversation with ChatGPT about this and we concluded that conscious decision making / free will is basically a higher order Markov process.