Hacker News | kasmura's comments

This is THE book. I'd say it is suitable for beginners with a basic math background, equivalent to a CS undergrad.


I disagree. Ring attention and tree attention are so general that the core ideas are independent of the details of modern GPUs. Maybe that's true for flash attention, but not for these. I also disagree because these algorithms are fundamentally about enabling long context by distributing attention across GPUs, and that would not be enabled by "Moore's law for GPU hardware".


Yes, it is just a way of computing self-attention in a distributed way.
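As a rough illustration, here is a single-process NumPy simulation of the idea (not an actual multi-GPU implementation; the function name and block sizes are made up): each "device" owns one Q block, sees the K/V blocks one at a time as if they were rotated around a ring, and keeps a numerically stable online softmax so no full attention matrix is ever materialized in one place.

```python
import numpy as np

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Simulated ring attention over lists of (rows, dim) blocks."""
    outputs = []
    for q in q_blocks:                         # one "device" per Q block
        m = np.full(q.shape[0], -np.inf)       # running row-max of logits
        l = np.zeros(q.shape[0])               # running softmax denominator
        acc = np.zeros_like(q)                 # running weighted sum of V
        for k, v in zip(k_blocks, v_blocks):   # one ring rotation per block
            s = q @ k.T / np.sqrt(q.shape[1])  # local attention logits
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)          # rescale old accumulators
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ v
            m = m_new
        outputs.append(acc / l[:, None])
    return np.concatenate(outputs)
```

Run over blocks of Q, K, V, this reproduces ordinary softmax attention exactly; the point is that each device only ever needs one K/V block at a time.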


That cannot be done when using OpenAI API calls as far as I know


Nobody in the original post or this entire discussion said anything about OpenAI until your comment.

I thought it was fairly obvious that we were talking about a local LLM agent... if DataHerald is a wrapper around only OpenAI, and no other options, then that seems unfortunate.


The agent is LLM-agnostic, and you can use it with OpenAI or self-hosted LLMs. For self-hosted LLMs, we have benchmarked performance with Mixtral for tool selection and CodeLlama for code generation.


Integrated information theory, which is mentioned in the article, is abstract and supposedly applies to any type of physical system.


Is the following quote at odds with what you are saying about 50-way classification?

"On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."


The difficulty of the problem is that of a 50-way classification. If the only goal was to minimize WER, a simple post-processing step choosing the nearest sentence in the training set could easily bring the WER down further. They've chosen to do it the way they did it presumably to show that it can be done that way, and I don't fault them for it.
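To make that post-processing step concrete, here is a sketch of nearest-sentence snapping (the helper names and sentence lists are invented; the real training set has a fixed set of sentences):

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two word lists."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

def snap_to_training(decoded, training_sentences):
    """Replace a decoded word sequence with the closest training sentence."""
    return min(training_sentences, key=lambda s: edit_distance(decoded, s))

train = [["the", "dog", "ran"], ["a", "cat", "sat"]]
print(snap_to_training("the dog sat".split(), train))  # ['the', 'dog', 'ran']
```

Because the closed sentence set is small, this step would collapse most near-miss decodings onto an exact training sentence, driving WER down without the model learning anything new.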

They claim that word-by-word decoding implies that the network has learned to identify words. This may well be true, but it isn't possible to claim that from their result. For example, say you average all electrode samples over the relevant timespan, transform that representation with a feedforward neural net, and feed it into an RNN decoder. It would still predict word by word, on a representation that necessarily does not distinguish between words (because the time dimension has been averaged over). Such a model can still output words in the right order, just from the statistics of the training sentences being baked into the decoder RNN.
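A minimal sketch of that hypothetical architecture (all shapes and weights are made up, and untrained; it only shows the information flow, not a working decoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ECoG recording for one sentence: T time samples x E electrodes.
ecog = rng.normal(size=(250, 64))

# Step 1: average over time. Word-level temporal structure is gone here.
pooled = ecog.mean(axis=0)                    # shape (64,)

# Step 2: a feedforward transform of the pooled representation.
W_ff = rng.normal(size=(64, 32)) * 0.1
h = np.tanh(pooled @ W_ff)

# Step 3: an RNN decoder unrolled for a fixed number of word slots.
# It emits one word per step even though its input never distinguished
# between words; any correct ordering must come from sentence statistics
# baked into the decoder weights during training.
W_h = rng.normal(size=(32, 32)) * 0.1
W_out = rng.normal(size=(32, 50)) * 0.1       # 50-word vocabulary
words = []
for _ in range(6):
    h = np.tanh(h @ W_h)
    words.append(int(np.argmax(h @ W_out)))
```

The decoder here predicts word-by-word despite its input being a single time-averaged vector, which is the point of the counterexample.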


I am not much of an expert either, but as I understand it, Zermelo–Fraenkel set theory is the most common foundation of mathematics, and it is stronger than Peano arithmetic, so I think Gödel's results hold in general.


Gödel's proof of inferential undecidability (incompleteness) does not work in strongly typed theories because his proposition I'mUnprovable does not exist.


Internet Access is a NEGATIVE right


I am excited about this. Competition in the market is good for both innovation and prices.


Pedro Teixeira has made a screencast about CoffeeScript in Node.js on his website nodetuts.com: http://nodetuts.com/tutorials/16-introduction-to-coffeescrip...

