• 0 Posts
  • 36 Comments
Joined 1 year ago
Cake day: June 12th, 2023


  • Yep my sentiment entirely.

    I had actually written a couple more paragraphs using weather models as an analogy akin to your quartz crystal example but deleted them to shorten my wall of text…

    We have built up models which can predict what might happen to particular weather patterns over the next few days to a fair degree of accuracy. However, to get a 100% conclusive model we’d have to have information about every molecule in the atmosphere, which is just not practical when we have good enough models to have an idea of what is going on.

    The same is true for any system of sufficient complexity.
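    The "every molecule" point is essentially sensitivity to initial conditions. A toy sketch of my own (the logistic map, not an actual weather model): two trajectories that start almost identically soon stop agreeing, which is why forecasts degrade without near-perfect initial data.

    ```python
    # Toy illustration of sensitive dependence: two logistic-map
    # trajectories that differ by one part in a million initially
    # soon diverge completely. Purely illustrative, not meteorology.

    def logistic_map(x, r=4.0, steps=50):
        """Iterate x -> r * x * (1 - x) and return the trajectory."""
        traj = [x]
        for _ in range(steps):
            x = r * x * (1 - x)
            traj.append(x)
        return traj

    a = logistic_map(0.400000)
    b = logistic_map(0.400001)  # initial difference of 1e-6

    print(abs(a[5] - b[5]))    # early on: still tiny
    print(abs(a[50] - b[50]))  # later: comparable to the values themselves
    ```

    The same loss of predictability shows up in any sufficiently complex system, weather included.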


  • This article, along with others covering the topic, seems to foster an air of mystery about machine learning which I find quite off-putting.

    Known as generalization, this is one of the most fundamental ideas in machine learning—and its greatest puzzle. Models learn to do a task—spot faces, translate sentences, avoid pedestrians—by training with a specific set of examples. Yet they can generalize, learning to do that task with examples they have not seen before.

    Sounds a lot like Category Theory to me, which is all about abstracting rules as far as possible to form associations between concepts. This would explain other phenomena discussed in the article.

    Like, why can they learn language? I think this is very mysterious.

    Potentially because language structures can be encoded as categories. Any possible concept, including the whole of mathematics, can be encoded as relationships between objects in Category Theory. For more info see this excellent video.
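    To make the category framing concrete, here's a minimal sketch of my own (all names illustrative, not from any library): a category boils down to objects, morphisms between them, and a composition rule.

    ```python
    # Minimal sketch of a category: objects, morphisms between them,
    # and composition. Objects can stand for concepts; morphisms for
    # the relationships between them. Names here are my own invention.

    class Morphism:
        def __init__(self, source, target, fn):
            self.source, self.target, self.fn = source, target, fn

        def __call__(self, x):
            return self.fn(x)

    def compose(g, f):
        """g after f: only defined when the objects line up."""
        assert f.target == g.source, "morphisms must be composable"
        return Morphism(f.source, g.target, lambda x: g(f(x)))

    # Two toy relationships between concepts...
    word_to_length = Morphism("Word", "Int", len)
    int_to_parity  = Morphism("Int", "Parity", lambda n: n % 2 == 0)

    # ...compose into a new association the system never saw directly.
    word_to_parity = compose(int_to_parity, word_to_length)
    print(word_to_parity("hello"))  # length 5 is odd -> False
    ```

    The point being that new associations fall out of composition alone, which is one way to think about generalisation.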

    He thinks there could be a hidden mathematical pattern in language that large language models somehow come to exploit: “Pure speculation but why not?”

    Sound familiar?

    models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on.

    Maybe there is a threshold probability of a posited association being correct, and after enough iterations the model flips it to “true”.
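    That threshold idea can be sketched in a few lines (entirely my own toy construction, not how any real training loop works): confidence in an association accumulates smoothly, but the discrete output only flips once it crosses a threshold, which would look like a lightbulb moment from the outside.

    ```python
    # Hypothetical sketch of the "lightbulb" idea: evidence for an
    # association accumulates gradually, but the model's discrete
    # answer only flips once confidence crosses a threshold.

    THRESHOLD = 0.95

    def update(confidence, evidence_strength=0.3):
        """Nudge confidence toward 1 by a fraction of the remaining gap."""
        return confidence + evidence_strength * (1 - confidence)

    confidence, believed = 0.1, False
    for iteration in range(1, 11):
        confidence = update(confidence)
        if not believed and confidence >= THRESHOLD:
            believed = True
            print(f"association flipped to true at iteration {iteration}")
    ```

    Smooth progress underneath, sudden change at the surface.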

    I’d prefer articles to discuss the underlying workings, even if speculatively like the above, rather than perpetuating the “It’s magic, no one knows” narrative. Too many people (especially here on Lemmy, it has to be said) pick that up and run with it rather than thinking critically about the topic and formulating their own hypotheses.



  • You posted the article rather than the research paper and had every chance of altering the headline before you posted it but didn’t.

    You questioned why you were downvoted so I offered an explanation.

    Your attempts to form your own arguments often boil down to “no you”.

    So, as I’ve said all along, we just differ on our definitions of the term “understanding” and have devolved into a semantic exchange. You are now using a bee analogy, but for a start a bee is a living thing, not a mathematical model; another indication that you don’t understand nuance. Secondly, again, it’s about definitions. Bees don’t understand the number zero in the middle of the number line, but I’d agree they understand the concept of nothing, as in “There is no food.”

    As you can clearly see from the other comments, most people interpret the word “understanding” differently from you and AI proponents. So I infer you are either not a native English speaker or are trying very hard to shoehorn your oversimplified definition in to support your worldview. I’m not sure which, but your reductionist way of arguing is ridiculous, as others have pointed out, and full of logical fallacies which you don’t seem to comprehend either.

    Regarding what you said about Pythag, I agree and would expect it to outperform statistical analysis. That is because it has arrived at and encoded the theorem within its graphs, but I and many others do not define this as knowledge or understanding, because those words have other connotations for the majority of humans. It wouldn’t, for instance, be able to tell you what a triangle is using that model alone.
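    The encoded-theorem-versus-surface-statistics contrast can be made concrete with a toy comparison (entirely my own construction, nothing to do with the actual paper): one function applies the exact relation, the other is a crude fitted pattern that happens to work on some triangles.

    ```python
    import math

    # Toy contrast, purely illustrative: a model that has "encoded the
    # theorem" applies c = sqrt(a^2 + b^2) exactly, while a crude
    # statistical stand-in scales the average of the legs by a made-up
    # fitted constant. The encoded rule generalises; the pattern doesn't.

    def encoded_theorem(a, b):
        return math.hypot(a, b)          # the relation itself

    def naive_statistical(a, b, k=1.2):  # k: hypothetical fitted constant
        return k * (a + b) / 2           # fits some triangles, fails on others

    for a, b in [(3, 4), (5, 12), (8, 15), (1, 1)]:
        print(f"a={a}, b={b}: exact={encoded_theorem(a, b):.3f}, "
              f"statistical={naive_statistical(a, b):.3f}")
    ```

    Having the relation encoded is still not the same as being able to say what a triangle is.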

    I spot another appeal to authority… “Hinton said so and so…” It matters not. If Hinton said the sky is green you’d believe it, as you barely think for yourself when others you consider more knowledgeable have stated something which may or may not be true. Might explain why you have such an affinity for AI…



  • I question the value of this type of research altogether, which is why I stopped following it as closely as you do. I generally see it as an exercise in assigning labels to subsets of a complex system. However, I do see how the CoT paper adds some value in designing more advanced LLMs.

    You keep quoting research verbatim as if it’s gospel, so you miss my point (and this forms part of the appeal to authority I mentioned previously). It is entirely expected that neural networks would form connections outside of the training data (emergent capabilities). How else would they be of use? This article dresses up the research as some kind of groundbreaking discovery, which is what people take issue with.

    If this article were entitled “Researchers find patterns in neural networks that might help make more effective ones”, no one would have a problem with it, but then it would not be newsworthy either.

    I posit that Category Theory offers an explanation for these phenomena without having to delve into poorly defined terms like “understanding”, “skills”, “emergence” or Monty Python’s Dead Parrot. I do so with no hot research topics or papers to hide behind, just decades-old mathematics. Do you have an opinion on that?



  • No I’m not.

    You’re nearly there… The word “understanding” is the core premise of what the article claims to have found. If not for that, the “research” doesn’t really amount to much.

    As has been mentioned, this then becomes a semantic/philosophical debate about what “understanding” actually means and a short Wikipedia or dictionary definition does not capture that discussion.



  • There you go arguing in bad faith again by putting words in my mouth and reducing the nuance of what was said.

    You do know dissertations are articles and don’t constitute any form of rigorous proof in and of themselves? Seems like you have a very rudimentary understanding of English, which might explain why you keep struggling with semantics. If that is so, I apologise, because definitions are difficult when it comes to language, let alone ESL.

    I didn’t dispute that NNs can arrive at a theorem. I debate whether they truly understand the theorem they have encoded in their graphs as you claim.

    This is a philosophical/semantic debate as to what “understanding” actually is, because there isn’t really any evidence that they are any more than clever pattern-recognition algorithms driven by mathematics.


  • You’re being downvoted because you provide no tangible evidence for your opinion that human consciousness can be reduced to a graph that can be modelled by a neural network.

    Additionally, you don’t seem to respond to any of the replies you receive in good faith, and you reach for anecdotal evidence wherever possible.

    I also personally don’t like the appeal to authority permeating your posts. Just because someone who wants to secure more funding for their research has put out a blog post, it doesn’t make it true in any scientific sense.


  • Seems to me you are attempting to understand machine learning mathematics through articles.

    That quote is not a retort to anything I said.

    Look up Category Theory. It demonstrates how the laws of mathematics can be derived by forming logical categories. From that, you should be able to imagine how a neural network could perform a similar task within its structure.

    It is not understanding, just encoding that arrives at correct results.
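    The "laws derived by forming categories" claim can be illustrated with a tiny sketch of my own, in the spirit of Church-style encodings: build the natural numbers as nothing but repeated composition of a single successor arrow, and addition falls out of composing the encodings.

    ```python
    # Sketch of structure emerging from composition alone: a number n
    # is encoded as "apply the successor arrow n times". Illustrative
    # only; real categorical constructions are far more general.

    def compose(f, g):
        return lambda x: f(g(x))

    def numeral(n):
        """n as n-fold composition of the successor arrow."""
        arrow = lambda x: x              # identity = zero compositions
        for _ in range(n):
            arrow = compose(lambda x: x + 1, arrow)
        return arrow

    three, four = numeral(3), numeral(4)

    # Addition is just composition of the encodings:
    seven = compose(three, four)
    print(seven(0))  # 7
    ```

    Nothing in there "understands" arithmetic; the structure encodes it and the correct results follow.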



  • So somewhere in there I’d expect nodes connected to represent the Othello grid. They wouldn’t necessarily be arranged in a grid, just topologically the same graph.

    Then I’d expect millions of other weighted connections to represent the moves within the grid, including some weightings to prevent illegal moves. All based on mathematics and clever statistical analysis of the training data. If you want to refer to things as tokens, then be my guest, but it’s all graphs.
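    To sketch the "topologically the same graph" point (my own toy code, not the actual probing setup): store the board purely as an adjacency structure. The node labels and any visual layout are arbitrary; only which cells neighbour which is preserved, and that is all a network would need to encode.

    ```python
    # An 8x8 Othello board stored purely as a graph. Layout is gone;
    # only the adjacency (the topology) remains. Purely illustrative.

    SIZE = 8

    def board_graph(size=SIZE):
        adjacency = {}
        for r in range(size):
            for c in range(size):
                node = r * size + c      # arbitrary label, not a coordinate
                neighbours = []
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if (dr, dc) != (0, 0) and \
                           0 <= r + dr < size and 0 <= c + dc < size:
                            neighbours.append((r + dr) * size + (c + dc))
                adjacency[node] = neighbours
        return adjacency

    g = board_graph()
    print(len(g))      # 64 cells
    print(len(g[0]))   # a corner cell has 3 neighbours
    print(len(g[9]))   # an interior cell has 8
    ```

    Any graph with the same connectivity is "the board", however the nodes happen to be arranged.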

    If you think I’m getting closer to your point can you just explain it properly? I don’t understand what you think a neural network model is or what you are trying to teach me with Pythag.


  • They operate by weighting connections between patterns they identify in their training data. They then use statistics to predict outcomes.

    I am not particularly surprised that the Othello models built up an internal model of the game, as their training data were grid moves. Without looking into it, I’d assume the most efficient way of storing that information was in a grid format with specific nodes weighted towards the successful moves. To me that’s less impressive than the LLMs.
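    The "weight connections, then use statistics to predict" description can be sketched in miniature (made-up training games, my own construction): count which move followed which position in the data, and predict the most heavily weighted successor.

    ```python
    from collections import Counter, defaultdict

    # Sketch of "weighted connections plus statistics": count move
    # transitions in (made-up) training games, then predict the most
    # frequently seen successor. Purely illustrative.

    training_games = [
        ["d3", "c5", "d6"],
        ["d3", "c5", "f4"],
        ["d3", "e3", "f4"],
    ]

    weights = defaultdict(Counter)
    for game in training_games:
        for position, move in zip(game, game[1:]):
            weights[position][move] += 1   # strengthen that connection

    def predict(position):
        """Return the most heavily weighted next move seen in training."""
        return weights[position].most_common(1)[0][0]

    print(predict("d3"))  # "c5": seen twice vs "e3" once
    ```

    No understanding required; the graph of weighted transitions plus a frequency statistic does the work.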