• 1 Post
  • 609 Comments
Joined 1 year ago
Cake day: January 24th, 2024


  • But what is meant by “integrity of the model, inputs and outputs”?

    I guess I don’t understand the attack vector, what’s the threat here? Someone messes with the model file or refines a model towards a specific malicious bias like inserting scam links where legit links would go and passes it off as the real deal?

    I’m more general cybersec than crypto so idk but isn’t that what hash sums are for?

    Surely if someone messed with my .ckpt or .safetensors it won’t be the same file anymore?

    And what does that have to do with validity of the inputs?
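For what it's worth, the hash sum check I have in mind looks roughly like the sketch below. It only uses Python's standard library, and the file name and helper names (`sha256_of_file`, `matches_published_hash`) are made up for illustration. It assumes the publisher posts a SHA-256 digest somewhere you already trust. It catches a tampered file, but it can't tell you anything about a model that was maliciously refined and then published along with its own matching hash.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Checkpoints can be many GB, so never read the whole file at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Return True if the file's digest equals the publisher's digest."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a real .safetensors download.
        model = Path(tmp) / "model.safetensors"
        model.write_bytes(b"pretend these are weights")
        published = sha256_of_file(model)  # what the publisher would post

        print(matches_published_hash(model, published))  # untouched file

        # Flip a single byte, as an attacker editing the file might.
        model.write_bytes(b"pretend these are weightz")
        print(matches_published_hash(model, published))  # modified file
```

The untouched file should match and the modified one should not, which is the "it won't be the same file anymore" point.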





  • Not dev but I’m in IT/Cybersec mostly as it’s much easier to find jobs there and I use vim just about everywhere, usually with tmux and i3 with custom vim-like keybinds (super+j move focus right etc), I use vim even on my phone in termux, with gboard.

    I don’t use LSPs cause CBA but I only use Python and C and maybe occasionally bash for homelab stuff and I don’t have large projects (😭).

    If I’m doing any ML stuff from scratch (not refining or writing API for local llm model or integrating it with another API but just building classifiers etc) for fun I use Jupyter. Such a wildly different way of coding honestly ngl it’s wild, but great when you need to document what you’re doing.

    At uni I used to use fucking Visual Studio with C# and Netbeans with Java, but I learned it pretty well. I don’t think they ever even taught us how to run code outside those 😂

    At work I use gedit and gnome terminal for navigation cuz it’s company time unless I’m personally interested in what I’m doing.


  • It’s complicated.

    I know Stable Diffusion best so I’ll speak to that. They used the LAION-5B dataset, which is, in practice, freely available to download and use:

    https://www.kaggle.com/code/vitaliykinakh/guie-laion-5b-collect-and-download

    https://github.com/opendatalab/laion5b-downloader

    It’s also listed on HuggingFace, but that copy is unavailable.

    https://huggingface.co/datasets/danielz01/laion-5b

    But you can use this smaller newer version:

    https://huggingface.co/datasets/laion/relaion2B-en-research

    Whether it’s appropriately licensed is an unsolved question though.

    The dataset itself and the text portion of the text-image pairs needed for training are CC-BY-SA; the newer versions linked above are CC-BY-4.0. https://creativecommons.org/licenses/by/4.0/deed.en

    The images however are technically under their own copyright, which in practice means each of the billions of images may or may not have a license that implicitly or explicitly forbids AI training use, or forbids it only for commercial use.

    Whether such a license is legally binding is at present unknown though, since licenses primarily deal with reproductions. The pro-AI folks argue that training isn’t reproduction, and that training NNs is more akin to viewing an image and memorising the patterns and relationships within, like a person viewing it.

    That would make it non-infringing and therefore the model itself libre. In that case Mistral and LLaMa are also libre as long as the model itself is open source, which in this case really means “open weights”, so not like GPT and anything by “”“OpenAI”“”.

    Weights are essentially the result of a model being trained. They’re the key bit that makes it or breaks it and determines how it works. Given the weights, and knowing the structure of the model and the framework used, you can refine, modify and distribute it.

    Those against AI will say that it’s more akin to file compression and that in one form or another it’s misuse. That would make the model an infringing derivative work and as such not libre, even if the model weights are open source.

    In a way though you could argue that me vaguely memorising the imagery of a dude dressed in white holding a laser sword is just a lossy compressed copy of the copyrighted work of Star Wars. It’d be absurd to think that’s a violation; infringement only occurs if I reproduce a work of substantial similarity commercially from that memory.

    If I use Krita and draw a beautiful landscape which has been informed and inspired, at least in part, by a movie I saw, is that copyright infringement or not? What if I use AI?

    Well, current laws don’t say. We measure infringement in substantial similarity; provenance of information only comes in later (e.g. to prove against accidental similarity).

    That’s also my own personal stance on the legal side of things, so up to you how you see it.




  • As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.

    Given the options between having open source LLMs and the US Govt banning non-corpo non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.

    You’re delusional if you think they will pay anyone, the only way zucc will pay is with a guillotine.

    Corpos will make inter-platform deals that’ll simply make all online data licensable for the right price and enrich each other so you can’t avoid it while still actually being a career creative, but price out academic researchers and the public sector so that all fruits of it stay behind closed R&D doors and be free of ethics etc.

    Continuing in your role as a useful idiot, you’ll also most likely foot the bill for it via subsidies from your taxes to “develop the AI sector” in some anti-China dick measuring contest by the US.

    You will then be sold this data back via proprietary chat bots via a monthly subscription and you better pay up because once it gets really good, it will become mandatory to use for just about any job, leaving you with no choice.

    Or you can support FOSS LLMs.





  • Huh? Did you respond to the wrong user? I’m not OP, I don’t go out talking at people at work.

    All of my friends are already pretty much on the same page more or less, it’d be hard to be friends with someone who is against human rights or doesn’t care about such things as I’m a minority.

    The question I posed in my comment was about a societal scale: what do you do to reach a disengaged electorate, or an electorate that has no desire to know the truth and is not actively seeking it out whatsoever, instead believing things that are completely transparently false?

    Because as it stands, the current strategy of content online or in traditional media simply ends up preaching to the choir. The lectures containing truth end up reaching only those who seek them out and as such already have an allegiance to the truth and likely at least to some extent agree with them, or see them as epistemologically well justified beliefs, empirically and/or logically.

    I personally rather obviously can’t make friends of like 50% of the population of a country for instance, so it’s not really a workable solution lol and I don’t think that’s what you meant.

    So how do you show the truth to those people who believe transparently false things because it suits them, and teach them to want to seek out truth and want to believe the truth and to spot falsehoods and not be swayed by them, when those people have absolutely no interest in such things?

    And if you can’t, what do you do then? Because these people will literally destroy a democratic society if given the chance.





  • All anyone wants to do is lecture me about how they are right, and I am wrong if I think different than them

    The only relevant question is - are you wrong?

    Is your take actually valid? Based on sound empirical data? Is it not fallacious? Does your reasoning stand up to scrutiny? Is it fact, or a belief? Is it a justified belief?

    Ultimately you shouldn’t need to be coddled if you have any allegiance to the truth.

    It’s one thing if a 3-year-old gets 2+2 wrong. It’s another when it’s a 33-year-old. Would you waste energy on that, or would you assume that the 33-year-old doesn’t care enough to bother no matter what approach is used?

    The unfortunate reality is that democracy as a vehicle for progress is a failure because not enough people have an allegiance to the truth, nor have the basic epistemological tools for determining what’s knowledge, what’s belief, what’s a hypothesis, what’s theory or what’s valid evidence or any idea of what the scientific method even is, or what an axiom is etc.

    They favour their delusions (I don’t mean religion specifically) over truth.


  • Idk I listen to politics lectures all the time, most of which I don’t fully agree with, many I disagree with outright, listening to other takes, especially opposing ones helps me scrutinize my own reasoning and critically analyze what’s what.

    It’s not really the lecturer’s fault he was lecturing if he was right, in which case he should be lecturing others on truth. Much like any subject really.

    This idea that all opinions are equal is how we ended up in a post-truth world.

    Thought-terminating clichés of “everyone likes different things” or “people believe different things” are not just signs of a lazy intellect, they are the harbingers of our doom.

    You can have beliefs that aren’t facts, in fact - you have to, but you can’t just believe whatever, you need to be able to justify it, and to do that you need to understand logic, you need to understand evidence, you need to understand the scientific method and how to reason.


  • We are all but dust, my friend.

    That’s very fatalistic. In the end, unfortunately, for now, maybe, but that doesn’t mean the journey doesn’t have memories worth making and things worth keeping, especially when it comes to our bonds with others, and especially when it’s just undeniably useful, or we’d never have invented writing.

    And I guess you’ve got nothing to offer there

    I mean I gave the reasons many times over? From personal to purely practical. If they don’t seem to have value to you, that’s on you. I don’t know what else you want?

    Implying that the only value I recognize is monetary? Don’t be a dick.

    You’re the one who said you’d keep them for a podcast? I’m sorry, I don’t mean to be a dick, I was just going off what you said.