Total sum of all human knowledge ingested

It seems AI has now ingested all of the internet and is turning to inventing its own data to train its models. Not the best idea, to put it mildly. It might work if done very carefully under certain circumstances; however, ‘carefully’ is not necessarily a word associated with those developing the big AI models.

When it comes to species identification, I suspect there is a lot more that could be done with AI, but again, carefully planning what goes into the models to fill the gaps may be the next stage after just throwing everything in.

It seems that after the early big jumps in generative AI models, the last year has shown a lot more issues, i.e. only slight improvement over earlier versions, and has reminded people to check what the models are producing. I was speaking to a poet who said that AI is not taking over his job because what it produces is obviously AI-generated and bland, with no human soul or meaning.


Pandora’s Box:
one thing stayed trapped inside: Hope.
But is Hope the ultimate evil?

As some of the tech billionaires don’t seem to believe that there is such a thing as ‘truth’, presumably they don’t mind if AI resorts to free speech and ‘community correction’.

I saw that claim attributed to Elon Musk. I’m not convinced that he’s in a position to know.

For images, humanity is in a position to generate huge quantities of data (AI hasn’t had access to most of my 75,000 photographs of plants).

Much, perhaps not all, public domain material is available (I came up blank on a 19th-century work recently), but a substantial amount (less than there used to be) of earlier, still-copyrighted material is not available online. I wonder how much paywalled content has been accessed by AI. Then there is all the material on the deep web (not the same as the dark web), and even air-gapped systems.

However, a different way forward is to retrain AI with better-curated data. To take Pl@ntNet as an example, I think their original dataset, with a limited number of species and a substantial number of training images per species, was pretty well curated; but they’ve crowd-sourced the extension to other species, and when I looked there was about a 10% misattribution rate in the images. Data curation is partly why AlphaFold and similar programs had good results, and why biopharma is having difficulty reproducing that success with other problem sets.
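By way of rough illustration (the numbers below are made up, not Pl@ntNet’s actual figures), a misattribution rate like that 10% can be estimated by having an expert re-verify a random sample of the crowd-sourced labels. A minimal Python sketch:

```python
import math

# Hypothetical illustration: estimate the misattribution rate of a
# crowd-sourced image set by manually re-verifying a random sample.
sample_size = 400   # images re-checked by an expert (assumed number)
mislabelled = 40    # of those, how many carried the wrong species (assumed)

rate = mislabelled / sample_size
# Normal-approximation 95% confidence interval for the error rate
se = math.sqrt(rate * (1 - rate) / sample_size)
low, high = rate - 1.96 * se, rate + 1.96 * se

print(f"Estimated misattribution rate: {rate:.1%} "
      f"(95% CI {low:.1%} to {high:.1%})")
# -> Estimated misattribution rate: 10.0% (95% CI 7.1% to 12.9%)
```

The point of the sampling exercise is that you don’t need to re-verify the whole dataset to know whether curation is needed; a few hundred checks already pin the error rate down fairly tightly.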

But he is in a position to ruin everything for the rest of us.
Be nice to him

Is that because you have not uploaded them to the internet or the Cloud? Or because they are somehow protected from trawling?
Do we know if AI has had access to our Cloud storage?

If I retain copyright on an iSpot post I have made, does that stop it being picked up for AI machine learning?

And if I asked AI any of these questions, would it tell me the truth?

‘Answers on a postcard’, as we used to say.

Yes, because I haven’t uploaded the majority of them. A few thousand have gone up on iSpot, a few hundred went up on Flickr for when I needed to refer to them on Usenet, in email, etc., and some more went up on one of my websites. Some uploaded images have only been linked to from email, so should have been invisible to search engines and AI.

Whether copyrighted material can be legally used for training AI is unresolved, but it certainly has been so used. (Generative AI can produce images in the style of particular artists, including living artists. While in principle this could be done by identifying AI-generated images that by chance look like an artist’s style and using those for training, I’m fairly sure it was done using original images.) Neither the legal nor the moral status is clear to me.

I think that technology has overtaken intellectual property law, and that we need a societal debate on what are the limits of intellectual property in this context. (This may be a worse can of worms than the current state of academic publishing.)

Cloud storage is supposed to be private. If a cloud service provider has allowed it to be used for training AI, one should be able to get them on grounds other than copyright violation. If caught doing this they’d also lose a lot of business - companies wouldn’t appreciate their confidential internal documents being used. Ask the right question and perhaps commercially sensitive information might be leaked.

Interesting, do you have references on this?

In general, with the UK government’s headlong rush into AI, I suspect many of the issues with the technology are being swept under the carpet.
The experience of iSpot with AI, and personal experience in other contexts, suggests that AI does make errors, potentially a lot of serious errors, but of different types from the errors that humans make. This may mean it appears better than it is, or that a lot of error checking needs to be built and tested with real humans before systems go live. There is also the issue of making everything a bland grey soup that humans refuse to engage with: e.g. if every report that, say, a social worker writes looks almost identical because AI has written it, will the person actually making the decision miss the critical points because they are too bored? It may sound a far-fetched example, but we won’t know until the current trials have reported back, which may be after the AI has been put in place everywhere.

Re the rush to a UK public rollout of AI, I am reminded of “we must do something; this is something; let’s do it”.

Not a formal reference. Derek Lowe’s pharma blog has made references over the years to attempts to use AI in drug development.

I tried the middle link you mention and just got a ‘waiting for Facebook’ message. Perhaps you need a login or similar.

It’s open access and at Science (AAAS); the only Facebook stuff on the page is share links.

Another Derek Lowe article on AI in biopharma - this time with a positive slant

Snake Antivenoms, Computed | Science | AAAS