Dictionary & Browser Issues

Chris_the_coder will certainly pick this up soon

ragworms

[quote=“lavateraguy, post:120, topic:1773”]
It probably indicates an ISO-8859/UTF-8 issue.
[/quote]

I asked ChatGPT. Umlauts are represented as two bytes in UTF-8. The problem comes from programs that expect one byte per character.

Not sure what umlauts have to do with this moth name!

Fixed. There were 208 identifications using that name. And I’ve changed the species dictionary entry too. It has already been fixed in the latest release of the UK dictionary, which is still only on the acceptance/test server.

Thanks Chris
Shows that we should be reporting all DICTIONARY and BROWSER issues here

Accented characters in general (and quite a bit of other stuff) are represented as two bytes in utf-8. Problems go both ways. On the one hand, programs that expect ISO 8859 treat a two-byte utf-8 character as a pair of characters, the first of which commonly turns out to be an accented A. On the other hand, if a program expecting utf-8 is fed ISO 8859 and encounters an accented character (etc.), it treats it as the first byte of a multi-byte character.
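To make the two directions concrete, here is a minimal Python sketch. The sample word "Béla" is an arbitrary illustration, not the name from the identifications that were fixed.

```python
# Arbitrary sample text (an assumption for illustration only).
text = "B\u00e9la"  # "Béla"

utf8_bytes = text.encode("utf-8")         # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # é is a single byte: 0xE9

# Direction 1: utf-8 bytes read by a program expecting ISO 8859-1.
# Each of the two bytes is shown as its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # BÃ©la

# Direction 2: ISO 8859-1 bytes read by a program expecting utf-8.
# 0xE9 looks like the first byte of a multi-byte character, but the
# next byte ("l") is not a valid continuation byte, so decoding fails.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader substitutes the replacement character U+FFFD instead.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(strict_ok, lenient)
```

The "accented A" in the first case is the 0xC3 byte, which ISO 8859-1 displays as Ã.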

Back in the day the equivalent problems related to ASCII (7 bit) versus ISO 8859 (8 bit). Not all of the internet was 8 bit clean, and ISO 8859 text didn’t always get to its destination unscathed.

We’re going to have even more fun when we update the Global dictionary. Based on the one data file I’ve looked at so far, it includes Japanese characters.

FWIW the current iSpot database uses:
character set: utf8mb3
collation: utf8mb3_general_ci

There seem to be no such difficulties on this site

The web tells me that utf-8 as used in HTML is utf8mb4 (uses 1 to 4 bytes per code point, and supports all UNICODE code points); utf8mb3 uses 1 to 3 bytes per code point, and only supports the basic multilingual plane (65536 code points, of which ~55,000 are assigned).

utf8mb3 doesn’t support most emojis, which are on the supplementary multilingual plane. utf8mb4 does, and they work in HTML. Other bits of Unicode may not work because of lack of font support - the default font on MS Edge is Times New Roman, which supports ~3,500 characters.
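A rough way to check whether a given piece of text would fit in utf8mb3 is to test that every code point is inside the basic multilingual plane. The sketch below is only that: the function name `fits_utf8mb3` and the sample characters are my own, and it says nothing about how iSpot actually validates its data.

```python
def fits_utf8mb3(text: str) -> bool:
    """Return True if every code point is in the BMP (U+0000 to U+FFFF),
    i.e. needs at most 3 bytes in utf-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf8_len(ch: str) -> int:
    """Number of bytes the text occupies when encoded as utf-8."""
    return len(ch.encode("utf-8"))


# Hypothetical samples, chosen only to cover each byte length.
samples = {
    "latin": "E",               # 1 byte
    "accented": "\u00e9",       # é, 2 bytes
    "japanese": "\u86fe",       # 蛾, 3 bytes, still in the BMP
    "emoji": "\U0001F98B",      # butterfly emoji, 4 bytes, outside the BMP
}

for label, ch in samples.items():
    print(label, utf8_len(ch), fits_utf8mb3(ch))
```

On this basis the common Japanese characters sit in the basic multilingual plane and need three bytes, so it is the four-byte characters such as that emoji that utf8mb3 cannot hold.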

Supported unicode character report for Times New Roman (fileformat.info)

The web tells me that MySQL is supposed to be faster with utf8mb4.

In which case a change to utf8mb4 would make a whole shed load of sense!

The specific quote was that collation was faster.

MySQL :: MySQL 8.0: When to use utf8mb3 over utf8mb4?

One would have to look further into the implications elsewhere, but I’d guess that additional storage demands are negligible when using alphabetic scripts, especially Latin.
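As a quick sanity check on that guess, one can count the utf-8 bytes in some typical names. This is only a sketch with sample strings of my own choosing; it measures the encoded text and does not model how MySQL allocates space for columns or indexes.

```python
def utf8_size(text: str) -> int:
    """Bytes needed to store the text as utf-8."""
    return len(text.encode("utf-8"))


# Sample names (assumptions for illustration; the second has one accent).
names = ["Eristalis abusiva", "Linn\u00e9"]

for name in names:
    chars = len(name)
    size = utf8_size(name)
    # Extra bytes beyond one per character come only from non-ASCII letters.
    print(f"{name}: {chars} characters, {size} bytes, {size - chars} extra")
```

Plain Latin text stays at one byte per character, and each accented letter adds a single extra byte. Any character that utf8mb3 can hold has the same byte sequence under utf8mb4.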

Not really a dictionary or browser issue, but not sure when the “created by” notification started appearing in the changes tracker after you interact with a post. Is that an intentional thing?

Whilst I cannot find ‘Created by’, Created has been a feature of the Tracker almost forever
It is a very useful filter element

Hi Dejay, what I was referring to was that after agreeing to some posts I am getting notifications in my changes tracker to say these posts had been created. See below - these were in yellow as with other changes after I had agreed to elouisedowney’s posts.

Yes - it is your CHANGES tracker - your print is incomplete (but normal)
Mine is here
Shows WHAT I did. Mike’s Cockle - I Edited it (added a tag) IDd it and left a comment
It was created by miked and has 4 comments
NO-ONE sees your CHANGES Tracker, anyone can see your ACTIVITY Tracker

Thanks Dejay - I think I was just getting a bit confused - I’m going to blame the heat!

The Changes Tracker highlights changes (within your filtering criteria) made since you last looked at your Changes Tracker. That may include events (including creation) that occurred between when you last looked at your Changes Tracker and when you interacted with an observation.


Sorry but Eristalis abusiva still carries the  symbol erroneously.

thanks
Eristalis | Species Dictionary | UK and Ireland | iSpot Nature @Chris_Valentine