Dictionary & Browser Issues

dejayM · 11 June 2023 10:30

Chris_the_coder will certainly pick this up soon

dejayM · 11 June 2023 10:54

ragworms

JoC · 11 June 2023 17:46

[quote="lavateraguy, post:120, topic:1773"]
It probably indicates an ISO-8859/UTF-8 issue.
[/quote]

I asked ChatGPT. Umlauts are represented as two bytes in UTF-8. The problem comes from programs that expect one byte per character.

Ken_Noble · 12 June 2023 05:56

Not sure what umlauts have to do with this moth name!

Chris_Valentine · 12 June 2023 07:33

Fixed. There were 208 identifications using that name. And I’ve changed the species dictionary entry too. It has already been fixed in the latest release of the UK dictionary, which is still only on the acceptance/test server.

dejayM · 12 June 2023 07:45

Thanks Chris
Shows that we should be reporting all DICTIONARY and BROWSER issues here

lavateraguy · 12 June 2023 08:23

Accented characters in general (and quite a bit of other stuff) are represented as two bytes in UTF-8. Problems go both ways. On the one hand, programs that expect ISO 8859 treat a two-byte UTF-8 character as a pair of characters, the first of which commonly turns out to be an accented A. On the other hand, if a program expecting UTF-8 is fed ISO 8859 and encounters an accented character (etc.), it treats it as the first byte of a multi-byte character.
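
A minimal Python sketch of both failure modes, in case it helps anyone see what is going on. It assumes "ISO 8859" means ISO 8859-1 (Latin-1); the sample name and the two helper functions are made up for illustration and have nothing to do with the iSpot code.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def misread_latin1_as_utf8(text: str) -> str:
    """Encode as Latin-1, then decode the bytes as if they were UTF-8.

    A lone Latin-1 accented byte is not valid UTF-8, so a strict decode
    would raise UnicodeDecodeError; errors="replace" substitutes U+FFFD.
    """
    return text.encode("latin-1").decode("utf-8", errors="replace")


name = "R\u00fcppell"  # "Rüppell", illustrative sample only

# The two bytes of the u-umlaut (C3 BC) show up as two Latin-1 characters,
# the first being an accented A.
print(misread_utf8_as_latin1(name))

# The single Latin-1 byte FC cannot start a UTF-8 sequence.
print(misread_latin1_as_utf8(name))
```

Which symptom you see depends on which side of the exchange made the wrong assumption, so the garbled output is itself a clue to where the fault lies.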

Back in the day the equivalent problems related to ASCII (7 bit) versus ISO 8859 (8 bit). Not all of the internet was 8 bit clean, and ISO 8859 text didn’t always get to its destination unscathed.

Chris_Valentine · 12 June 2023 08:35

We’re going to have even more fun when we update the Global dictionary. Based on the one data file I’ve looked at so far, it includes Japanese characters.

FWIW the current iSpot database uses:
character set: utf8mb3
collation: utf8mb3_general_ci

dejayM · 12 June 2023 08:48

There seem to be no such difficulties on this site

lavateraguy · 12 June 2023 12:15

The web tells me that utf-8 as used in HTML is utf8mb4 (uses 1 to 4 bytes per code point, and supports all UNICODE code points); utf8mb3 uses 1 to 3 bytes per code point, and only supports the basic multilingual plane (65536 code points, of which ~55,000 are assigned).

utf8mb3 doesn’t support emojis, which are on the supplementary multilingual plane. utf8mb4 does, and they work in HTML. Other bits of Unicode may not work because of lack of font support - the default font on MS Edge is Times New Roman, which supports ~3,500 characters.
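
A rough Python sketch of the byte counts involved, and of a check for whether a string would fit in a utf8mb3 column. The function names are hypothetical, and the check rests on the assumption that "fits in utf8mb3" simply means every code point lies in the basic multilingual plane (U+0000 to U+FFFF).

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every code point is in the basic multilingual plane.

    Assumption: this is the only constraint that matters for utf8mb3.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


samples = {
    "a": "Latin letter",
    "\u00fc": "u with umlaut",
    "\u65e5": "CJK ideograph (BMP)",
    "\U0001F98B": "butterfly emoji (supplementary plane)",
}

for ch, label in samples.items():
    print(f"{label}: {utf8_length(ch)} byte(s), fits utf8mb3: {fits_utf8mb3(ch)}")
```

Most everyday Japanese characters are in the basic multilingual plane, so they take three bytes each and would pass this check; it is the supplementary-plane characters, emojis included, that need the fourth byte.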

Supported unicode character report for Times New Roman (fileformat.info)

The web tells me that MySQL is supposed to be faster with utf8mb4.

Chris_Valentine · 12 June 2023 13:19

In which case a change to utf8mb4 would make a whole shed load of sense!

lavateraguy · 12 June 2023 14:09

The specific quote was that collation was faster.

MySQL :: MySQL 8.0: When to use utf8mb3 over utf8mb4?

One would have to look further into the implications elsewhere, but I’d guess that additional storage demands are negligible when using alphabetic scripts, especially Latin.

Luisa · 12 June 2023 14:19

Not really a dictionary or browser issue, but not sure when the “created by” notification started appearing in the changes tracker after you interact with a post. Is that an intentional thing?

dejayM · 12 June 2023 19:35

Whilst I cannot find ‘Created by’, Created has been a feature of the Tracker almost forever
It is a very useful filter element

Luisa · 12 June 2023 20:37

Hi Dejay, what I was referring to was that after agreeing to some posts I am getting notifications in my changes tracker to say these posts had been created. See below - these were in yellow as with other changes after I had agreed to elouisedowney’s posts.

dejayM · 12 June 2023 21:09

Yes - it is your CHANGES tracker - your print is incomplete (but normal)
Mine is here
Shows WHAT I did. Mike’s Cockle - I Edited it (added a tag) IDd it and left a comment
It was created by miked and has 4 comments
NO-ONE sees your CHANGES Tracker, anyone can see your ACTIVITY Tracker

Luisa · 12 June 2023 21:26

Thanks Dejay - I think I was just getting a bit confused - I’m going to blame the heat!

lavateraguy · 13 June 2023 10:48

The Changes Tracker highlights changes (within your filtering criteria) that have occurred since you last looked at your Changes Tracker. That may include events (including creation) that occurred between when you last viewed your Changes Tracker and when you interacted with an observation.

Ken_Noble · 30 June 2023 16:05

Sorry but EristalisÂ abusiva still carries the Â symbol erroneously.
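
One possible explanation, offered as a guess rather than a diagnosis: a non-breaking space (U+00A0) between the genus and species is two bytes in UTF-8 (C2 A0), and read as Latin-1 those bytes become Â followed by a non-breaking space. The Python sketch below reproduces that and shows one way it might be undone. The helper names are hypothetical, and nothing here is confirmed as how the dictionary entry was stored.

```python
NBSP = "\u00a0"  # non-breaking space


def garble(text: str) -> str:
    """Simulate UTF-8 bytes being read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Reverse the misreading where possible, else return the text unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def normalise_spaces(text: str) -> str:
    """Replace non-breaking spaces with ordinary spaces."""
    return text.replace(NBSP, " ")


stored = "Eristalis" + NBSP + "abusiva"  # assumed form of the stored name
shown = garble(stored)                   # an A-circumflex appears after "Eristalis"
fixed = normalise_spaces(repair(shown))
print(repr(shown))
print(repr(fixed))
```

The round trip only works if the damage happened exactly once; text that has been through several conversions usually needs fixing at the source instead.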

dejayM · 1 July 2023 20:11

thanks
Eristalis | Species Dictionary | UK and Ireland | iSpot Nature @Chris_Valentine
