Discourse should ignore if a character is accented when doing a search
Thanks, will have a look tomorrow. Agree it is not simple, but to be on the safe side, we can leave the default off for most langs and leave the decision to admins. I think German is another one, do we...
I wonder, is there a way to find out what Google does for each language? Seems like a good benchmark.
I suspect that Google has an automatic per-word heuristic that is language-neutral. I doubt we have any chance of matching what they do. @Osama / @Pad_Pors what is the correct thing to do in Arabic, I...
In Arabic we almost never type diacritics in day-to-day communications, because an Arabic diacritic is a separate character that you need to type in addition to the base character you want to add the...
The same as @Osama has described goes for the Persian locale (fa_IR): people rarely type diacritics, neither in search terms nor in their daily dialogues. So I’d say one can forget about them at least for...
Okay, from the discussion above it seems stripping diacritics for search should be enabled by default at least for: French, Portuguese (and by extension Spanish as well I guess), Arabic, Farsi, Czech. I...
@danekhollas Also, the Greek language needs to be included in that list.
In Romanian you can strip all diacritics: we can write and read with the English letters w/o any problems.
danekhollas: (and by extension Spanish as well I guess) Same with Catalan (20 chars)
The PR is up https://github.com/discourse/discourse/pull/6397/files
While I’m not involved in this i18n, it has been a most interesting discussion to learn about how other languages are actually written. Thanks to everyone who shared their written language...
This is now in with a new name https://github.com/discourse/discourse/commit/4481836de2feb4813b6282a6ec4ae4fdde509627
Hmm, unfortunately it seems we’re not quite there yet. I see two big issues after a bit of testing (upgraded today to the latest Discourse version): diacritics are not elided from the query string...
danekhollas: diacritics are not elided from the query string itself, i.e. if I search with a word including diacritics, I will not find anything @zogstrip this feels like something we got to get...
Thank you @sam for asking, I am humbled. I can do some explorations during the weekend, but I want to be honest with you that in any case this would most likely require a lot of hand-holding and I do not...
danekhollas: If you’re okay with that, some initial pointers where to look would be appreciated. You’ll probably need to extract the strip_diacritics method so it can also be used in the...
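A minimal sketch of the idea in the pointer above, written in Python rather than Discourse's own Ruby: one shared helper is applied both when text is indexed and when the query is parsed, so an accented query and an unaccented post (or the reverse) end up comparable. The helper name mirrors the strip_diacritics method mentioned above, but its body, the toy index, and the sample posts are assumptions for illustration, not Discourse's implementation.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def build_index(posts: dict) -> dict:
    """Map each normalized word to the set of post ids containing it."""
    index = {}
    for post_id, body in posts.items():
        for word in strip_diacritics(body).lower().split():
            index.setdefault(word, set()).add(post_id)
    return index


def search(index: dict, query: str) -> set:
    """Normalize the query the same way as the index, then AND the terms."""
    terms = strip_diacritics(query).lower().split()
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical sample data: one post typed with diacritics, one without.
posts = {1: "Příliš žluťoučký kůň", 2: "prilis drahy kun"}
index = build_index(posts)

print(search(index, "příliš"))  # query typed with diacritics
print(search(index, "prilis"))  # query typed without diacritics
```

If only the index were normalized and the query left as typed, the first search would find nothing, which is the failure reported earlier in the topic.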
There might be problems, some examples: “álom” means “dream”. “alom” means “litter”. “rag” means “suffix” / “inflection”. “rág” means “chew”. “kar” means “arm”. “kár” means “damage”. But I think end...
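To make the concern concrete, here is a small Python sketch that groups the Hungarian words above by their stripped form. The stripping helper is an assumed Unicode-based implementation, not the one Discourse uses.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Assumed helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Word pairs taken from the post above.
words = ["álom", "alom", "rag", "rág", "kar", "kár"]

groups = defaultdict(list)
for word in words:
    groups[strip_diacritics(word)].append(word)

# Keep only stripped forms shared by more than one distinct word.
collisions = {key: forms for key, forms in groups.items() if len(forms) > 1}

for key, forms in collisions.items():
    print(key, "<-", forms)
```

Each pair collapses to a single search term, so a search for one word would also return posts containing the other.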
Thanks for chiming in! I guess it depends on how many of these examples there are. If there are not that many, then I’d say it is always better to get more search results, albeit sometimes irrelevant,...
This can be better handled at the database level with an appropriate collation, instead of the blunt approach of stripping accents. Many databases offer accent-insensitive collations for different...
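A rough sketch of the collation idea, using Python's built-in sqlite3 module only because it allows a custom collation in a few lines. Discourse runs on PostgreSQL, whose collation setup differs; the collation name noaccent, the table, and the sample rows here are all hypothetical.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Comparison key: drop combining marks and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def noaccent(a: str, b: str) -> int:
    """Collation callback: negative, zero, or positive, like a comparator."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
conn.create_collation("noaccent", noaccent)  # hypothetical collation name
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("kár",), ("kar",), ("rág",)])

# Stored text keeps its accents; only the comparison ignores them.
rows = conn.execute(
    "SELECT word FROM words WHERE word = ? COLLATE noaccent ORDER BY rowid",
    ("kar",),
).fetchall()
print(rows)
```

The stored data is left untouched, so the same table could still be queried accent-sensitively by omitting the COLLATE clause.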
@zogstrip Thank you for the pointers, they seem to do the trick! I’ve made a PR https://github.com/discourse/discourse/pull/6518 I’ve tried adding some tests, but was generally super confused about how...
Thanks Sam for merging! Just upgraded our forum and it works well. danekhollas: diacritics are not elided from the query string itself, i.e. if I search with a word including diacritics, I will not...
danekhollas: Perhaps now is a good time after my fix was merged. I want to wait a tiny bit more, a rebuild of the index is quite expensive so I want to make sure I don’t dish out the cost too early.
Okay, if it is tiny. For others: if you update to tests passed right now and have a locale for which this site setting is default on, your search will be badly broken, so you need to either turn of...
Technically yes, we could do something like that by running a query to reset the version of the index, the trouble here is that re-indexes really are pretty expensive. Keeping this in mind though....