SharePoint search filter changes behavior when Thai or Chinese characters are on page #10098
-
If it's the query against the service that fails, you should also verify the value in the SPTranslationLanguage field for the Chinese pages to ensure it has the value you expect. If the value is
-
I've done a bit of research on this, and here is some more info. I searched for "zh-cn" and "zh-tw" and found similar issues. Yes, the value of _SPTranslationLanguage is correct in the library. The pattern seems to be that "zh-cn" retrieves only items where the managed property DetectedLanguage is not "zh-cn", that is to say, items that are not really in Chinese.

I see that the crawled property OWS__SPTranslationLanguage is mapped to two different managed properties, SPTranslationLanguage and SPTranslationLanguageWBOff. Querying for SPTranslationLanguageWBOff:"zh-cn" returns a lot more results. The two managed properties are configured differently, and I'm not quite sure why: the WBOff version is not safe for anonymous, and it has complete matching. I was expecting to see a difference in token normalization or language-neutral tokenization, but no such luck.

The indexer and the search engine do (or FAST did) some language-specific token normalization and stemming. For example, it knows that in English an "s" at the end of a word usually marks a plural, but this is not the case in German. When you use scripts that normally don't have Latin characters but that tolerate a mix of Unicode blocks, unexpected things sometimes happen. Chinese has various dashes but no hyphen, and hyphens in foreign names are often removed. Does knowing that a document is in Chinese make SharePoint search choke on the hyphen? Maybe. I have tried mapping the crawled property to various RefinableString managed properties with different configurations, but no success so far.
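To quantify how many more results SPTranslationLanguageWBOff:"zh-cn" returns, one could run the same restriction against both managed properties and diff the Path values. The sketch below assumes the JSON shape that `/_api/search/query` returns when called with `Accept: application/json;odata=nometadata` (`PrimaryQueryResult.RelevantResults.Table.Rows[].Cells[]`); the type names and the helpers `extractProperty` and `missingFrom` are hypothetical, and the HTTP calls are left out so the comparison logic can be checked on mocked data.

```typescript
// Minimal typing of the odata=nometadata search response (assumed shape; only the fields used here).
interface SearchCell {
  Key: string;
  Value: string | null;
}

interface SearchResponse {
  PrimaryQueryResult: {
    RelevantResults: {
      Table: { Rows: { Cells: SearchCell[] }[] };
    };
  };
}

// Hypothetical helper: collect one managed property (e.g. "Path") from every result row.
function extractProperty(response: SearchResponse, key: string): string[] {
  const values: string[] = [];
  for (const row of response.PrimaryQueryResult.RelevantResults.Table.Rows) {
    const cell = row.Cells.find((c) => c.Key === key);
    if (cell && cell.Value !== null) {
      values.push(cell.Value);
    }
  }
  return values;
}

// Hypothetical helper: entries present in `baseline` but absent from `candidate`.
function missingFrom(baseline: string[], candidate: string[]): string[] {
  const seen = new Set(candidate);
  return baseline.filter((p) => !seen.has(p));
}

// Mocked responses standing in for the WBOff query and the standard query.
const row = (path: string) => ({ Cells: [{ Key: "Path", Value: path }] });
const wbOff: SearchResponse = {
  PrimaryQueryResult: { RelevantResults: { Table: { Rows: [row("/a.aspx"), row("/b.aspx")] } } },
};
const standard: SearchResponse = {
  PrimaryQueryResult: { RelevantResults: { Table: { Rows: [row("/a.aspx")] } } },
};

// Pages found via SPTranslationLanguageWBOff but not via SPTranslationLanguage.
console.log(missingFrom(extractProperty(wbOff, "Path"), extractProperty(standard, "Path")));
```

Comparing the missing paths against their DetectedLanguage values would show whether the gap lines up with the pages that are really in Chinese.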
-
Target SharePoint environment
SharePoint Online
What SharePoint development model, framework, SDK or API is this about?
SharePoint REST API
Developer environment
None
What browser(s) / client(s) have you tested
Additional environment details
SPO. I used Chrome, but since the bug is in the SPO search API back end, the developer environment does not really matter.
Describe the bug / error
Hi
@wobba This weird SP search issue might interest you.
I'm using some advanced SharePoint search filtering, which seems to stop working once Thai or Chinese characters are added to a text web part on a modern SPO page (this is the trigger of the issue) that was created as a translated page (an out-of-the-box SPO feature). It seems easy enough to reproduce on any tenant.
There are two types of filtering that no longer work:
I filter on the language of a translated page using SPTranslationLanguage:{Page._SPTranslationLanguage}. This value contains, e.g., fr-fr, pt-br, th-th (Thai), zh-cn (Chinese), and so on.
Once the issue is triggered, SPTranslationLanguage:"zh-cn" no longer returns Chinese pages that have Chinese characters in the page body. I've found that using just "zh-" or "th-" still works, so I can use this as a workaround in certain cases.
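To reproduce this outside a web part, the filter can be sent straight to the SharePoint Search REST endpoint (`/_api/search/query` with the `querytext` and `selectproperties` parameters). The sketch below only builds the GET URLs for the exact value and for the shorter "zh-" value that still works; the site URL is a placeholder and the helper names are hypothetical. Because `querytext` is passed as an OData string literal, any single quote inside it is doubled.

```typescript
// Placeholder site URL (hypothetical tenant); replace with a real site.
const SITE_URL = "https://contoso.sharepoint.com/sites/intranet";

// Wrap a value as an OData string literal: '...' with embedded single quotes doubled.
function odataString(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// Build a KQL property restriction such as SPTranslationLanguage:"zh-cn".
// Assumes the value itself contains no double quotes.
function restriction(managedProperty: string, value: string): string {
  return `${managedProperty}:"${value}"`;
}

// Build a GET URL for the Search REST API with the given KQL and selected properties.
function searchUrl(siteUrl: string, kql: string, select: string[]): string {
  const querytext = encodeURIComponent(odataString(kql));
  const selectproperties = encodeURIComponent(odataString(select.join(",")));
  return `${siteUrl}/_api/search/query?querytext=${querytext}&selectproperties=${selectproperties}`;
}

// Exact value (fails once Thai/Chinese text is on the page) vs. the shorter value reported to still work.
for (const value of ["zh-cn", "zh-"]) {
  const kql = restriction("SPTranslationLanguage", value);
  console.log(searchUrl(SITE_URL, kql, ["Title", "Path", "SPTranslationLanguage"]));
}
```

Swapping the managed property name for SPTranslationLanguageWBOff in the same loop gives a quick side-by-side of the two mappings.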
I filter on a taxonomy field. The managed property of a taxonomy column Region might be called TaxIdRegion, which indexes the path of term IDs. The SP user profile is enriched with a single-value, taxonomy-bound attribute called MyRegion. The following filter works fine unless we trigger the issue by adding Thai or Chinese characters to the page:
TaxIdRegion:{User.MyRegion} is resolved to its actual value, e.g. TaxIdRegion:"#0cc6242eb-bdfa-4c80-a979-19f0eab6318b".
Unfortunately, I don't have a workaround for the taxonomy filter. We use it to show personalized content on an intranet, which means Chinese and Thai users do not get personalized content because of this unexpected behavior, whereas it works fine for other languages whose pages contain neither Thai nor Chinese characters. I did not check, but the same may well apply to any non-Latin script.
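Since the taxonomy filter depends on `{User.MyRegion}` being resolved first, one way to narrow the problem down is to send the already-resolved restriction directly and compare how a Chinese or Thai page and, say, a French page tagged with the same term behave. The sketch below builds that restriction from a term GUID, mirroring the `#0`-prefixed value shown above; `TaxIdRegion` comes from the example, while the helper name and the GUID format check are hypothetical additions.

```typescript
// Matches a GUID in the 8-4-4-4-12 hexadecimal format.
const GUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: build the resolved restriction, e.g. TaxIdRegion:"#0<term guid>",
// rejecting anything that is not a well-formed GUID.
function taxonomyRestriction(managedProperty: string, termId: string): string {
  if (!GUID_PATTERN.test(termId)) {
    throw new Error(`Not a term GUID: ${termId}`);
  }
  return `${managedProperty}:"#0${termId}"`;
}

// The resolved value from the example above, ready to paste into querytext.
console.log(taxonomyRestriction("TaxIdRegion", "cc6242eb-bdfa-4c80-a979-19f0eab6318b"));
```

If the hard-coded restriction also misses the Chinese and Thai pages, the problem lies in how those pages are indexed rather than in how the query variable is resolved.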
Steps to reproduce
The taxonomy filter to be used is a bit more involved to set up.
Expected behavior
The expected behavior is that the two search query filters mentioned above behave the same way as they do for other languages, such as French, Portuguese, or Spanish. Adding Thai or Chinese characters to a text web part on a modern SPO page should not change their behavior.