Skip to content

Conversation

@kapitoshka438
Copy link
Contributor

@kapitoshka438 kapitoshka438 commented Jan 8, 2026

#6045

Summary

  • Adds new invisible_characters lint rule that detects invisible characters in string literals:
    • zero-width space U+200B
    • zero-width non-joiner U+200C
    • FEFF formatting character U+FEFF
  • Reports all occurrences with precise violation positions pointing to each invisible character
  • Includes early exit optimization to avoid unnecessary iteration when no invisible characters are present

@SwiftLintBot
Copy link

SwiftLintBot commented Jan 8, 2026

19 Messages
📖 Building this branch resulted in a binary size of 27304.87 KiB vs 27253.15 KiB when built on main (0% larger).
📖 Linting Aerial with this PR took 0.81 s vs 0.76 s on main (6% slower).
📖 Linting Alamofire with this PR took 1.04 s vs 1.04 s on main (0% slower).
📖 Linting Brave with this PR took 7.23 s vs 7.14 s on main (1% slower).
📖 Linting DuckDuckGo with this PR took 24.56 s vs 24.57 s on main (0% faster).
📖 Linting Firefox with this PR took 11.68 s vs 11.66 s on main (0% slower).
📖 Linting Kickstarter with this PR took 8.03 s vs 8.0 s on main (0% slower).
📖 Linting Moya with this PR took 0.44 s vs 0.45 s on main (2% faster).
📖 Linting NetNewsWire with this PR took 2.43 s vs 2.45 s on main (0% faster).
📖 Linting Nimble with this PR took 0.71 s vs 0.66 s on main (7% slower).
📖 Linting PocketCasts with this PR took 7.32 s vs 7.29 s on main (0% slower).
📖 Linting Quick with this PR took 0.43 s vs 0.44 s on main (2% faster).
📖 Linting Realm with this PR took 3.32 s vs 3.29 s on main (0% slower).
📖 Linting Sourcery with this PR took 1.89 s vs 1.87 s on main (1% slower).
📖 Linting Swift with this PR took 4.4 s vs 4.43 s on main (0% faster).
📖 Linting SwiftLintPerformanceTests with this PR took 0.33 s vs 0.33 s on main (0% slower).
📖 Linting VLC with this PR took 1.14 s vs 1.15 s on main (0% faster).
📖 Linting Wire with this PR took 18.49 s vs 18.52 s on main (0% faster).
📖 Linting WordPress with this PR took 12.27 s vs 12.2 s on main (0% slower).

Generated by 🚫 Danger

@kapitoshka438 kapitoshka438 force-pushed the invisible_characters branch 2 times, most recently from 3561c44 to a89ae33 Compare January 8, 2026 21:52
@kapitoshka438
Copy link
Contributor Author

@SimplyDanny please review. I don't check NULL control character: \u0000 because it gets lost when copypasting so it's almost impossible to have it in string literals by accident.

@kapitoshka438 kapitoshka438 force-pushed the invisible_characters branch 2 times, most recently from 9382226 to ceccd1d Compare January 19, 2026 12:47
Copy link
Collaborator

@SimplyDanny SimplyDanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

I wonder if this rule should catch even more characters of this kind, some maybe optionally. We could check what other linters do.

@kapitoshka438 kapitoshka438 force-pushed the invisible_characters branch 3 times, most recently from 3a37574 to 708d32e Compare January 23, 2026 21:53
@kapitoshka438
Copy link
Contributor Author

kapitoshka438 commented Jan 23, 2026

Thanks for the contribution!

I wonder if this rule should catch even more characters of this kind, some maybe optionally. We could check what other linters do.

  • This tool led me to another character (zero-width non-joiner U+200C). I've added processing it to this PR.
  • ESLint provides a rule that prevents using some control characters in regular expressions, but I think this is not our case because those characters are visible.
  • There is also zero-width joiner character U+200D, but it is used to construct complex symbols, such as some emoji example

Copy link
Collaborator

@SimplyDanny SimplyDanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be wrong to have the rule auto-correct the findings by just removing the violating characters?

We could also think about making the rule configurable, so that people can add their own characters with descriptions. That's what the "tool" you cited allows as well.

@SimplyDanny SimplyDanny linked an issue Jan 25, 2026 that may be closed by this pull request
2 tasks
@kapitoshka438
Copy link
Contributor Author

Would it be wrong to have the rule auto-correct the findings by just removing the violating characters?

We could also think about making the rule configurable, so that people can add their own characters with descriptions. That's what the "tool" you cited allows as well.

The second commit makes the rule correctable. I've also made some performance optimizations.
As for custom configurations, let me think about this. I'm sure there may be more symbols we haven't accounted for, but I'm not sure users should be forced to set descriptions for them. Each symbol has a name, and the purpose of this rule is to control the presence of specific special symbols, not just any symbols.

@SimplyDanny
Copy link
Collaborator

I'm sure there may be more symbols we haven't accounted for, but I'm not sure users should be forced to set descriptions for them.

Fine for me to skip the description or have it optional.

@kapitoshka438
Copy link
Contributor Author

I'm sure there may be more symbols we haven't accounted for, but I'm not sure users should be forced to set descriptions for them.

Fine for me to skip the description or have it optional.

I've added the ability to configure additional characters. I thought about it for a long time and initially decided not to allow disabling the validation of default characters. The hardest part was coming up with names for the parameter and variables 😅. But I'm still having doubts. Please review )

Copy link
Collaborator

@SimplyDanny SimplyDanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good now! Thank you for all the updates.

I posted a suggestion for the option's name. Let me know what you think about it.

key: "include_hex_codes",
postprocessor: { $0.formUnion(defaultCharacters) }
)
private(set) var violatingCharacters = Set<String>()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make UnicodeScalar conform to AcceptableByConfigurationElement, it should end up in a Set<UnicodeScalar> right away. This avoids the conversion in violatingScalars() and the definition of defaultCharacters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thanks. I've made changes.

@ConfigurationElement(key: "severity")
private(set) var severityConfiguration = SeverityConfiguration<Parent>.error
@ConfigurationElement(
key: "include_hex_codes",
Copy link
Collaborator

@SimplyDanny SimplyDanny Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We often use additional_ if the values extend a set of defaults. I'm not sure about "hex codes". Would "code points" be clearer? Then we would end up with additional_code_points. Not sure either ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable for me. Done in the last commit.

description: """
Disallows invisible characters like zero-width space (U+200B), \
zero-width non-joiner (U+200C), and FEFF formatting character (U+FEFF) \
in string literals as they can cause hard-to-debug issues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in string literals as they can cause hard-to-debug issues
in string literals as they can cause hard-to-debug issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rule Request: invisible_characters

4 participants