Regex support for missing predefined character classes and negated character classes#1474
Regex support for missing predefined character classes and negated character classes#1474
Conversation
…omplement for predefined character classes
…rrently fails otherwise
…as CI currently fails otherwise" This reverts commit 56a7462.
core/src/main/kotlin/org/evomaster/core/search/gene/regex/CharacterRangeRxGene.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/gene/regex/CharacterRangeRxGene.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/gene/regex/CharacterRangeRxGene.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/service/Randomness.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/service/Randomness.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/service/Randomness.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/evomaster/core/search/service/Randomness.kt
Outdated
Show resolved
Hide resolved
core/src/test/kotlin/org/evomaster/core/search/gene/GeneSamplerForTests.kt
Outdated
Show resolved
Hide resolved
|
Looking into this failing check, the test generates a SqlCompositeGene which contains a ChoiceGene which contains an ArrayGene which starts empty but its template contains SqlForeignKeyGene. This generates no problems as long as the array is empty, but as soon as the array gets a random element everything stops being globally valid (in this case the new element, SqlForeignKeyGene is not globally valid as none of its parents in the chain are an SqlAction). The test filters globally invalid genes before randomizing but not after. This happens on GeneRandomizedTest during verifyCopyValueFrom. We could add a globally valid check after randomization the same way it is done with printable checks, this makes the test pass. However, I am not sure if there should be checks in place to make this chain invalid even before randomizing instead. private fun verifyCopyValueFrom(
mutable: List<Gene>,
rand: Randomness
) {
val printable = mutable.filter { it.isGloballyValid() && it.isPrintable() }
printable.forEach { root ->
val x = root.copy()
val sx = x.getValueAsRawString()
val y = x.copy().apply { randomize(rand, true) }
if(!y.isPrintable()){ // we could add a globally valid check here the same way we filter before
return@forEach
}
val sy = y.getValueAsRawString() |
|
CI is passing. I think this is ready for Andrea's review. |
| - aDouble | ||
| - aFloat | ||
| - aBoolean | ||
| - aNullableString |
There was a problem hiding this comment.
why this change?
@Pgarrett you wrote this file, isn't it? what is your opinion on this change?
There was a problem hiding this comment.
Yes I did, it seemed one of the fields was not being generated in the test and thus causing the test to fail in the reflective assertions
| | HexEscapeSequence | ||
| | UnicodeEscapeSequence | ||
| | OctalEscapeSequence | ||
| | 'p' BRACE_open PosixCharacterClassLabel BRACE_close |
There was a problem hiding this comment.
add comment that this is specific to Java... possible also add link where \p is explained
There was a problem hiding this comment.
\p is allowed in javascript also but only when certain flags are enabled. There is no flag support for Java/JS yet.
| //| IdentityEscape | ||
| ; | ||
|
|
||
| fragment PosixCharacterClassLabel |
There was a problem hiding this comment.
provide reference link from where those values were taken from. recall the comments at the top of this file
| fragment CharacterClassEscape | ||
| //one of d D s S w W | ||
| : [dDsSwW] | ||
| //one of d D s S w W v V h H |
There was a problem hiding this comment.
as v V h H are non-standard, and specific to Java, add a comment to explain them here
| // this limits the character class complements to 0xffff instead of allowing up to 0x10ffff, but values over | ||
| // 0xffff are not permitted on Char as they need 2 Chars to be represented; to allow this, we would need to | ||
| // use String or Int in every possible step as methods which return a single Char cannot return these characters | ||
| if(negated) internalRanges.add(CharacterRange(Character.MIN_VALUE,Character.MAX_VALUE)) |
There was a problem hiding this comment.
always use {} for blocks, even if single statement inside
| companion object { | ||
| private val log: Logger = LoggerFactory.getLogger(Randomness::class.java) | ||
|
|
||
| private fun stringToListOfCharPairs(s: String) : List<CharacterRange> { |
There was a problem hiding this comment.
rather stringToListOfCharacterRanges ?
| private val digitSet = listOf(CharacterRange('0', '9')) | ||
| private val asciiLetterSet = listOf(CharacterRange('a', 'z'), CharacterRange('A', 'Z')) | ||
| private val wordSet = listOf(CharacterRange('_', '_')) + asciiLetterSet + digitSet | ||
| private val spaceSet = stringToListOfCharPairs(" \t\r\n\u000C\u000b") |
There was a problem hiding this comment.
explain in a comment what \u000C\u000b are...
also notice that, in some cases, we explicitely skipped some special characters because escaping in the generating test files is bit buggy (need fixing... but never had time)
| private val verticalSpaceSet = stringToListOfCharPairs("\n\u000B\u000C\r\u0085\u2028\u2029") | ||
| private val punctuationSet = stringToListOfCharPairs("""!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~""") | ||
|
|
||
| private val digitCharClass = CharacterRangeRxGene(false, digitSet) |
There was a problem hiding this comment.
should not have references to a gene (ie CharacterRangeRxGene) here in this package, as it can have architectural side-effects in the long run... whatever logic needs to be re-used, should be refactored and extracted outside of gene
There was a problem hiding this comment.
Before these changes there were no references to genes here, but character classes values were sampled from here. The possible values came from strings defined on this file, which provided some basic functionality but missed several characters from predefined character classes. Trying to get these predefined character classes working with these missing characters I implemented them by changing code here. I see now how this could be an issue.
So to fix this, the predefined character classes definitions should be in CharacterClassEscapeRxGene, then with each randomize instead of using Randomness.nextX() we just call CharacterRangeRxGene.randomize(...) by making each CharacterClassEscapeRxGene hold the CharacterRangeRxGene which represents it? Then restore the previous method of getting Randomness.nextWordChar(), etc. (regarding previous comment about skipping some special characters as escaping in the generating test files)
| return set[random.nextInt(set.length)] | ||
| } | ||
|
|
||
| fun nextFromCharClass(cc: CharacterRangeRxGene) : Char{ |
There was a problem hiding this comment.
maybe you could introduce a new MultiCharacterRange and move out some of business logic of CharacterRangeRxGene into it (which then would use MultiCharacterRange internally)
| cc.randomize(this, false) | ||
| val k = cc.value | ||
| // is it necessary to log this? | ||
| log.trace("nextFromCharClass(): {}", k) |
There was a problem hiding this comment.
yes... we have tests for determinism that check those trace logs. eg look at runAndCheckDeterminism()
…aracterClassEscapeRxGene.kt, restore old Randomness.kt
…acterRangeRxGene.kt to new class src/main/kotlin/org/evomaster/core/utils/MultiCharacterRange
Added Java/JS support for negated character classes
Added Java support for missing predefined character classes escapes (\v, \H, etc.) and POSIX character classes (\p{Lower}, etc.)