POC - Unicode chars in identifiers #317

Merged
merged 5 commits into from
Dec 20, 2024
2 changes: 1 addition & 1 deletion src/parser.ts
@@ -194,7 +194,7 @@ export const Import: Parser<ImportNode> = node(ImportNode)(() =>
// COMMON
// ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────

- export const name: Parser<Name> = lazy('identifier', () => regex(/[^\W\d]\w*/))
+ export const name: Parser<Name> = lazy('identifier', () => regex(/^[\p{L}_][\p{L}\p{N}_]*/u))
Contributor Author commented:

This is where the magic happens: \p{L} matches any Unicode letter, so it covers every letter that \w matches (but not digits or _, which is why \p{N} and _ are listed explicitly).
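
To make the difference between the two patterns concrete, here is a minimal standalone sketch. The two patterns are taken from the diff; the names `asciiName`, `unicodeName` and `matchPrefix` are hypothetical helpers for illustration only, not part of the parser, and the sketch assumes the parser only accepts a match that starts at the current input position.

```typescript
// Old pattern from the diff: an ASCII letter or underscore, then ASCII word characters.
const asciiName = /[^\W\d]\w*/

// New pattern from the diff, equivalent to /^[\p{L}_][\p{L}\p{N}_]*/u.
// Built with the RegExp constructor so the sketch does not depend on the
// configured TypeScript compile target.
const unicodeName = new RegExp('^[\\p{L}_][\\p{L}\\p{N}_]*', 'u')

// Hypothetical helper: returns the text matched at position 0 of `input`,
// or null when the pattern does not match there.
function matchPrefix(pattern: RegExp, input: string): string | null {
  const match = pattern.exec(input)
  return match !== null && match.index === 0 ? match[0] : null
}
```

With these definitions, `matchPrefix(unicodeName, 'some_ñandu')` returns the whole identifier, while `matchPrefix(asciiName, 'some_ñandu')` stops at `some_` because `ñ` is not an ASCII word character. An input such as `4ñandu` is rejected by the new pattern, since a digit belongs to \p{N} and is only allowed after the first character.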


export const packageName: Parser<Name> = lazy('package identifier', () => regex(/[^\W\d][\w-]*/))

23 changes: 23 additions & 0 deletions test/parser.test.ts
@@ -311,6 +311,10 @@ describe('Wollok parser', () => {
'_foo123'.should.be.be.parsedBy(parser).into('_foo123')
})

it('should parse names that contain unicode chars', () => {
'_foö123_and_bár'.should.be.be.parsedBy(parser).into('_foö123_and_bár')
})

it('should not parse names with spaces', () => {
'foo bar'.should.not.be.parsedBy(parser)
})
@@ -327,6 +331,9 @@ describe('Wollok parser', () => {
'"foo"'.should.not.be.parsedBy(parser)
})

it('should not parse strings containing unicode as names', () => {
'"foö"'.should.not.be.parsedBy(parser)
})
})


@@ -1871,6 +1878,10 @@ class c {}`
'var v'.should.be.parsedBy(parser).into(new Variable({ name: 'v', isConstant: false })).and.be.tracedTo(0, 5)
})

it('should parse var declaration with non-ascii character in identifier', () => {
'var ñ'.should.be.parsedBy(parser).into(new Variable({ name: 'ñ', isConstant: false })).and.be.tracedTo(0, 5)
Contributor Author commented:

🔥 so that code can be written 100% in Spanish

Contributor commented:

Oh, that's great!!

I want to see whether it also gets through the validator, the CLI and the LSP-IDE, but I definitely want this!! 👏🏼 👏🏼 👏🏼

})

it('should parse var asignation', () => {
'var v = 5'.should.be.parsedBy(parser).into(
new Variable({
@@ -2197,6 +2208,18 @@ class c {}`
)
})

it('should parse references starting with unicode letter', () => {
'ñ'.should.be.parsedBy(parser).into(new Reference({ name: 'ñ' })).and.be.tracedTo(0, 1)
})

it('should parse references containing unicode letter', () => {
'some_ñandu'.should.be.parsedBy(parser).into(new Reference({ name: 'some_ñandu' })).and.be.tracedTo(0, 10)
})

it('should not parse references starting with numbers that contain unicode letters', () => {
'4ñandu'.should.not.be.parsedBy(parser)
})

it('should not parse references with spaces', () => {
'foo bar'.should.not.be.parsedBy(parser)
})