[Nvda-dev] commit r1954 - in trunk: . source

NVDA Subversion
Author: jteh
Date: Wed Apr 23 23:56:11 2008
New Revision: 1954

Log:
speech.processTextSymbols: Combine all of the text symbol processing into one regular expression and a replacement function. This simplifies the code somewhat and reduces the number of passes through the string, making this more than 4.5 times faster. Some slight changes were made to the preservation of actual symbols.

Modified:
   trunk/   (props changed)
   trunk/source/speech.py

Modified: trunk/source/speech.py
==============================================================================
--- trunk/source/speech.py (original)
+++ trunk/source/speech.py Wed Apr 23 23:56:11 2008
@@ -33,13 +33,6 @@
 speechMode_beeps_ms=15
 beenCanceled=True
 isPaused=False
-re_sentence_dot=re.compile(r"(\w|\)|\"|')\.(\s|$)")
-re_sentence_comma=re.compile(r"(\w|\)|\"|'),(\s|$)")
-re_sentence_question=re.compile(r"(\w|\))\?(\s|$)")
-re_sentence_colon=re.compile(r"(\w|\)|\"|'):(\s|$)")
-re_sentence_semiColon=re.compile(r"(\w|\)|\"|');(\s|$)")
-re_sentence_exclimation=re.compile(r"(\w|\)|\"|')!(\s|$)")
-re_word_apostraphy=re.compile(r"(\w)'(\w)")
 typedWord=""
 REASON_FOCUS=1
 REASON_MOUSE=2
@@ -62,6 +55,26 @@
 def terminate():
 	setSynth(None)
 
+RE_PROCESS_SYMBOLS = re.compile(
+	# Groups 1-3: expand symbols where the actual symbol should be preserved to provide correct entonation.
+	# Group 1: sentence endings.
+	r"(?:(?<=[^\s.!?])([.!?])(?=[\"')\s]|$))"
+	# Group 2: comma.
+	+ r"|(,)"
+	# Group 3: semi-colon and colon.
+	+ r"|(?:(?<=[^\s;:])([;:])(?=\s|$))"
+	# Group 4: expand all other symbols without preserving.
+	+ r"|([%s])" % re.escape("".join(frozenset(characterSymbols.names) - frozenset(characterSymbols.blankList)))
+)
+def _processSymbol(m):
+	symbol = m.group(1) or m.group(2) or m.group(3)
+	if symbol:
+		# Preserve symbol.
+		return " %s%s " % (characterSymbols.names[symbol], symbol)
+	else:
+		# Expand without preserving.
+		return " %s " % characterSymbols.names[m.group(4)]
+
 def processTextSymbols(text,expandPunctuation=False):
 	if (text is None) or (len(text)==0) or (isinstance(text,basestring) and (set(text)<=set(characterSymbols.blankList))):
 		return _("blank")
@@ -69,47 +82,9 @@
 	if isinstance(text,basestring):
 		text=text.replace(u'\xa0',u' ')
 	text = speechDictHandler.processText(text)
-	#expands ^ and ~ so they can be used as protector symbols
-	#Expands special sentence punctuation keeping the origional physical symbol but protected by ^ and ~
-	#Expands any other symbols and removes ^ and ~ protectors
-	if expandPunctuation is False:
+	if not expandPunctuation:
 		return text
-	protector=False
-	buf=""
-	for char in text:
-		if (char=="^") or (char=="~"):
-			buf+=" %s "%characterSymbols.names[char]
-		else:
-			buf+=char
-	text=buf
-	text=re_sentence_dot.sub(r"\1 ^%s.~ \2"%characterSymbols.names["."],text)
-	text=re_sentence_comma.sub(r"\1 ^%s,~ \2"%characterSymbols.names[","],text)
-	text=re_sentence_question.sub(r"\1 ^%s?~ \2"%characterSymbols.names["?"],text)
-	text=re_sentence_colon.sub(r"\1 ^%s:~ \2"%characterSymbols.names[":"],text)
-	text=re_sentence_semiColon.sub(r"\1 ^%s;~ \2"%characterSymbols.names[";"],text)
-	text=re_sentence_exclimation.sub(r"\1 ^%s!~ \2"%characterSymbols.names["!"],text)
-	#text=re_word_apostraphy.sub(r"\1 %s^.~ \2"%characterSymbols.names["'"],text)
-	buf=""
-	for char in text:
-		if char=="^":
-			protector=True
-			buf+="^"
-			continue
-		if char=="~":
-			protector=False
-			buf+="~"
-			continue
-		if not protector:
-			if (char not in characterSymbols.blankList) and char in characterSymbols.names:
-				buf+=" ^%s~ "%characterSymbols.names[char]
-			else:
-				buf+=char
-		else:
-			buf+=char
-	text=buf
-	text=text.replace("^","")
-	text=text.replace("~","")
-	return text
+	return RE_PROCESS_SYMBOLS.sub(_processSymbol, text)
 
 def processSymbol(symbol):
 	if isinstance(symbol,basestring):
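
To illustrate the approach described in the log message, here is a minimal, self-contained sketch of a single-pass expansion built from one alternation and a replacement function. The SYMBOL_NAMES table and the expand_symbols name are hypothetical stand-ins for NVDA's characterSymbols.names and processTextSymbols. The grouping follows the diff above, but this is not the committed code.

```python
import re

# Hypothetical symbol table; NVDA keeps the real one in characterSymbols.names.
SYMBOL_NAMES = {
    ".": "dot",
    "!": "bang",
    "?": "question",
    ",": "comma",
    ";": "semi",
    ":": "colon",
    "@": "at",
    "#": "number",
}

# One alternation; the alternatives are tried in order at each position.
PATTERN = re.compile("|".join([
    # Group 1: sentence endings (after a non-space, non-ending character and
    # before a quote, closing parenthesis, whitespace or end of string).
    r"(?<=[^\s.!?])([.!?])(?=[\"')\s]|$)",
    # Group 2: any comma.
    r"(,)",
    # Group 3: semicolon or colon at the end of a word.
    r"(?<=[^\s;:])([;:])(?=\s|$)",
    # Group 4: any known symbol that the earlier groups did not claim.
    "([%s])" % re.escape("".join(SYMBOL_NAMES)),
]))


def _replace(m):
    symbol = m.group(1) or m.group(2) or m.group(3)
    if symbol:
        # Speak the name but keep the symbol, so intonation is preserved.
        return " %s%s " % (SYMBOL_NAMES[symbol], symbol)
    # Speak the name only.
    return " %s " % SYMBOL_NAMES[m.group(4)]


def expand_symbols(text):
    # A single pass over the string: every match goes through _replace.
    return PATTERN.sub(_replace, text)
```

With this table, expand_symbols("a@b.c") yields "a at b dot c": the dot is not at a sentence end, so it falls through to group 4 and the symbol itself is dropped.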



Re: [Nvda-dev] commit r1954 - in trunk: . source

Lubos Pintes-2
Hi Jamie,
Perhaps it would be better to write this regex as shown in "Dive Into
Python". Right now it is really terrible :-).



Re: [Nvda-dev] commit r1954 - in trunk: . source

James Teh-2
Lubos Pintes wrote:
> Perhaps it would be better to write this regex as shown in "Dive into
> python". Now it is really terrible :-).
Explain.

Jamie

--
James Teh
Email: [hidden email]
WWW: http://www.jantrid.net/
MSN Messenger: [hidden email]
Jabber: [hidden email]
Yahoo: jcs_teh



Re: [Nvda-dev] commit r1954 - in trunk: . source

Lubos Pintes-2
OK, here is the regex for Roman numerals:
 >>> pattern = """
     ^                   # beginning of string
     M{0,4}              # thousands - 0 to 4 M's
     (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                         #            or 500-800 (D, followed by 0 to 3 C's)
     (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                         #        or 50-80 (L, followed by 0 to 3 X's)
     (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                         #        or 5-8 (V, followed by 0 to 3 I's)
     $                   # end of string
     """


James Teh wrote:
> Lubos Pintes wrote:
>> Perhaps it would be better to write this regex as shown in "Dive into
>> python". Now it is really terrible :-).
> Explain.
>
> Jamie
>



Re: [Nvda-dev] commit r1954 - in trunk: . source

James Teh-2
Lubos Pintes wrote:
> Ok, here, regex for roman numerals:
...
Can you really insert comments anywhere in a regex like that? I
would have thought they'd be included as part of the match.
The regex match used for processing symbols is much more complicated
than that, though. I tried thinking of a good way to comment it, but I
was basically just expressing the regex again in textual form, which is
pretty pointless. I split it up into separate chunks, so it should be
possible to follow each one separately.

Jamie




Re: [Nvda-dev] commit r1954 - in trunk: . source

Lubos Pintes-2
Oh, I forgot one important thing, sorry.
The match has to be done as follows:
 >>> re.search(pattern, '<string>', re.VERBOSE)
With the re.VERBOSE flag, whitespace and comments in the pattern are
ignored. If we want to match whitespace, we must escape it with a backslash.
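
A small runnable check of that behaviour, using a shortened copy of the Roman numeral pattern from earlier in the thread. The names ROMAN, is_roman and SPACED are made up for illustration; re.VERBOSE is the standard flag of Python's re module.

```python
import re

ROMAN = """
    ^                   # beginning of string
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
"""


def is_roman(s):
    # re.VERBOSE makes the engine skip the layout whitespace and the comments.
    return re.search(ROMAN, s, re.VERBOSE) is not None


# In verbose mode a literal space has to be escaped with a backslash
# (or placed inside a character class), otherwise it is skipped.
SPACED = re.compile(r"""
    hello \  world      # backslash-space matches exactly one real space
""", re.VERBOSE)
```

Without the flag, the same ROMAN string is treated as a literal pattern including its newlines, spaces and comment text, so it will not match a plain numeral such as "MCMXC".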
       
James Teh wrote:

> Lubos Pintes wrote:
>> Ok, here, regex for roman numerals:
> ...
> You can definitely insert comments randomly into a regex like that? I
> would have thought they'd be included as part of the match.
> The regex match used for processing symbols is much more complicated
> than that, though. I tried thinking of a good way to comment it, but I
> was basically just expressing the regex again in textual form, which is
> pretty pointless. I split it up into separate chunks, so it should be
> possible to follow each one separately.
>
> Jamie
>
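
For comparison, the symbol-processing regex from r1954 could be laid out in the same verbose style. This is only a sketch under stated assumptions: SYMBOLS is a hypothetical stand-in for the set computed from characterSymbols in the commit, and the names VERBOSE_FORM, COMPACT_FORM and groups are invented here. The pattern text itself is copied from the diff.

```python
import re

# Hypothetical stand-in for the symbols NVDA derives from characterSymbols.
SYMBOLS = ".!?,;:@#"

# Verbose layout: one alternative per line, each with its own comment.
# The escaped symbols sit inside a character class, where '#' stays literal.
VERBOSE_FORM = re.compile(r"""
    (?<=[^\s.!?]) ([.!?]) (?=["')\s]|$)   # Group 1: sentence endings, symbol preserved
  | (,)                                   # Group 2: comma, symbol preserved
  | (?<=[^\s;:]) ([;:]) (?=\s|$)          # Group 3: semicolon and colon, symbol preserved
  | ([%s])                                # Group 4: all other symbols, not preserved
""" % re.escape(SYMBOLS), re.VERBOSE)

# Compact layout, as in the commit (with SYMBOLS substituted).
COMPACT_FORM = re.compile(
    r"(?:(?<=[^\s.!?])([.!?])(?=[\"')\s]|$))"
    + r"|(,)"
    + r"|(?:(?<=[^\s;:])([;:])(?=\s|$))"
    + r"|([%s])" % re.escape(SYMBOLS)
)


def groups(pattern, text):
    # Capture groups of every match, used to compare the two layouts.
    return [m.groups() for m in pattern.finditer(text)]
```

Both layouts should produce the same matches on any input. Whether the verbose form is easier to follow than the chunked string concatenation is a matter of taste.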