[NVDA-dev] OCR Plugin - More Config Options And Image File Input?

[NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Kerneels Roos
Hi.

Not sure where to post comments on the OCR plugin since its ticket has
been closed, so here goes.

It works quite well, thanks. A nice addition would be more config
options in the menu > OCR Settings dialog to fine-tune the engine.

Another feature which would be very useful is a way to send an image
file to the OCR engine instead of the screen rectangle of the current
navigator object. Perhaps by pressing NVDA Key+R in Windows Explorer
while focused on an image file?

A nice scenario for this would be where you get a screenshot from
someone else. Opening the image inside an image viewer is problematic
(for OCR purposes) since, for the OCR to work properly, one has to zoom
the image a bit, but not too much, and then you only see part of the
image, and so on.
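
The zooming step mentioned here could in principle be done in software before the image reaches the OCR engine. The following is only a minimal sketch of nearest-neighbour upscaling on a grid of pixel values; the function name `upscale_nearest` and the list-of-rows image representation are assumptions made for illustration, not anything taken from the plugin.

```python
def upscale_nearest(rows, factor):
    """Enlarge a grid of pixel values by an integer factor.

    Each pixel is repeated `factor` times horizontally and each row is
    repeated `factor` times vertically (nearest-neighbour scaling).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in rows:
        # Repeat every pixel horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically, copying so rows stay independent.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled
```

Scaling the file itself would avoid the problem of only part of the image being visible on screen, since the whole enlarged image can be handed to the engine at once.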

Thanks

--
Kerneels Roos
Cell: +27 (0)79 696 6038
Tel: +27 (0)811 2108
LinkedIn: http://za.linkedin.com/in/kerneels

Skype: cornelis.roos




Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Brian Gaff Lineone downstairs
Do you find the OCR is at all accurate for anything other than
screenshots? I don't.
Maybe that makes its performance seem poor, but it is a free engine and
I'd guess the commercial ones have a lot more finesse than this one has.

Brian

[hidden email]
Brian Gaff's other account.

----- Original Message -----
From: "Kerneels Roos" <[hidden email]>
To: <[hidden email]>
Sent: Monday, October 15, 2012 1:24 PM
Subject: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?


> __________________________________________
> A discussion list about the development of NVDA, the free and open source
> screen reader for Windows.
> [hidden email] (http://lists.nvaccess.org/listinfo/nvda-dev)
> Note this list is for discussion of NVDA's general development. Discussion
> about translating NVDA to various languages happens on a separate email
> list. For more information about NVDA localization and translation, please
> visit:
> http://www.nvda-project.org/wiki/TranslatingNVDA
> For an overview of NVDA development, please visit:
> http://www.nvda-project.org/wiki/Development
> To be notified of source code changes and trac ticket updates, please
> subscribe to the nvda-commits list:
> http://lists.nvaccess.org/listinfo/nvda-commits 




Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Rui Batista
In reply to this post by Kerneels Roos

Hi,
On 15-10-2012 13:24, Kerneels Roos wrote:
> Hi.
>
> Not sure where to post comments on the OCR plugin since its ticket
> has been closed, so here goes.
>

From what I recall, there is a component for filing bugs against the OCR
plugin on the NVDA website.


> It works quite well, thanks. What would be a nice addition is to
> have more config options available in the menu > OCR Settings
> dialog to fine tune the engine.

Can you give some examples of what parameters are useful for the user
to configure?

>
> Another feature which would be very useful is a way to send an
> image file to the OCR engine instead of the screen rectangle of the
> current navigator object? Perhaps when pressing NVDA Key+R while in
> Windows Explorer, while on an image file?

I understand the usefulness of this idea; however, I don't think it is
a screen reader sort of feature, nor am I sure whether it is possible or
desirable to implement in NVDA the way you suggest, at least as an
external plugin. Obviously, having a file-chooser dialog from which to
choose a file to OCR isn't hard to do; the question is more whether we
want to do it or not.

>
> A nice scenario for this would be where you get a screenshot from
> someone else. Opening the image inside an image viewer is
> problematic (for OCR purposes) since, in order for the OCR to work
> properly one has to zoom the image a bit, but not too much, but
> then you only see part of the image etc.
>

That is an interesting use case.

Regards,

Rui Batista



Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

James Teh
In reply to this post by Kerneels Roos
Hi.

On 15/10/2012 10:24 PM, Kerneels Roos wrote:
> It works quite well, thanks. What would be a nice addition is to have
> more config options available in the menu > OCR Settings dialog to fine
> tune the engine.
There isn't really anything to fine tune for this engine.

> Another feature which would be very useful is a way to send an image
> file to the OCR engine instead of the screen rectangle of the current
> navigator object?
This is out of scope for screen reader OCR functionality. If you want to
OCR an image file, download an OCR engine (Tesseract is what NVDA uses)
and run it on the file.
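
As a rough illustration of running the engine on a file directly, here is a minimal Python sketch. It assumes the `tesseract` command-line program is installed and on the PATH, and that it follows the usual `tesseract imagename outputbase -l lang` calling convention, writing its result to `outputbase.txt`. The function names are made up for this example and are not part of NVDA or the plugin.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract run.

    Tesseract writes the recognised text to <output_base>.txt.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image_file(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognised text."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_base = Path(tmp_dir) / "result"
        command = build_tesseract_command(image_path, output_base, lang)
        # check=True raises CalledProcessError if Tesseract reports a failure.
        subprocess.run(command, check=True)
        # Read the text before the temporary directory is cleaned up.
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling `ocr_image_file("screenshot.png")` (a hypothetical file name) would then return the recognised text as a string, which could be read in any text viewer.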

Jamie

--
James Teh
Director, NV Access Limited
Email: [hidden email]
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372



Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

mk360
     Hi...
On 15-10-2012 19:27, James Teh wrote:
> This is out of scope for screen reader OCR functionality. If you want
> to OCR an image file, download an OCR engine (Tesseract is what NVDA
> uses) and run it on the file.
>
     Well, the OCR plugin is not the screen reader itself, so if someone
adds these functions it is not a problem; you can use them or not. Note
that I don't need more functions, but the purpose of add-ons in programs
is to expand their functionality.

     Regards,
     mk.



Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Kerneels Roos
In reply to this post by Brian Gaff Lineone downstairs

On 2012/10/15 06:40 PM, Brian Gaff Lineone downstairs wrote:
> Do you find the OCR is at all accurate for anything other than screen
> shots? I don't.
> Maybe this will make its performance in that way seem poor but it is a
> free engine and I'd guess the commercial ones have a lot more finesse
> than this one has.
>
I'm sure the commercial ones are better, but Tesseract is actually a
very sophisticated OCR engine that was in the top 3 in the world in
1995. I'm sure it was developed mainly for the OCR of simple blocks of
scanned text, such as from a book, so its page layout analysis is
probably quite simplistic compared to the latest engines.

Brian, I've had reasonable success with it through the plugin - I could
successfully read the text of my laptop's manual, which is a scanned
PDF. Yes, it's not great, but at least I could get the gist of what the
text was saying.






Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Kerneels Roos
In reply to this post by Rui Batista

On 2012/10/15 10:10 PM, Rui Batista wrote:
>> Not sure where to post comments on the OCR plugin since its ticket
>> has been closed, so here goes.
> From what I recall, there is a component to assign bugs for the OCR
> plugin on the NVDA website.
Sorry, can't find it.

>> It works quite well, thanks. What would be a nice addition is to
>> have more config options available in the menu > OCR Settings
>> dialog to fine tune the engine.
> Can you give some examples of what parameters are useful for the user
> to configure?
Well, in the Trac ticket there was mention of things like B&W or
grayscale, and I'm sure, without having looked into it too much, that
Tesseract would have more setup params, command-line input params,
config files and so on?

Just thought that if there are static settings that get applied by
default, could they be moved into the OCR Settings dialog perhaps?
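
To make the B&W versus grayscale idea concrete, here is a small sketch of what such a preprocessing setting might do to pixel data before OCR. It is an assumption for illustration only and does not reflect how the plugin or Tesseract actually handles images. The grayscale conversion uses the common ITU-R BT.601 luma weights, and the function names and the `threshold` parameter are hypothetical.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values.

    Uses integer arithmetic with the BT.601 weights 0.299, 0.587, 0.114.
    """
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def to_black_and_white(gray_rows, threshold=128):
    """Map each grayscale value to 0 (black) or 255 (white)."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]
```

A settings dialog could then expose a choice between the two modes and, for B&W, the threshold value.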

>> Another feature which would be very useful is a way to send an
>> image file to the OCR engine instead of the screen rectangle of the
>> current navigator object? Perhaps when pressing NVDA Key+R while in
>> Windows Explorer, while on an image file?
> I understand the usefulness of this idea; however, I don't think it is
> a screen reader sort of feature, nor am I sure whether it is possible or
> desirable to implement in NVDA the way you suggest, at least as an
> external plugin. Obviously, having a file-chooser dialog from which to
> choose a file to OCR isn't hard to do; the question is more whether we
> want to do it or not.
Yes, obviously a file chooser scenario is simple to implement, but what
I was thinking was to streamline the process by having it work in one or
all of these ways:
* If NVDA can detect that you are trying to OCR the screen
representation of an actual image file, then the plugin could ask
whether you would rather OCR the file itself. If you answer yes, NVDA
grabs the file and passes it on to the plugin, which then OCRs the file
and allows for immediate review of the contents. The great feature of
the OCR plugin is that you can immediately review the resulting text; it
is streamlined. You don't have to pass a file to another program, then
open the resulting file somewhere, and only then be able to start
reviewing the contents.
* Allow you to OCR a file from inside Windows Explorer by pressing NVDA
Key + R while on an image file, after which NVDA opens some type of
textual display of the result which you can review.
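
The decision step in both variants above could be sketched roughly as follows. This is a hypothetical illustration: it assumes the plugin is somehow given the path of the focused item (how NVDA would obtain that path is not covered here), and the names `IMAGE_EXTENSIONS`, `is_image_file` and `choose_ocr_source` are invented for the example.

```python
from pathlib import Path

# Extensions treated as image files in this sketch (an assumed list).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def is_image_file(path):
    """Return True if the path has a known image extension (case-insensitive)."""
    return Path(path).suffix.lower() in IMAGE_EXTENSIONS


def choose_ocr_source(focused_path):
    """Return "file" when the focused item is an image file, else "screen".

    `focused_path` may be None when the focused object is not a file.
    """
    if focused_path is not None and is_image_file(focused_path):
        return "file"
    return "screen"
```

When the result is "file", the plugin would offer to OCR the file itself; otherwise it would fall back to the existing behaviour of recognising the navigator object's screen rectangle.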

>> A nice scenario for this would be where you get a screenshot from
>> someone else. Opening the image inside an image viewer is
>> problematic (for OCR purposes) since, in order for the OCR to work
>> properly one has to zoom the image a bit, but not too much, but
>> then you only see part of the image etc.
>>
> That is an interesting use case.
Yes, I thought so too when I first discovered it. Solving the problem of
making screen shots accessible could assist blind programmers and tech
support when answering queries from sighted people that do not explain
screen shots sufficiently.






Re: [NVDA-dev] OCR Plugin - More Config Options And Image File Input?

Kerneels Roos
In reply to this post by James Teh
Hi.
On 2012/10/16 12:27 AM, James Teh wrote:
> Hi.
>
> On 15/10/2012 10:24 PM, Kerneels Roos wrote:
>> It works quite well, thanks. What would be a nice addition is to have
>> more config options available in the menu > OCR Settings dialog to fine
>> tune the engine.
> There isn't really anything to fine tune for this engine.

See an earlier email I wrote. If there really isn't anything useful to
fine tune then perhaps another engine should be considered? Someone
recommended asking Microsoft if we could use their engine. Since the OCR
feature is a plugin, it could even be tied to a commercial engine; I
assume a different licensing policy could apply here? A small fee
covering the cost of the commercial engine and a bit more could be
charged as another way to bring in some funds for NV Access.


>
>> Another feature which would be very useful is a way to send an image
>> file to the OCR engine instead of the screen rectangle of the current
>> navigator object?
> This is out of scope for screen reader OCR functionality. If you want
> to OCR an image file, download an OCR engine (Tesseract is what NVDA
> uses) and run it on the file.
>
I can see your reasoning. Yet, as another email pointed out, it is part
of a plugin whose purpose is to extend functionality. I was also
thinking about it and realised that OCR is something really important
that a screen reader should be doing very well and in a streamlined way.
It seems to me that only once JAWS 13 came out with the OCR feature did
we realise that OCR is very useful and nice to have handy in a screen
reader.

With every new UI technology there is going to be a period of time where
we will have to come to grips with the new way in which the UI tech
designers chose to implement accessibility. During that time, proper OCR
could render brand new applications semi-accessible right away.

Going on from there, if we can figure out how humans perceive the
computer screen and how they apply their intuition when faced with a new
application, we could develop functionality that will look at a screen
and explain what is going on to a blind person. Something like: "I see
five apps open, with the topmost being Thunderbird. Topmost application
is divided into 6 sections: standard title bar, menu bar, tool bar, on
the left, a tree of some sort, on the right and larger than the left, a
tree of some sort with multiple columns per node, and finally a status
bar at the bottom."
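
The final wording step of such a feature could look something like the sketch below. It assumes the hard part, analysing the screen into a list of open applications and a list of sections, has already been done by some other component; the function name `describe_screen` and both parameters are hypothetical.

```python
def describe_screen(app_names, sections):
    """Build a spoken-style summary from an already analysed layout.

    `app_names` lists open applications, topmost first.
    `sections` lists short descriptions of the topmost window's regions.
    """
    if not app_names:
        return "I see no apps open."
    count = len(app_names)
    noun = "app" if count == 1 else "apps"
    summary = f"I see {count} {noun} open, with the topmost being {app_names[0]}."
    if sections:
        summary += (
            f" Topmost application is divided into {len(sections)} sections: "
            + ", ".join(sections)
            + "."
        )
    return summary
```

This only covers turning structured data into a sentence; recognising the sections from pixels is the open problem the paragraph above describes.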



Kerneels



