Researching on the Internet: Finding
& Evaluating Information
Ernest Ackermann, Department of Computer Science, Mary Washington College
Karen Hartman, Simpson Library, Mary Washington College
Critical thinking skills have
always been important to the process of searching for and using information
from media such as books, journal articles, radio broadcasts, television reports,
and so forth.
With the advent of the Internet
and World Wide Web, these skills have become even more crucial. Traditional
books and journal articles need to pass some kind of editorial scrutiny before
being published. Web pages, however, can appear without a single person ever
reading them through to check for accuracy. Libraries have collection development
policies that govern what material they will and will not buy; the Internet
and Web, having no such policies, collect everything. This is not to say that
the Internet lacks quality. There are thousands of high-caliber Web
pages and well-regarded databases. To find these quality resources,
we must make it our responsibility to
-
evaluate our information needs
-
choose an appropriate search
tool
-
formulate a search expression
that will select the most relevant resources
-
decide, using well-established
guidelines, whether the Web page or Internet resource is worth using in
our research paper or project
The First Step: Evaluate Your Information
Needs
Before you get online and
start your search for information, think about what types of material you're
looking for. Are you interested in finding facts to support an argument,
authoritative opinions, statistics, evaluative reports, descriptions of
events, images, or movie reviews? Do you need current information or facts
about an event that occurred 20 years ago? Are you sure the Web is
a smart place to start? A reference book in your library may have the information
you need, and you may find it there more quickly. It may seem that the Web would
contain all the information that you require, but this is not always the
case.
Types of Information Most Likely Found
on the Internet and World Wide Web
- Current information. Many
newspapers and popular magazines provide Web versions of their publications
and news updates throughout the day. Current financial and weather information
is also easily accessible. For an example, see TotalNEWS,
http://www.totalnews.com for links to dozens
of news sources.
- U.S. government information.
Most federal, state, and local government agencies provide statistics and
other information freely and in a timely manner. The University of Michigan's Document Center, http://henry.ugl.lib.umich.edu/libhome/Documents.center,
is an excellent starting point for government documents.
- Popular culture. It's easy
to find information on the latest movie or best-selling book. Try Amazon.com,
at http://www.amazon.com
for current book reviews.
- Full-text versions of books and other materials
that are not under copyright restriction. For example,
Shakespeare's plays, the Bible, the Canterbury Tales, and hundreds of
other full-text literary resources are available. The Internet
Public Library's Online Text Collection
has collected most of them at http://www.ipl.org/reading/books.
- Business and company information.
Many companies provide their own Web pages and annual reports, and
several databases also provide in-depth financial and other information
about companies. Companies
Online, at http://www.companiesonline.com,
provides a database of public and private company information.
- Consumer information. The
Internet is a virtual gold mine of information for people who are interested
in buying a particular item and want other people's opinions about it.
Try searching Deja.com, http://www.deja.com,
the next time you want opinions about that new automobile or vacuum
cleaner you are thinking of buying. Consumer health care information is
available at healthfinder, http://www.healthfinder.gov.
Other places for consumer information are the Consumer
Information Center, http://www.pueblo.gsa.gov/,
and Consumer World, http://www.consumerworld.org/.
- Medical information. In addition
to several excellent sources of medical information provided by hospitals,
pharmaceutical companies, and non-profit organizations, The National Library
of Medicine has provided the MEDLINE database to the public for free since
late 1997. Check out
PubMed's MEDLINE, http://www.ncbi.nlm.nih.gov/PubMed,
for your medical research questions.
- Unique archival sites. For
example, the Library
of Congress' American Memory Collection, http://lcweb2.loc.gov/ammem/mdbquery.html,
provides full-text documents, musical recordings, photographs, maps, and
more about certain periods of American history.
Some Reasons Why the World Wide Web Won't Have Everything
You Are Looking For
- Publishing companies and authors
who make money by creating and providing information will choose to use
the traditional publishing marketplace and not make the information free
via the Internet.
- Scholars most often choose
to publish their research in reputable scholarly journals and university
presses rather than use the Web to distribute their research. More
academic journals are certainly becoming Web-based, but a subscription to
one of these journals costs as much as a subscription to the paper form.
- Several organizations and
institutions would like to publish valuable information on the Web but
lack the staff or funding to do so.
- The Web tends to include information
that is in demand by a large portion of the public. The Web can't be relied
upon consistently for historical information. For example, if you need
today's weather data for Minneapolis, Minnesota, the Web will certainly
have it. But if you want Minneapolis climatic data for November of 1976,
you might not find it on the Web.
Information Sources Available
on the Web
| Directories
or Subject Catalogs |
|
| Virtual
Libraries |
|
| Specialized
Databases |
- Specialized databases
can be comprehensive collections of hyperlinks in a particular subject
area or self-contained indexes that are searchable and available on
the Web.
- The Internet Sleuth, http://www.isleuth.com, accesses
more than three thousand specialized databases and directories.
|
| Proprietary
or Commercial Databases |
- Proprietary or commercial
databases charge a subscription fee to use.
- Proprietary databases
have certain value-added features that databases in the public domain
do not have. For example, databases on FirstSearch, http://www.ref.oclc.org,
have links to library holdings information, so you can find out
which libraries own the materials that are indexed.
- Proprietary databases
also allow you to download information easily. For instance,
Dow Jones Interactive, http://www.djinteractive.com,
includes financial information that is commonly free to the public,
but it charges for the use of its database because it has made it much
easier for the user to download the information to a spreadsheet program.
- Proprietary databases often index material that others
do not. The information is distinguished by its uniqueness, its historical
value, or its competitive value. For example, Dialog,
http://www.dialogweb.com includes difficult-to-find
private company financial information and Infotrac's
Searchbank http://library.iacnet.com
and Lexis-Nexis
Academic Universe, http://www.lexisnexis.com
contain the full-text of hundreds of journal articles.
- Proprietary database systems
are more responsive to their users. Because they cost money, they are
more apt to provide training and other user support, such as distributing
newsletters that announce updates to their services.
- There are also databases
on the Web that are free to the public but charge if you want the full
text of the articles indexed. The Electric Library, http://www.elibrary.com
and Northern
Light, http://www.northernlight.com,
are examples of this type of database.
|
| Search
Engines |
|
| Meta-search
Tools |
|
| Library
Catalogs on the Web |
|
| Email
Discussion Groups |
- Email discussion groups
are sometimes called interest groups, listserv, or mailing
lists. Internet users join a group, then send messages to and read
messages from the entire group through email. Several thousand different groups exist.
- Several services let you
search for discussion groups. One is Liszt, http://www.liszt.com
|
| Usenet
Newsgroups |
- Usenet newsgroups are
collections of group discussions, questions, answers, and other information
shared through the Internet. The messages are called articles and are
grouped into categories called newsgroups. The newsgroups number in
the thousands, with tens of thousands of articles posted daily.
- Many search engines include
the option of searching archives of Usenet articles, and some services,
such as Deja News, http://www.deja.com, keep large archives
of Usenet articles.
|
Learn the Features and Capabilities
of a Search Tool or Service
-
Get to know the features and
capabilities of the search tool you'll use.
-
Click on Help or Tips.
(Read it!)
-
See if there is a FAQ (Frequently
Asked Questions) -- Browse through it.
Features:
-
What type of Boolean expressions
does it support? (AND, OR, NOT, + -)
-
What about 'wild cards'? (What's
matched by comput* ?)
-
Does it support phrase searching?
-
Proximity? (Terms 'near' each
other.)
-
Field Searching? (title, URL,
domain, etc…)
-
Can you limit results by date
or domain?
-
Can you make choices about the
way results are reported?
-
How are results reported?
-
Is it possible to narrow or
revise a search?
-
Is help provided for forming
search expressions?
-
What's the coverage?
Common Search Features:
|
Boolean operators |
-
use AND to require that two terms be present; for example, global AND warming means that both global and warming must be present.
-
use OR to require that one or both of two terms be present; for example, global OR warming means that either global or warming, or both terms, will be present.
-
use NOT to require that a term not be present; for example, global NOT warming means that we will get results that include global but don't include the term warming.
|
| Implied
Boolean operators |
-
use + to require a term
be present, +term means term must be present
-
use - to exclude a term,
-term means term must not be present
|
| Phrases |
-
use quotation marks to enclose
a phrase; the terms must appear in the order given, for example "gibson acoustic
guitar"
|
| Truncation
or Wild Cards |
-
use * to represent different
endings for a word; for example comput* would be used to match terms computer,
computing, computers, computation
|
| Field Searching
|
-
Web pages can be broken down into many parts. These parts, or fields, include titles, URLs, text, summaries or annotations (if present), and so forth. Field searching is the ability to limit your search to certain fields, which can increase the relevance of the retrieved records. For example, to search the Web for an image of a comet, limit your search results to Web pages that contain images that have the word comet in their filenames.
|
| Limiting by Date
|
-
Some search engines allow you to search the Web for pages that were added to the database between certain dates. By limiting by date, you can find only the pages that were entered in the past month, in the past year, or in a particular year.
|
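The features in the table above can be sketched in a few lines of Python. This is only an illustration of what the operators mean, not how any particular search engine is implemented. The sample pages and the helper names (tokenize, has_term, has_phrase, search) are made up for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_term(text, term):
    """True if term appears as a whole word; a trailing * matches any ending."""
    words = tokenize(text)
    if term.endswith("*"):
        stem = term[:-1].lower()
        return any(word.startswith(stem) for word in words)
    return term.lower() in words

def has_phrase(text, phrase):
    """True if the words of phrase appear next to each other, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical page texts, keyed by a short name.
pages = {
    "a": "Global warming and climate policy",
    "b": "A global directory of guitar makers",
    "c": "Warming up before you play a Gibson acoustic guitar",
    "d": "Computers and computing: a short history of computation",
}

def search(predicate):
    """Return the names of the pages whose text satisfies the predicate."""
    return sorted(name for name, text in pages.items() if predicate(text))

# global AND warming
and_hits = search(lambda t: has_term(t, "global") and has_term(t, "warming"))
# global OR warming
or_hits = search(lambda t: has_term(t, "global") or has_term(t, "warming"))
# global NOT warming (same as +global -warming)
not_hits = search(lambda t: has_term(t, "global") and not has_term(t, "warming"))
# "gibson acoustic guitar"
phrase_hits = search(lambda t: has_phrase(t, "gibson acoustic guitar"))
# comput*
wildcard_hits = search(lambda t: has_term(t, "comput*"))

print(and_hits, or_hits, not_hits, phrase_hits, wildcard_hits)
```

Notice that AND narrows the result set while OR widens it, which is why the search tips later on suggest removing ANDed terms when you get too few results.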
Relevancy Ranking:
Most search engines measure each Web page's relevance to your search query and arrange the search results from the most relevant to the least relevant. This is called relevancy ranking. Each search engine has its own algorithm for determining relevance, but it usually involves counting how many times the words in your query appear in the Web pages. In some search engines, a document is considered more relevant if the words appear in certain fields, for example, the title or summary field. In other search engines, relevance is determined by the number of times the keyword appears in a Web page divided by the total number of words in the page. This gives a percentage, and the page with the largest percentage appears first on the list.
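As a rough sketch of the last method described above, the following Python divides the number of times a keyword appears in a page by the total number of words in the page and sorts the pages by that ratio. The page texts and function names are hypothetical, and real search engines combine many more signals than this.

```python
import re

def density(text, keyword):
    """Occurrences of keyword divided by the total number of words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def rank(pages, keyword):
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: density(pages[name], keyword),
                  reverse=True)

# Hypothetical page texts.
pages = {
    "sky-news": "a comet was seen last night over the city",
    "recipes": "bread and soup recipes",
    "comet-facts": "comet orbits and comet tails",
}

ranking = rank(pages, "comet")
print(ranking)
```

Here "comet-facts" comes first because 2 of its 5 words match, "sky-news" comes second with 1 match in 9 words, and "recipes" comes last with no matches.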
Basic Search Strategy: The
Ten Steps
The following list provides a guideline for you to follow
in formulating search requests, viewing search results, and modifying search
results. These procedures can be followed for virtually any search request,
from the simplest to the most complicated. For some search requests, you
may not want or need to go through a formal search strategy. If you want
to save time in the long run, however, it's a good idea to follow a strategy,
especially when you're new to a particular search engine. These steps are taken from
"The Information Specialist's Guide to Searching and Researching on the
Internet & the World Wide Web," by Ernest Ackermann & Karen Hartman,
ISBN 1-887902-31-7, published by ABF Content. Take a look at Searching and Researching on
the Internet and the World Wide Web for more details about
searching and research.
A basic search strategy can help you get used to each
search engine's features and how they are expressed in the search query.
Following the 10 steps will also ensure good results if your search is
multifaceted and you want to get the most relevant results.
-
Identify the important concepts of your search.
-
Choose the keywords that describe these concepts.
-
Determine whether there are synonyms, related
terms, or other variations of the keywords that should be included.
-
Determine which search features may apply,
i.e., truncation, proximity operators, Boolean operators, etc.
-
Choose a search engine.
-
Read the search instructions on the search
engine’s home page. Look for sections entitled help, advanced search, frequently
asked questions, etc.
-
Create a search expression using syntax
that is appropriate for the search engine.
-
Evaluate the results. Were they
relevant to your query?
-
Modify your search if needed. Go back to steps
2-4 and revise your query accordingly.
-
Try the same search in a different search
engine, following steps 5-9 above.
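Steps 1 through 4 and step 7 can be sketched in Python: each concept becomes a group of synonyms joined with OR, and the groups are joined with AND. The topic, the keywords, and the function names are hypothetical, and the syntax (uppercase AND and OR, parentheses, quotation marks for phrases, * for truncation) is an assumption; always check the help pages of the search engine you choose.

```python
def or_group(terms):
    """Join the synonyms for one concept with OR; quote multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"

def build_expression(concepts):
    """Require every concept by joining the OR groups with AND."""
    return " AND ".join(or_group(terms) for terms in concepts)

# Hypothetical topic: the effect of global warming on agriculture.
concepts = [
    ["global warming", "climate change"],  # concept 1 and a synonym
    ["agricultur*", "farming"],            # concept 2, with truncation
]

expression = build_expression(concepts)
print(expression)
```

Keeping the concepts in a list makes step 9 easy: add or remove a synonym, or drop a whole concept, and rebuild the expression.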
Search Tips
For multi-faceted searches
a full-text database is best. For a search involving one facet like a person’s
name or a phrase without stop words, search engines that provide keyword
indexing will be sufficient.
If your search has yielded
too few Web pages (low recall), there
are several things to consider:
-
Perhaps the search expression
was too specific; go back and remove some terms that are connected by ANDs.
-
Perhaps there are more possible
terms to use. Think of more synonyms to OR together. Try truncating more
words if possible.
-
Check spelling and syntax (a
forgotten quotation mark or a missing parenthesis).
-
Read the instructions on the
help pages again.
If your search has given you
too many results, many of them not relevant to your topic
(high recall, low precision), consider the following:
-
Narrow your search to specific
fields, if possible.
-
Use more specific terms; for example,
instead of sorting, use the name of a specific type of sorting algorithm.
-
Add additional terms with AND
or NOT.
-
Remove some synonyms if possible.
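The terms recall and precision used above have standard definitions: precision is the share of retrieved pages that are relevant, and recall is the share of all relevant pages that were retrieved. The sketch below computes both for two made-up result lists. In practice you rarely know the full set of relevant pages on the Web, so treat this as a way to think about the trade-off rather than something you can measure exactly.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of results."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: four pages are truly relevant to the topic.
relevant = {"p1", "p2", "p3", "p4"}

# A broad search finds all four, plus four irrelevant pages.
broad = precision_recall(
    ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"], relevant)

# A narrow search finds only one page, but it is relevant.
narrow = precision_recall(["p1"], relevant)

print(broad, narrow)
```

The broad search has high recall and low precision (add terms with AND or NOT), while the narrow search has high precision and low recall (remove ANDed terms or add synonyms with OR).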
Evaluating and Verifying Resources
When we access or retrieve something on the Internet
we need to be able to decide whether the information is useful, reliable,
or appropriate for our purposes.
Guidelines
| Who
is the author or institution? |
-
If the author is a person, does
the resource give biographical information?
-
If the author is an institution,
is there information provided about it?
-
Have you seen the author’s or
institution’s name cited in other sources or bibliographies?
-
The URL can give clues to the
authority of a source. A tilde ~ in the URL usually indicates that it is
a personal page rather than part of an institutional Web site.
|
| How
current is the information? |
-
Is there a date on the Web page
that indicates when the page was placed on the Web?
-
Is it clear when the page was
last updated?
-
Is some of the information obviously
out-of-date?
-
Does the page creator mention
how frequently the material is updated?
|
| Who
is the audience? |
-
Is the Web page intended for
the general public, scholars, practitioners, children, etc.? Is this clearly
stated?
-
Does the Web page meet the needs
of its stated audience?
|
| Is
the content accurate and objective? |
-
Are there political, ideological,
cultural, religious, or institutional biases?
-
Is the content intended to be
a brief overview of the information or an in-depth analysis?
-
If the information is opinion,
is this clearly stated?
-
If there is information copied
from other sources, is this acknowledged? Are there footnotes if necessary?
|
| What
is the purpose of the information? |
-
Is the purpose of the information
to inform, explain, persuade, market a product, or advocate a cause?
-
Is the purpose clearly stated?
-
Does the resource fulfill the
stated purpose?
|
Tips
-
Look for the name of the author
or institution at the top or bottom of a Web page.
-
Go to the home page for the
site that hosts the information to find out about the organization.
-
To find further information
about the institution or author use a search engine to see what related
information is available on the Web.
-
Use Deja News, http://www.dejanews.com,
or another tool to search archives of Usenet articles to find other information
about the author or institution, and in the case of an individual to see
what sorts of articles they’ve posted on Usenet.
-
Check the top and bottom of
a Web page for the date the information was last modified or updated. If
no date is present, look at the Document Info if you’re using Netscape or
the Properties if you’re using Microsoft Internet Explorer.
Some
techniques you can apply to help with evaluation:
Who is the author or institution?
-
If the author is a person, does
the resource give biographical information?
Look for the name of the author or institution at the top or bottom of
a Web page.
-
If the author is an institution,
is there information provided about it?
Go to the home page for the site that hosts the information to find out
about the organization. You do this by extracting the first part of the
URL - the part starting with http:// up to the first slash (/).
-
The URL can give clues to the
authority of a source. A tilde ~ in the URL sometimes indicates that it
is a personal page rather than part of an institutional Web site.
-
Make note of the domain section
of the URL, as follows:
| Domain | Description |
| .edu | educational (anything from serious research to zany student pages) |
| .gov | governmental (usually dependable) |
| .com | commercial (may be trying to sell a product) |
| .net | network (may provide services to commercial or individual customers) |
| .org | organization (non-profit institutions; may be biased) |
-
Use search tools for Web pages
and Usenet postings (www.dejanews.com)
to learn more about the author/institution.
-
Use the WHOIS service at http://www.networksolutions.com/cgi-bin/whois/whois/
to determine the registrant of the Web site. Use the domain name, not
the URL. For example, to check the page listed above, "Teen Violence," http://www.worldahead.org/wam/9807/w9807f1.html,
use worldahead.org for the WHOIS search.
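The URL techniques above (finding the home page, spotting a tilde, noting the domain, and picking the name to use for WHOIS) can be sketched with Python's standard urllib.parse module. The function name url_clues and the example.edu address are hypothetical, and taking the last two parts of the host name as the WHOIS name is an assumption that fits the domains in the table but not every domain worldwide.

```python
from urllib.parse import urlsplit

# Descriptions follow the domain table above.
DOMAIN_HINTS = {
    "edu": "educational",
    "gov": "governmental",
    "com": "commercial",
    "net": "network",
    "org": "organization",
}

def url_clues(url):
    """Collect clues about authority from the parts of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        # Everything from http:// up to the first slash.
        "home_page": f"{parts.scheme}://{parts.netloc}/",
        # Assumption: the registered name is the last two labels.
        "whois_name": ".".join(labels[-2:]),
        "domain_type": DOMAIN_HINTS.get(labels[-1], "unknown"),
        # A tilde in the path often marks a personal page.
        "personal_page": "~" in parts.path,
    }

teen = url_clues("http://www.worldahead.org/wam/9807/w9807f1.html")
# Hypothetical personal page on a university server.
student = url_clues("http://www.example.edu/~jsmith/paper.html")

print(teen)
print(student)
```

For the "Teen Violence" page this yields the home page http://www.worldahead.org/ and the WHOIS name worldahead.org, matching the example above.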
How current is the information?
- Is there a date on the Web page
that indicates when the page was placed on the Web?
- Is it clear when the page was
last updated?
- If it's not clear from the Web
page, then click on View in the Netscape menu bar and select Page Info to see if
that tells when the page was last updated.
- Is some of the information obviously
out-of-date?
- Does the page creator mention
how frequently the material is updated?
This is a Webliminal
Production. Copyright 1999 Ernest
Ackermann
Please send comments/questions to ernie@paprika.mwc.edu