Home » RDBMS Server » Server Administration » Getting data for loading into a UTF8 Oracle instance (Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production )
Getting data for loading into a UTF8 Oracle instance [message #614079] Thu, 15 May 2014 22:19 Go to next message
orauser001
Messages: 13
Registered: April 2013
Location: us
Junior Member
BACKGROUND

We are building a datawarehose on oracle

Version : Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
Characterset :
"NLS_CHARACTERSET","AL32UTF8"
"NLS_NCHAR_CHARACTERSET","AL16UTF16"

We are required to store multilingaul data in our datawarehose. From a Storage perspective
since we have a UFT8 characterset we should be good.

Question I have is related to getting data from our data sources into the warehouse. Some
facts about our environment:

We will be recieveing data from 100 or so systems using files

These systems have varying DB/OS/Characterset environments
- some are Windopws others Unix
- Some are SQL Server, others Oracle
- Even the ones on oracle may have different character sets


The data from source system need to be provided using files that are sftp'ed to our
servers

OBJECTIVE

These source systems from which we are sourcing data have varying level of support for
multilingal data

Since out datawarehouse is UTF-8 which has all possible characters, we are trying to
find a solution where we can get data from various sources into the datawarehouse
along with the foreign characters coming through correctly (no garbling foreign
characters -upside down ?'s etc)

The data from source system need to be provided using files that are sftp'ed to our servers


QUESTION

How do i approach this problem.

Can systems using different DB/OS/Characterset environments - export data in a standard
unicode format that converts the foreign characters encoded in their charactersets to a standard
unicode encoding - so that it can be loaded in our datawarehouse directly.
If so What is this file format called ? UNICODE ? UTF-8 ?

Or will the files exported from different source systems be encoded as per the DB/OS/
Characterset of their environment.If so how can i use them to load data in our datawarehouse
so that we do not garble foreign characters (upside down ?'s etc).


Thanks in advance!
Re: Getting data for loading into a UTF8 Oracle instance [message #614100 is a reply to message #614079] Fri, 16 May 2014 06:16 Go to previous message
ThomasG
Messages: 3211
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
You can of course load files with different character sets into the database.

* If you use external tables you have to set the "CHARACTERSET" parameter in the external table definition accordingly.
* If you use SQLLoader or another "client-program" to load the data you have to set the NLS_LANG environment accordingly.

For a "standard format" I would suggest UTF-8.

The main challenge I see is identifying the file formats. And when you already have to do that, you could also "convert" all files that are not UTF-8 to that encoding before loading them. For that you could perhaps use a combination of file and iconv

Previous Topic: Excessive DB connections.
Next Topic: Oracle 11g r2 ( Amazon RDS instance )
Goto Forum:
  


Current Time: Thu Mar 28 09:41:55 CDT 2024