Why do we need DSpace?
Libraries are society’s information gateways, and their use of digital assets has grown with the advent of digital objects, mobile devices and e-book readers. Digital materials (books, newspaper articles, audio and video files, etc.) must be cataloged and preserved, as well as made available to the users of the library. Digital repositories are very useful for this: a repository gathers information, stores it with metadata, maintains it and gives it enhanced visibility through online access.
In short, DSpace provides a standardized system within a library or an institute for disseminating digital content to its users.
Having said that, we have to recognize that DSpace is not a library management or cataloging system. It won’t help you check assets in and out; it only publishes digital assets on the web. And just because it makes content available on the web, it should not be treated as a web-publishing platform or a content management system. While DSpace complements such systems, it cannot be used to build your institution’s website.
Let’s start installing now. DSpace is not a ready-to-install, out-of-the-box solution; it needs components and tools from third parties to work. The installation and setup instructions for this third-party software are kept short and simple here. For full and up-to-date information, or for troubleshooting, refer to the documentation of each individual product.
The software needed by DSpace, and why it is used, is briefly explained below. Let us take a look.
Pre-requisite Software:
- Java Development Kit
- PostgreSQL
- Apache Tomcat
- Apache Maven
- Apache ANT
- DSpace Source Code
- Java Development Kit (JDK): The JDK is a development environment for building applications, applets, and components using the Java programming language. Java is also the language in which the DSpace source code is written. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
The installation and troubleshooting instructions are available at http://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html
- PostgreSQL: PostgreSQL is a powerful, open-source object-relational database system. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl and ODBC, among others. We will use PostgreSQL to store the database of our repository. You can download it from http://www.postgresql.org/download/windows or http://www.enterprisedb.com/products-services-training/pgdownload#windows
The installation and troubleshooting instructions for PostgreSQL are available at http://www.postgresql.org/docs/9.4/interactive/index.html
- Apache Tomcat: Apache Tomcat is an open-source implementation of the Java Servlet and JavaServer Pages technologies. It will act as our web server and make the repository web interface available on the network (LAN or WAN). You can download it from https://tomcat.apache.org/download-80.cgi
Windows installation, configuration and troubleshooting guidelines for Apache Tomcat are available at http://tomcat.apache.org/tomcat-8.0-doc/index.html
- Apache Maven: Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project’s build, reporting, and documentation from a central piece of information. For this installation, we need the Maven binary archive. You can get it from https://maven.apache.org/download.cgi
Configuration of the Apache Maven package on Windows is demonstrated at https://maven.apache.org/install.html
- Apache ANT: Ant is a Java-based build tool. In theory, it is kind of like Make, without Make’s wrinkles and with the full portability of pure Java code. You can download Apache Ant from https://ant.apache.org/bindownload.cgi
Configuration instructions for the Apache ANT package are available at http://ant.apache.org/manual/index.html
- DSpace: You know about it already; this is everything you’ll need to set up a repository. GitHub provides two packages of the DSpace code, but for a Windows installation we need the DSpace source release, which is available at https://github.com/DSpace/DSpace/releases/tag/dspace-5.3
This document will help you with the basics of installation and configuration. If you need an advanced topic that is not covered here, you can always find your way on the DuraSpace wiki page https://wiki.duraspace.org/display/DSDOC5x/DSpace+5.x+Documentation
Further, the DSpace support community is very active on its mailing lists (dspace-community@googlegroups.com and dspace-tech@googlegroups.com) and IRC channel (#dspace).
So with the above clarification of the Prerequisites for DSpace, we now move to install them.
So, BRACE YOURSELF, FOLKS!
Download all the above-mentioned software and save it in one folder.
Software versions I used:
JDK: jdk-8u60-windows-x64.exe
PostgreSQL : PostgreSQL-9.4.4-3-windows-x64.exe
Apache Tomcat: apache-tomcat-8.0.26.exe
Apache Maven: apache-maven-3.3.3-bin.zip
Apache ANT: apache-ant-1.9.6-bin.zip
DSpace: DSpace-5.3-src-release.zip
DSpace supports both Linux and Windows operating systems, but this guide focuses on the Windows family. The demonstration was carried out on Windows 7 (64-bit); the installation is the same on Windows Server 2008 and Windows Server 2012.
So here are the software tools and packages which I have downloaded from their respective websites as mentioned above.
Installation of JDK and JRE
To Install Java Development Kit and Java Runtime Environment, just double click on the downloaded .exe file and it will start with the installer window.
Check that all the features needed for the installation are active and the installation location is correct. Then click ‘Next’. After JDK finishes its installation, it will continue to install JRE.
Just keep all the settings on their default mode and click on the ‘Next’ option until it finishes its installation.
Installation of Apache Tomcat
After installing the Java Development Kit and JRE, install Apache Tomcat, which will serve our web pages to the network. As with the previous package, just double-click on the Tomcat .exe file to bring up the installation screen.
Check all the components you want to install and click on ‘Next’.
The next screen will prompt you for the port numbers you want your Tomcat server to run on, and for the username and password used to administer the Tomcat web server.
These credentials give you access to monitor and control your Tomcat server from a web user interface.
Check that the installation location is correct. You can change the folder if you want to install the Tomcat server at a location other than the C drive, but here we have kept the default, so it will be installed under C:\Program Files
The installer will finish and will automatically start Apache Tomcat as a Windows service.
If the installation was done correctly, you will see the Apache feather with a green icon in the taskbar, which signifies that the Tomcat service is running.
Installing PostgreSQL Database Server
Go to the folder where you have downloaded the packages and double-click on the PostgreSQL .exe file. The installer will begin with a welcome screen as shown below.
Provide the installation location. Here we are keeping the default, which is under C:\Program Files, but you can change the installation location if you want.
Similarly, it will ask about changing the data directory, but let’s not mess with that and keep everything on default. Click “Next”.
The next screen will ask you to set a password for the database superuser, which has access to the entire database engine and its contents. The default superuser name is “postgres”.
Set the desired password and click on ‘Next’.
The database port can also be left at its default, which is 5432. Also leave the locale on default and click ‘Next’.
The installation may take a few minutes. Once it is complete, it will show you ‘Finish’ with an option to launch Stack Builder, which is not needed for the DSpace installation.
So, I recommend that you uncheck this option and click on “Finish”.
OK, so now you have all your installed packages listed under All Programs in the Windows Start menu.
With everything else in working order, let’s open “pgAdmin III”, the PostgreSQL administration panel.
Every time you open the application, you will be asked for the superuser password you set when you installed PostgreSQL. You can save the password if you want, but it is advisable to do so only if the machine is secure and you are its only user. After all, a saved password gives anyone who gets into your account quick access to the database, and you would not want that.
Creating a login role for DSpace
Now that you have logged into the PostgreSQL server as the superuser and have global privileges, it’s time to create a login role for the DSpace database. A login role is nothing more than a dedicated user that has read-write access to its own database only. From a computer science perspective, using the superuser login to perform every task is a bad idea, so we will do it the right way. Right-click on the “Login Roles” node in the object browser tree and select “New Login Role” from the context menu.
A pop-up window will appear where you define the name of the login role and, if you want, a description. I have created a new login role and named it “DSpace” as shown below.
Click on the “Definition” tab and provide the password for the new login role. For this installation we have created the login role “DSpace” and its password is also “DSpace”. This is common practice in many DSpace installations, but you can choose your own (and on a production server you should pick a stronger password). Leave the expiry date and connection limit fields empty.
Next, switch to the “Role Privileges” tab and check all the checkboxes except superuser. This makes your user a very powerful one, but not as powerful as the superuser.
Click on “OK” to create the login role. Once you are done, you can see the new role in the object browser tree and its properties in the properties pane.
Creating a database for DSpace
After creating the login role, let’s create the database itself. As you did for the login role, go to the object browser tree, right-click on the “Databases” node and select “New Database” from the context menu.
A pop-up window will appear with the options as shown in the figure below.
Name this database “DSpace” and, as the owner of the database, select the “DSpace” login role from the drop-down menu. By doing this, you are assigning a dedicated user to the database. The superuser (postgres) still has global privileges and can make changes to this database too, but if other databases are running on the same server, this mechanism comes in handy to isolate each user within its own working domain.
There are no more options that you need to configure in this section. Just click ‘OK’ and you will find the new database in the object browser tree, with its properties shown in the properties pane.
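If you prefer the command line to pgAdmin, the same role and database can also be created with PostgreSQL’s own createuser and createdb tools. The following is only a minimal Python sketch, not part of the DSpace installation: it assembles the two command lines with the names used above (role “DSpace”, database “DSpace”, superuser “postgres”) and prints them, so that you can review them before running them yourself. The function name is made up for this illustration.

```python
# Sketch only: assembles PostgreSQL command lines, it does not run them.
# Role/database names follow this article; the helper name is hypothetical.


def role_and_database_commands(role="DSpace", database="DSpace", superuser="postgres"):
    """Return the createuser and createdb invocations as argument lists."""
    create_role = [
        "createuser",
        "--username=" + superuser,  # connect as the PostgreSQL superuser
        "--no-superuser",           # the new role must not be a superuser
        "--pwprompt",               # prompt for the new role's password
        role,
    ]
    create_db = [
        "createdb",
        "--username=" + superuser,
        "--owner=" + role,          # the login role owns the database
        "--encoding=UNICODE",
        database,
    ]
    return [create_role, create_db]


if __name__ == "__main__":
    for command in role_and_database_commands():
        print(" ".join(command))
```

Both tools live in the bin folder of your PostgreSQL installation, so you would run the printed commands from there (or add that folder to your Path).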
Installation of ANT and MAVEN
These binary packages contain no exe or msi installer, so the installation of these two tools is a little different, but not difficult. Extract the zip files onto your installation drive, which in our case is C:\
To save some time, I have extracted the DSpace source files as well, since we need to do that later anyway. For extracting the zip files, you can use WinZip, 7-Zip or anything that suits you. I find WinRAR very friendly, so I have extracted the files using that, as you can see below.
Select the files and set the destination of the extracted files to C:\
Now that you have extracted the zip files of Apache ANT and Apache Maven, you have done 50% of their installation work. But how does your computer know that the Apache ANT and Apache Maven binary files are located on the C drive?
Well yes, it shows that computers are still not as intelligent as humans are. So it’s time for YOU to take control of your computer and tell it what to do and how to do it.
Setting up Environment variables.
Environment variables exist on every operating system and, in principle, they all serve the same purpose. They store, among other things, the installation paths of the programs on your computer, so that your machine can find out what is installed, where it is installed and how to run it. When you run an exe installer, it extracts the files to the system drive and registers them in the environment variables for you, but as there is no exe or installer in the Apache ANT or Apache Maven packages, we must do it the old way.
Right-click the My Computer icon on the desktop and choose Properties from the context menu. This opens the System window; click the ‘Advanced system settings’ link in its navigation pane to open another pop-up window. There, click on the “Environment Variables” button, and another pop-up window will open as shown below.
This window consists of two parts: ‘User variables’ at the top and ‘System variables’ at the bottom. In the User variables section, click ‘New’ to create a new user variable called “JAVA_HOME”. Note that the variable value is the JDK installation directory.
If you did not change the default values during the installation, you will find the JDK installed at C:\Program Files\Java\jdk1.8.0_60. If your version differs from mine, the JDK folder name may differ slightly. So the best way is to go to Program Files, open the Java folder, and then open the folder named “jdk1.x.x_xx”, whatever it may be in your case. It’s not as difficult as it sounds; just have a look at the following image and you’ll see.
Copy this path and paste it in the “Variable Value” field.
Click ’OK‘ to close it.
Now let’s define the path of the binary executable files in the “System variables” section. Select ‘Path’ and click on the ‘Edit’ button. It will open a similar properties box.
DO NOT remove the existing content from its value field, unless you want to screw up everything and ruin your system. This field contains the installation paths of the other software on your computer. Just move the cursor to the end of the line and add a semicolon (;) as the value separator. Then paste the same installation path here, but this time include the “bin” subfolder.
So the actual path would be “C:\Program Files\Java\jdk1.8.0_60\bin” as shown in the image above.
Similarly, we will define user variables and system variables for Apache ANT and Apache Maven.
Copy the path of the folder where you extracted Apache ANT. Here it is “C:\apache-ant-1.9.6” (your version number may differ).
Create a new user variable named “ANT_HOME” and paste the path that you just copied as its value. Click ‘OK’.
These are the only two user variables that we need to create for this installation. The rest of the work deals with “Path” in the system variables. Copy the directory path of the Apache ANT folder, but this time include the subfolder “bin” as well, so your path would be C:\apache-ant-1.9.6\bin. Then edit Path in the system variables.
Add a value separator ( ; ) and paste the path at the end of the system variable’s value.
Similarly, go to the Apache Maven folder and copy its path including the “bin” folder as shown below.
Copy this path, then go back to the Path system variable. Add a semicolon as the value separator and paste the value. In my installation, the part of the value that I have added to this input box is “C:\Program Files\Java\jdk1.8.0_60\bin;C:\apache-ant-1.9.6\bin;C:\apache-maven-3.3.3\bin”
I am saying this again and again because it is very important that you place a semicolon ( ; ) between all three values; otherwise the computer will treat them as one single value, which may lead to fatal errors during installation. Close every pop-up window by clicking ‘OK’ and this part is done. We can cross-check the installation from the command prompt. To do that, open the command prompt and type the following commands one by one.
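To see why the separator matters, here is a small Python sketch (Python is not part of this installation; I am only using it for illustration). It splits a Path value at the semicolons, the way Windows does, and reports which of the expected bin folders are missing as separate entries. The folder names are the ones from my installation and the function name is made up for this example.

```python
# Illustration only: how a Windows Path value breaks into entries.
# The folder names are the ones used in this article; yours may differ.

EXPECTED = [
    r"C:\Program Files\Java\jdk1.8.0_60\bin",
    r"C:\apache-ant-1.9.6\bin",
    r"C:\apache-maven-3.3.3\bin",
]


def missing_entries(path_value, expected=EXPECTED):
    """Return the expected folders that are not separate entries in path_value."""
    # Windows separates Path entries with a semicolon and ignores letter case.
    entries = {
        part.strip().rstrip("\\").lower()
        for part in path_value.split(";")
        if part.strip()
    }
    return [folder for folder in expected if folder.lower() not in entries]


good = ";".join(EXPECTED)
bad = "".join(EXPECTED)  # separators forgotten: one long, useless entry

print(missing_entries(good))  # nothing is missing
print(missing_entries(bad))   # all three folders are reported as missing
```

On your own machine you could pass the real value instead, for example missing_entries(os.environ["PATH"]) after importing os.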
C:\>java -version
If the installation is correct, it should show you the version information in the command window as shown below.
Next, type the command
C:\>ant -version
If the installation is correct, you will see the version information of Apache ANT as follows:
Finally, check the Maven installation with the command
C:\>mvn -version
If everything is fine, it should show the following version information:
If you are getting matching output, you have configured everything correctly. Don’t panic if you are not, because a command prompt that was already open does not pick up changed environment variables. Open a new command prompt window (or, if that does not help, restart the computer) and re-run the test commands.
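The same check can be scripted. Below is a minimal Python sketch (again, only an illustration, and the function name is my own) that looks up the three tools on the current Path using the standard library’s shutil.which. On Windows this also finds ant.bat and mvn.cmd, because the lookup honours the PATHEXT extensions.

```python
# Sketch: look up the three build tools on the current Path.
import shutil

TOOLS = ["java", "ant", "mvn"]


def locate_tools(tools=TOOLS):
    """Map each tool name to the executable found on the Path, or None."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in locate_tools().items():
        print(tool, "->", location or "NOT FOUND on Path")
```

A tool reported as not found usually means a missing semicolon or a mistyped folder in the Path value.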
Compiling the DSpace source code
All right, you are done working with the third-party prerequisite software. Please note that the sequence of installing this software is largely arbitrary. What I mean by that is that you could have installed the prerequisite software in a different order without making much of a difference. (However, I must add that following exactly the order used in this document is the safest route, but it’s totally up to you!)
Now we will begin with the compilation and installation of DSpace source code, so you have to follow each step precisely and this time in the order as described below.
Create a new folder named “DSpace” on the drive where you want to install it. Here I have installed it on C drive, which is the most common practice almost everywhere.
Now go to the DSpace Source folder which you have just extracted.
Navigate through the subfolders (DSpace source release > dspace > config) until you reach the “dspace.cfg” file. Open the file with any text editor and find the string “dspace.dir = ${dspace.install.dir}”, which should be on line 27 of the file.
Replace the value on this line with “C:/DSpace” as shown below.
(Notice the UNIX-style forward slash in the path. Do not copy the path from the Explorer window; just type it in precisely.)
This file contains all the other configuration settings of your DSpace repository, i.e., the name of the repository, URL, database settings, and SMTP and email settings. For now, we are only changing the required value. You can read through this self-explanatory file and explore the other options. But I would recommend that you keep a backup copy of this file, and of every file you edit, so that if anything goes wrong you can always revert the changes.
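If you would rather script this edit, here is a minimal Python sketch of the two steps just described: back up the file, then rewrite the dspace.dir line. It is only an illustration under a few assumptions: the function name is made up, the backup is simply named dspace.cfg.bak, and the file is expected to contain exactly one line starting with “dspace.dir =”.

```python
# Sketch: back up dspace.cfg, then point dspace.dir at the install folder.
import re
import shutil
from pathlib import Path


def set_dspace_dir(cfg_path, install_dir="C:/DSpace"):
    """Copy cfg_path to cfg_path + '.bak' and rewrite its dspace.dir line."""
    cfg = Path(cfg_path)
    shutil.copy2(cfg, cfg.with_name(cfg.name + ".bak"))  # keep the original

    # latin-1 maps every byte to a character, so no other text is altered.
    text = cfg.read_text(encoding="latin-1")
    # Replace only the line that starts with "dspace.dir =".
    new_text, count = re.subn(
        r"(?m)^dspace\.dir\s*=.*$",
        lambda match: "dspace.dir = " + install_dir,
        text,
    )
    if count != 1:
        raise ValueError("expected exactly one dspace.dir line, found %d" % count)
    cfg.write_text(new_text, encoding="latin-1")
```

You would call it with the full path of the dspace.cfg file in your extracted source folder. If something goes wrong, copy dspace.cfg.bak back over dspace.cfg.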
Running mvn package
So, with all this done, now comes the geeky part. The Apache Maven package that we just set up is the tool needed for the first stage of the build process, which assembles the DSpace installation package. Open the command prompt of your computer and navigate to the DSpace source directory by giving the following commands one by one in the same command window.
C:\> cd dspace-5.3-src-release
C:\dspace-5.3-src-release> cd dspace
C:\dspace-5.3-src-release\dspace> mvn package
This command starts the compilation of the DSpace source code. It also downloads some scripts and packages from the Maven repository, so ensure that you have a stable internet connection while this happens. If you don’t have a direct connection, you may also want to get in touch with your network administrator, because there is a high probability of annoying errors and build failure messages if your installation server is behind a proxy or a firewall. A way around this problem is described on the DSpace wiki, here: https://wiki.duraspace.org/display/DSPACE/Set+Maven+Web+Proxy+Server+Settings
If everything is all right and you have a stable internet connection, the command window will start downloading and compiling the DSpace source code. This may take a while, so you can stretch out and have some coffee while Maven does most of the work for you. But don’t interrupt it midway. Sit back and relax. Reading the DSpace manual in this free time would also be a great idea.
As you can see in the image above, it took 34 minutes and 48 seconds to complete the build process. This may vary, depending on your internet connection speed. Patience is the secret, folks, and if you have that patience, you will be rewarded with the “BUILD SUCCESS” message.
With that, the first phase is complete. What all this lengthy exercise has done is download some add-ons and modules for DSpace from the Maven repository and put them in a new subfolder, “target/dspace-installer”, in your DSpace source directory.
Running ant fresh_install
We will now go to this directory by typing in the following commands, one after the other, in the same command window.
C:\dspace-5.3-src-release\dspace> cd target
C:\dspace-5.3-src-release\dspace\target> cd dspace-installer
C:\dspace-5.3-src-release\dspace\target\dspace-installer> ant fresh_install
Take a look at the image below for the reference.
Before moving further, make sure that your PostgreSQL service is running. You can reload its configuration from the ‘Start’ menu just to be doubly sure.
The ant fresh_install command goes through “dspace.cfg” to read all your database settings, e-mail settings, and file location settings, and it builds and copies the JSP and WAR files and web applications into your newly created “DSpace” folder on the C drive. It also creates the necessary tables in the DSpace database which you created in PostgreSQL.
Just a quick troubleshooting tip: if this process does not work, verify the DSpace database username and password in PostgreSQL and check the configuration in the dspace.cfg file. Most of the time, the errors are caused by a misconfigured dspace.cfg file or wrong database credentials. But if everything comes out clean, you will get a “BUILD SUCCESSFUL” message as shown below.
Now DSpace is installed, and note that it took much less time than mvn package. As I mentioned earlier, ant fresh_install has built new JSP and WAR files for the webapps of the DSpace web interface in the DSpace folder. But we must remember that the Apache Tomcat server does not know anything about them yet. So you either have to configure your Tomcat server to pick up the webapps directly from the C:\DSpace\webapps folder, or you can simply copy the contents of the C:\DSpace\webapps folder to the Tomcat webapps folder.
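A quick way to confirm that the database server is reachable is to try opening a TCP connection to its port. The following is a minimal Python sketch, assuming PostgreSQL listens on localhost at the default port 5432 chosen during installation; the function name is made up for this illustration. Note that it only tells you that something accepts connections on that port, not that the DSpace credentials are correct.

```python
# Sketch: check that something is listening on the PostgreSQL port.
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unknown hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "NOT reachable"
    print("PostgreSQL port 5432 is", state)
```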
I chose to go with the second option and copied the contents of the DSpace webapps folder into the Tomcat webapps folder, as shown in the following image.
Just copy the contents and paste them into
C:\Program Files\Apache Software Foundation\Tomcat 8.0\webapps
Restart your Tomcat server and open any of these URLs in your preferred web browser
I chose the XMLUI interface, and it gave me a nice, clean view of my repository home page. In this fresh installation, no community or collection has been created yet. You have to log in as the administrator of the repository and start adding your contents.
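The copy step can also be scripted. Here is a minimal Python sketch (an illustration only, with a made-up function name) that copies every entry of the DSpace webapps folder into the Tomcat webapps folder. The two default paths are the ones from my installation and are assumptions about yours; adjust them as needed, and run the script from a prompt that is allowed to write under Program Files.

```python
# Sketch: copy every web application from DSpace into Tomcat's webapps folder.
import shutil
from pathlib import Path

# Paths as used in this article; adjust them to your own installation.
DSPACE_WEBAPPS = Path(r"C:\DSpace\webapps")
TOMCAT_WEBAPPS = Path(r"C:\Program Files\Apache Software Foundation\Tomcat 8.0\webapps")


def copy_webapps(source=DSPACE_WEBAPPS, target=TOMCAT_WEBAPPS):
    """Copy each entry of source into target and return the copied names."""
    copied = []
    for entry in sorted(Path(source).iterdir()):
        destination = Path(target) / entry.name
        if entry.is_dir():
            # dirs_exist_ok needs Python 3.8+; existing files are overwritten.
            shutil.copytree(entry, destination, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, destination)
        copied.append(entry.name)
    return copied
```

Calling copy_webapps() with no arguments uses the two default paths above and returns the names of the web applications it copied.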
But wait a minute. How are we supposed to log in? We don’t have any username and password for administrating it, and it never prompted for it during the installation phase!
No worries, we will create one now. But before that, let us check the DSpace version and other brief information about this installation from the CLI. Go to the C:\DSpace\bin directory.
To do this check, use the command prompt and issue the following command
C:\DSpace\bin> dspace version
This should show you all the information related to your DSpace installation and its version.
Creating DSpace Administrator
Now, with everything in place, let’s create the administrator user of our repository. In the same command window, issue the following command.
C:\DSpace\bin> dspace create-administrator
It will ask you for your first name, last name, e-mail ID and desired password. Fill in the appropriate data and confirm by entering ‘Y’.
When you press ‘Enter’, your admin account will be created, which you can now use to log in to the DSpace repository through its web interface as shown below.
DSpace Content Hierarchy
After installing the DSpace repository Software package, we can start uploading the contents onto it, but before we do that, we need to understand the DSpace hierarchy. If we were to look at DSpace as a model of block diagrams it would look like this.
- Community: Communities are the highest level of the DSpace content hierarchy. They can be university departments, research centres, etc. A community can have its own metadata, and it can hold sub-communities or collections within it.
- Collection: A collection is what contains the items. In the case of a university department, the collections could be class sections, semesters, subjects, etc. A collection has its own metadata, submission, and access policies.
- Item: An item is made up of both metadata and the actual file uploaded to the repository. The actual file is called a ‘bitstream’. An item can hold one bitstream or several bitstreams, which are grouped into ‘bundles’.
- Bitstream: A Bitstream is the actual piece of digital content. It can be a text file, an image file or even an audiovisual file.
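To make the nesting concrete, here is a minimal Python sketch of this hierarchy. The class and field names are my own illustration and not DSpace’s actual Java classes, and bundles are left out to keep it short.

```python
# Sketch of the DSpace content hierarchy; class and field names are
# illustrative, not DSpace's actual Java classes. Bundles are omitted.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str  # the uploaded file, e.g. a PDF or an image


@dataclass
class Item:
    metadata: Dict[str, str]
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    sub_communities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)

    def count_items(self):
        """Count the items in this community and all of its sub-communities."""
        own = sum(len(collection.items) for collection in self.collections)
        return own + sum(child.count_items() for child in self.sub_communities)


thesis = Item({"dc.title": "A sample thesis"}, [Bitstream("thesis.pdf")])
department = Community("Computer Science", collections=[Collection("Theses", [thesis])])
university = Community("University", sub_communities=[department])
print(university.count_items())  # one item, found through the sub-community
```

The point of the sketch is that items live only in collections, while communities hold collections and other communities.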
Uploading contents
With this information in mind, let’s start creating our first Community in DSpace, which is on top of its hierarchy.
Click on the “Create Community” link on the right sidebar.
This webpage will have many metadata fields to describe your community at its best. Fill in the relevant information as I have done for this demonstration purpose.
Click on ‘create’ and your very first community will be created and it will be shown on the home page of the repository. I have created one named “Open Source”
As we have already noted, communities are further divided into collections, which are second in the hierarchy. Click on the ‘Create collection’ link in the right sidebar to create a new collection in this community.
Fill in the relevant metadata that best describes your collection and click on ‘create’. A collection gives you much flexibility in configuring its submission and access rights as shown in the figure below.
Configure it as per your needs and continue.
Further, note that we can make a simple collection to which we submit/upload the data items for our users. But what if some other repository already has that same data available online, and we just want our repository to link to their data, so that users can search within our repository interface and, for the results shown, be taken to the external source we have linked? DSpace supports OAI-PMH, a protocol for sharing metadata between two standardized systems, which allows people to harvest an entire repository. This is one of the most powerful, but most often overlooked, features of DSpace. Remember that DSpace can act not only as an OAI-PMH server but also as an OAI-PMH client (harvester).
In the image above I have shown the content source tab, where you can define the address of the external OAI-PMH server from which you want your DSpace to fetch the contents/metadata. This is an advanced topic, but I am adding it here for your information.
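For the curious, an OAI-PMH request is just an HTTP URL with a “verb” and a few arguments. The following minimal Python sketch builds two such URLs: Identify asks a server to describe itself, and ListRecords with metadataPrefix=oai_dc asks for records in Dublin Core. The base URL is a hypothetical placeholder and the function name is made up; use the address published by the repository you want to harvest.

```python
# Sketch: build OAI-PMH request URLs. The base URL is hypothetical.
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Return base_url with the OAI-PMH verb and arguments as a query string."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


BASE = "http://repository.example.org/oai/request"  # hypothetical endpoint

print(oai_request_url(BASE, "Identify"))
print(oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

Opening such a URL in a browser returns an XML response, which is what a harvesting DSpace reads behind the scenes.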
Once you finish creating a new collection, it will show you a link of “Submit a new item to this collection”. Clicking this will take you to a new window, where you have to fill in the metadata of the item or the file you want to upload to this collection.
This is going to be a multi-step form, where you first have to provide name, author, language, and other information about the file you are going to submit, as is shown in the image above.
In the next screen, you have to define the keywords for the item, abstract and description of the file.
In the next step, you browse for the file on your computer and upload it to the DSpace server along with the metadata you entered during the previous steps. You can provide a short description of the actual file as well.
The next screen will just show you a preview of the information you have entered all the way along. If you see anything that you would like to change, you can do so here. If everything looks good, just continue to the next screen.
Next comes the default license screen, where DSpace asks the submitter to accept the clauses, terms, and conditions laid down in the license before continuing with the upload. You can also have a customized license for each collection, but that needs to be set up while creating the collection itself. Check “I Grant the License” and complete the submission. You will see a nice screen with a message confirming successful completion, and your item/file will be available for web access.
The item preview screen shows the brief information about the item uploaded, but you can see the full item record as well as the entire metadata in the Dublin Core format. As a normal repository user, you can just click on the “View/Open” link to download the associated file.
You are now able to make your DSpace work, and that’s all from me for now, folks. In my next document I will try to cover some more technical topics, such as backups (database backups, AIP backups), configuration of the domain name and e-mail, version upgrades, adding metadata, and customization of the user interfaces. Meanwhile, you can use this document as a guide to start installing DSpace yourself. Don’t be discouraged if you’re not successful at first. As the saying goes, “Practice makes a man perfect” (and women too). Every time you try and make some mistakes, you will learn something new. Go through the official DSpace documentation for more details, and keep in touch with the DSpace community on IRC and the mailing lists. You will find people wiser than me there. But you can always write to me at my e-mail address whenever you want.
So these are the steps in a DSpace installation, and I hope you deploy it as an institutional repository at your office.