Asciidoctor-latex patch for html backend offline usage

After you have installed the asciidoctor-latex package from github following the instructions in the README.md you can generate pretty cool html pages with mathematical formulas. All the advantages of asciidoc can be combined with latex features to generate html(5) webpages.

https://github.com/asciidoctor/asciidoctor-latex

In the newest version of asciidoctor-latex at the time of writing this blog post (29/01/2016), the generated html files contain a couple of links/references to internet resources:

  1. a css style sheet to load google webfonts
  2. the javascript library jquery
  3. the javascript library MathJax

Due to those dependencies, the html documents generated with asciidoctor-latex work fine on a webserver as long as an internet connection is available, but they will not render the mathematical formulas correctly when the internet connection is missing.

To get rid of this restriction, the named resources can be downloaded from their internet locations and stored locally in corresponding subdirectories. If you don’t want to do this by yourself, I have bundled all resources in a zip archive for you.

[download internet resources]

To make the generated html files usable offline, unzip the downloaded file in the same directory where you find the html file.

Now you have to make some changes to your asciidoctor-latex installation, so that the generated html files contain links to the local resources instead of the internet links from the original distribution.
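
If you prefer a quick workaround over patching the installation itself, the references can also be rewritten in the generated html files. This is only a sketch: the exact resource URLs depend on your asciidoctor-latex version, so check the head section of a generated file and adapt the patterns.

  # rewrite the CDN references to the local copies (patterns are assumptions)
  sed -i \
    -e 's|https://fonts.googleapis.com/css?[^"]*|fonts/fonts.css|g' \
    -e 's|http[^"]*jquery[^"]*\.js|jquery/jquery.min.js|g' \
    -e 's|http[^"]*MathJax\.js[^"]*|mathjax/MathJax.js|g' \
    document.html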


Inverse DNS resolution with HBase and Phoenix

Recently I did some quick experiments with HBase (1.1.2) and Phoenix (4.6.0). As a reasonably big dataset I used the DNS data available as a dnscensus 2013 archive. This dataset contains DNS data collected by extracting port 53 traffic from network captures at some central internet routers. Here are some examples:

To get the inverse map of all collected type A DNS requests, the phoenix table should have a key starting with the IP address! Don’t confuse a reverse DNS lookup (PTR) with the inverse of a type A DNS request!

The table create script for phoenix looks like this:
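
A minimal sketch of such a script; the table and column names as well as the phoenix install path are my assumptions:

  # contents of create_dnscensus.sql: the key starts with the ip address
  CREATE TABLE IF NOT EXISTS DNSCENSUS (
      IP   VARCHAR NOT NULL,
      NAME VARCHAR NOT NULL
      CONSTRAINT PK PRIMARY KEY (IP, NAME)
  );

  # run the DDL with phoenix's psql.py
  /usr/lib/phoenix/bin/psql.py localhost create_dnscensus.sql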

To get the original data into the right order I used sed:
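
A sketch of such a pipeline, assuming raw lines of the form <name>,<ip> followed by stray whitespace (file names are placeholders):

  # drop the header line, then swap the columns so that the ip comes first;
  # the trailing .* eats CR/LF remnants and other whitespace after the ip
  tail -n +2 a_records.csv | sed -e 's/^\(.*\),\([0-9.]*\).*$/\2,\1/' > a_inverse.csv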

A bit tricky were the line endings of the original file data, which is why I used .* to get rid of all kinds of whitespace after the IP addresses. The tail command just removes the header line, which is not needed when the data gets imported into phoenix.

How many rows do we have?
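
Counting the prepared csv on the shell (file name as assumed above):

  wc -l a_inverse.csv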

Loading the CSV data into phoenix is quite simple with the bulk load utility:
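
An invocation could look like this; the jar name and paths are assumptions, the tool class is the one shipped with phoenix:

  hadoop jar phoenix-4.6.0-HBase-1.1-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table DNSCENSUS --input /data/a_inverse.csv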

[Screenshot: dnscensus query via Phoenix in SQuirreL]

The result is a list of DNS names which resolve to the IP of Google’s public DNS server 8.8.8.8!

The select on a table with almost 1 billion rows took less than a tenth of a second! Nice 🙂

HTTP2 tests with Apache2 and docker

1. General remarks

http/2 is the next generation protocol for the web. It is based on SPDY [2], a protocol initially developed by google. The new protocol was introduced because the current and former protocols http/1.1 and http/1.0 allow only one outstanding request per connection, so that a webpage with a lot of images etc. has to load them sequentially. http/2 allows multiplexing multiple transfers/requests over a single connection. Further drawbacks of the http/1.* protocols have to do with the lower level tcp protocol; http/2 itself still runs over tcp, while the related QUIC protocol uses udp to get rid of some of those disadvantages.

2. Software

Curl [3] supports http/2 since version 7.43.0 and is very well suited to testing your http/2 installation with the special option --http2.

Apache2 supports http/2 with a special module [4] since version 2.4.17. This Apache2 version is already available as a package in the debian sid repository.

Firefox supports http/2 completely [5] since version 36.

Chrome [6] supports http/2 of course 🙂

Internet Explorer 11 does NOT support http/2 (except for the IE11 shipped with Windows 10, which does).

Whether your browser supports http/2 can be checked online using a service of akamai [1].

3. Installation steps on a debian 8 system

I have a running debian 8 (jessie) box and I wanted to try the new apache2 module for http2. Because I didn’t want to break anything on this box, I decided to test this using a docker [7] container.

Of course I had to install docker first. One important remark here: there is another package called docker which has something to do with docking windows on the desktop, and this is the WRONG docker package which we do not need here. To get docker installed, you have to create or modify an apt conf file:
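
At the time the repository entry looked roughly like this (repository URL as given in the docker documentation of that era; verify against the current docs):

  # /etc/apt/sources.list.d/docker.list
  deb https://apt.dockerproject.org/repo debian-jessie main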

Before executing apt-get update, the https transport package for apt has to be installed (if not already installed):
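
  # package name as in debian jessie
  apt-get install apt-transport-https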

Now you can run apt-get update to pull the available packages from the docker repository. When you then search for docker, you get the following or a similar result:
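
  apt-get update
  apt-cache search docker   # output omitted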

The package that is needed for our experiments is docker-engine. The package is simply installed with:
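
  apt-get install docker-engine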

If everything worked as expected you should see the docker service running:
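
  systemctl status docker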

Your first simple docker command can be the check of running/available docker containers:
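
  docker ps -a   # -a also lists stopped containers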

Of course this delivers an empty list. We will change this by starting a debian sid container with the following command:
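
The container name is my own choice; the port mappings anticipate the host ports 1080 and 1443 used further down:

  docker run -it --name sid-http2 -p 1080:80 -p 1443:443 debian:sid /bin/bash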

Our first step in the brand new sid container is an apt update:
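
  apt-get update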

Now we can install apache2 (2.4.17):
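
  apt-get install apache2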

[Screenshot: apache2 installation in the sid container]

We check the installed apache2 version:
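
  apache2 -v   # should report Apache/2.4.17 or newer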

Ok, great. This is the required version which already supports http/2. Let’s proceed with the installation of curl:
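
  apt-get install curl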

We see that this is a curl version which already supports the http/2 protocol.

And we see that the --http2 option is also available:
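
  curl -V                    # the Features line should list HTTP2
  curl --help | grep http2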

Let’s install some network tools so that we can check our system from the network point of view:
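
Which tools exactly is a matter of taste; netstat from net-tools is enough for the checks below:

  apt-get install net-tools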

Now we have to enable the http2 module in the apache configs:
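
  # enable mod_http2 plus ssl, and activate the default tls vhost (if not done yet)
  a2enmod ssl http2
  a2ensite default-ssl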

Now add the following lines to the virtual host config files. Be aware that the TLS config has protocol h2 and NOT h2c!!

Add/change the marked lines in default-ssl.conf. If you want, you can also replace the snake oil certificate/key with a real or self-signed certificate!

[Screenshot: default-ssl.conf with the marked changes]
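
In text form, the essential addition inside the TLS vhost is the Protocols line (the surrounding directives are the stock default-ssl.conf):

  <VirtualHost _default_:443>
      ...
      Protocols h2 http/1.1
  </VirtualHost>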

And correspondingly for the non-ssl vhost:

[Screenshot: plain http vhost with the h2c protocol line]
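
Here the cleartext variant h2c is the right protocol:

  <VirtualHost *:80>
      ...
      Protocols h2c http/1.1
  </VirtualHost>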

Additionally we have to change some settings in the general ssl config (see the sketch after this list):

  1. disable SSLSessionCache
  2. set the SSLCipherSuite as shown
  3. set the SSLProtocol as shown.
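
A sketch of the corresponding lines in /etc/apache2/mods-available/ssl.conf; the cipher and protocol values are placeholders of my own choosing, adapt them to your policy:

  # 1. comment out the session cache
  #SSLSessionCache  shmcb:${APACHE_RUN_DIR}/ssl_scache(512000)
  # 2./3. example values only, not a recommendation
  SSLCipherSuite  ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256
  SSLProtocol     all -SSLv2 -SSLv3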

Now let’s start the apache2 server:
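
  service apache2 start
  netstat -tlpn   # 80 and 443 should show up as LISTEN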

So this looks good so far. The apache2 server is up and running and is listening on the http port (80) and the https port (443).

4. Testing the http/2 abilities

4.1. In container curl

Our first test is possible directly in the container. To check the http2 abilities we use the previously installed curl package:
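
  # -k accepts the snake oil certificate; look for 'HTTP/2' in the verbose output
  curl -vk --http2 https://localhost/ -o /dev/null
  # on the cleartext port curl tries an upgrade to h2c
  curl -v --http2 http://localhost/ -o /dev/null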

4.2. From outside of the container with real browsers

On our host system we can check the running docker containers again:
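
  docker ps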

We can check whether the mapped ports (1443 and 1080) show up as LISTEN on the host system:
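
  netstat -tlpn | egrep '1080|1443'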

So we can, for example, access those ports from a client machine (the firewall settings may have to be adapted if necessary).

Here is the access log after accessing the webserver with firefox and chrome:

[Screenshot: http2 access log]


5. Links

[1] https://http2.akamai.com/demo

[2] http://dev.chromium.org/spdy/

[3] http://curl.haxx.se/docs/http2.html

[4] https://httpd.apache.org/docs/2.4/de/mod/mod_http2.html

[5] http://www.golem.de/news/mozilla-firefox-36-kann-http-2-1502-112509.html

[6] https://www.google.de/chrome/browser/desktop/

[7] https://www.docker.com/

Hadoop 2.6.0 API Javadoc (with private classes/methods)

If you ever wrote a MapReduce program using SequenceFile, you may have asked yourself which methods are available in detail in the classes created by the createWriter() methods. The standard javadoc from apache’s website doesn’t show this information:

[Screenshot: standard Hadoop 2.6.0 API javadoc]

This is why I regenerated the javadoc from the sources with private classes / methods enabled.
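
Roughly, the regeneration boils down to running javadoc with the -private switch over the source tree; the paths here are assumptions:

  # collect all sources and generate the docs including private members
  find hadoop-2.6.0-src -name '*.java' > sources.txt
  javadoc -J-Xmx2g -private -d hadoop-2.6.0-api-private @sources.txt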

[Screenshot: regenerated javadoc with private classes/methods]

You can view the javadocs online or you can download the tarball:

Hadoop 2.6.0 javadoc with private classes/methods (view online)

Hadoop 2.6.0 javadoc with private classes/methods (tarball)

Android API standard javadoc

Android source

On the website android.googlesource.com all android platform sources are available via git repositories or tarballs.

As an example, if you want to get the source for version 4.1.2:

Android platform source v 4.1.2

To grab all the files in a tarball you can use

Android platform source v 4.1.2 tgz

Generation of standard javadocs

On windows I got some errors when trying to untar the tarball. Using linux (debian in my case) is a better choice.

After untarring the tgz, change into the directory and enter a command like:
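
For example (the javadoc path and the output directory are my assumptions, see the remark below):

  find . -name '*.java' > sources.txt
  /usr/lib/jvm/java-7-openjdk-amd64/bin/javadoc -J-Xmx2g -d ../android-4.1.2-javadoc @sources.txt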

Maybe the path to javadoc has to be adapted to your distribution.

Android platform standard javadoc v4.1.2 [tarball download]

Android platform standard javadoc v4.1.2 [browse online]

XMing XLaunch tcp listener

During one of the last updates of XMing there was a small but important change which cost me some hours until I found the solution.


The listener for display 0, which usually runs on port 6000, was not starting with XLaunch. After some investigation and searches in forums I found out that (as with current X.Org servers, where TCP listening is disabled by default) you now have to enter
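
  -listen tcp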

in the field for additional parameters!

[Screenshot: XLaunch dialog with the additional parameter]

Securing your Joomla Site


1. Motivation

Nowadays it is not enough to just install software like Joomla on your server. Why? If you have your site running for a while and start to look at your logfiles, you will most probably find traces of attempts to break into your system. The attackers are creative people and they find new ways every day to get around the current defence mechanisms. I don’t understand the exact motivation of hackers, but it seems that they always have fun attacking systems on the internet.

The first two takeaways are:

  • close all unneeded ports on your system using a firewall. I like to use iptables wrapped with shorewall.
  • enable logging for the services that you decided to keep reachable from the internet.

2. SSH Service

The ssh service is the front door to your system. In the default configuration you log in to the system with an ssh client using user name and password. If you keep this configuration, you will most likely see brute force / dictionary attacks against the ssh service in the log file

  /var/log/auth.log

Using tools like fail2ban [1] may help a bit by just closing the firewall for the attackers’ requesting IP addresses. Some attackers will stop attacking your server, but others just switch to rotating the used IP addresses, and then the fail2ban mechanism is again not very useful.

In many cases it is much more secure to disable user password authentication completely and use public/private key authentication instead. To do this you have to edit

  /etc/ssh/sshd_config

and set the following two parameters to no:

ChallengeResponseAuthentication no
PasswordAuthentication no

Before you close the session or restart the ssh server, you have to edit or create a file called

  <USER_HOME>/.ssh/authorized_keys

In this file you have to enter all public keys that should be able to log in to the system as the corresponding user. Please read the FAQ/man pages for further details.
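
On the client side, key generation and installation can look like this (user and host names are placeholders):

  # generate a key pair on the client
  ssh-keygen -t ed25519
  # install the public key on the server while password login still works
  ssh-copy-id user@yourserver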

Caution

If you are not sure, try this procedure on a local system first. If you make a mistake, you may not be able to log in to your system again!

3. Apache and Joomla

Now let’s take a look at the Joomla installation itself. Most attacks target the administrator area of Joomla, where you can change content and manage the complete site from the Joomla point of view. The standard login is again user/password authentication, so the same attacks as described in the last section can be expected; the difference is just the protocol used (http instead of ssh).

Typically you can detect such attacks when monitoring the apache log files. In case of attacks you will see many POST requests to the administrator area with NON-200 http response codes. Some tools switch the login path to another url with a code inside it; this method is known as security through obscurity. Just google the term and you will see the drawbacks of such a method.

One way to make the administrator area more secure is to configure an ssl secured virtual host in apache which can be used alternatively to access the Joomla site (frontend and backend). In the apache2 configuration of the http site you can add the following directory setting

 <Directory /<pathtoyourJoomlaSite>/administrator/>
     AllowOverride None
     # 2.2-style access control; on apache 2.4 this needs mod_access_compat
     Order deny,allow
     Deny from all
     Allow from 127.0.0.1 localhost
     DirectoryIndex index.php
     IndexIgnore *
     # options must be either all signed (+/-) or all unsigned
     Options -Indexes +FollowSymLinks +MultiViews
 </Directory>

so that no access is possible apart from localhost (which can be tunneled via ssh in urgent cases).

For the https/SSL configuration you add the following, via .htaccess in the administrator/ directory or in the ssl vhost config file:

## Client Verification
SSLVerifyClient require
SSLVerifyDepth  3

# error handling: forbid everything that comes without a verified client certificate
RewriteEngine on
RewriteCond   %{SSL:SSL_CLIENT_VERIFY} !=SUCCESS
RewriteRule   .? - [F]
ErrorDocument 403 "You need a client side certificate to access this site"

# FakeBasicAuth maps the certificate subject DN to a basic auth user name
SSLOptions    +FakeBasicAuth
SSLRequireSSL
AuthName      "Admin Only Area"
AuthType      Basic
AuthUserFile  /<somesecureplace>/.htpasswd
Require       valid-user

The file .htpasswd contains information about all clients that should be able to log in to the administrator area. Each row has to contain the subject DN (/CN…) of the client’s certificate followed by the fixed string xxj31ZMTZzkVA (the DES-encrypted form of the word password, which FakeBasicAuth expects), separated by a colon.

Here is an example:

/CN=Donald Duck/emailAddress=donald@disney.com:xxj31ZMTZzkVA
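
The exact subject line for a given client certificate can be printed with openssl, e.g.:

  # print the subject DN in one-line form
  openssl x509 -in client-cert.pem -noout -subject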

To access the administrator area of Joomla you now have to authenticate to the server with a client ssl certificate (available for free from cacert.org, for example). The client certificate has to be installed in your browser’s certificate store.

This way brute force attacks are much harder, and brute force attackers will most likely move on to other, less secured web sites.


4. References

[1] fail2ban http://www.fail2ban.org/wiki/index.php/Main_Page

DOTNET development – automatic builds and tests

1. General remarks

Most of the successful open source projects use some elementary tools which influence the way the developers work. If several team members have a common goal

[Image: common goal]

it is important that the project resources are managed in a safe and easy way. Usually all relevant files should be checked in to a revision control system like Mercurial, Git, Subversion, CVS etc. This implies that everything that is needed to build the project is also under revision control.

To be able to handle merge conflicts easily, tools like WinMerge are a great help. BUT such tools are only effective if the files they are comparing are text files with a structure that shows differences line by line when comparing different versions.

Build tools like NAnt use XML-like files to control all the build processes, which is why NAnt fits very well into this concept.

To allow new developers to start developing without fear of producing hidden errors in the project, unit tests (thank you, Kent Beck) give the necessary security. If all tests still pass after a change, everything should be fine. This implies, on the other hand, that all code parts are covered by test cases.

The Log4net logging framework is well suited to add logging commands to the code, with the possibility to configure the output with respect to log levels per class.

Let’s put all the parts together…

2. Requirements / Preconditions

Most important: Get a cup of coffee first! [Image: cup of coffee]

  • DOTNET Framework
  • NAnt
  • NUnit
  • NUnit2Report
  • MySQL DOTNET Connector
  • Log4Net

3. The complete working example

As a starting point for your own project you may download a complete example project with a simple unit test case which tests a simple count on a mysql database table.

[Download: nunit_example project]

To compile, build and test the project, with test report generation afterwards, use the following command from the command line in the project directory:
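
Presumably something like this; the target name is a guess, so check the build file shipped with the example:

  nant testreport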

The generated unit test report uses frames and allows flexible browsing of the test results.

testreport

Other NAnt targets defined in the example project are build, test, run.

4. Links / References

[1] Microsoft DOTNET Framework http://goo.gl/IRqzJg
[2] NAnt http://goo.gl/OLoihi
[3] Log4net http://goo.gl/PjtYU9
[4] NUnit http://goo.gl/mLzNmD
[5] NDoc http://goo.gl/RUc5Pq
[6] NUnit2Report http://goo.gl/QIEtSn

How to get SAP data into the hadoop data lake

It is possible to get a lot of information using SAP standard remote enabled function modules, but there are some issues that can make the data retrieval process painful. If you have a closer look at the RFC_READ_TABLE function module, for example, you will find out that even some simply structured standard tables like EDIDC (IDOC control records) cannot be read with all columns at once. Further, if the chosen delimiter appears in the flat, CSV-like response structure, you get into trouble parsing the results. If you get a little deeper into real life use cases, you recognize that where conditions, table joins etc. are needed. And if you want to retrieve a lot of records from a table, the rowcount/skip rows mechanism is not very performant and, even worse, can lead to inconsistent results if changes are made between two requests.

Because of those issues we started to develop some function modules and classes which remove some of the described pain points. Starting with an improved version of RFC_READ_TABLE, we quickly found other needs and added further function modules that offer great features for retrieving interesting information from SAP systems (not only ERP).

Less visible to the user, but clearly noticeable with real life payloads, are performance related optimizations. For example, there are possibilities to get rid of the additional data volume caused by the SOAP encoding; MTOM is not (easily?) usable in SAP services [maybe if you use additional systems like XI, which also has a Java stack 🙂 ]. Another internal improvement is the XML serialization implementation, which serializes even data containing hex 0 characters. This is important because the XML serialization available in the SAP standard, the asXML transformation, produces dumps if such hex 0 characters appear in the datasets to be serialized.

All developed improvements on the ABAP side, as well as some useful Java classes which help you to get the data into hadoop, are available in what we call the Woopi SAP Adapter. Woopi is a registered trademark of Dr. Menzel.

1. Reasons to bring data into hadoop

  1. Hadoop storage is cheap, typical enterprise SAN storage is expensive. (Cost)
  2. Hadoop storage is easy to extend without a relevant upper limit and without system or structural changes. You just add more nodes and rebalance the cluster. (Volume)
  3. Processing of data can be 1000 times faster than with traditional systems. (Velocity)
  4. You can process unstructured or semi-structured data too. (Variety)
  5. Calculations that you never thought about become possible now. (New possibilities)
  6. Don’t waste capacities of your productive system; let hadoop do all calculations that don’t need to be done in SAP. (Cost + performance)

If we have a look at the google trends chart for Hadoop, we see that it is continuously gaining interest:

[Chart: google trends for Hadoop]

[Image: Woopi SAP Adapter logo]

2. The Woopi SAP Adapter Modules

2.1. RFC function module to read table data

Our function module Z_WP_RFC_READ_TABLE has the following features:

  1. multiple joined SAP tables
  2. selected columns, ALL columns per table, or ALL columns globally
  3. XML serialization of the results
  4. zip compression option
  5. where conditions
  6. limiting the number of results
  7. metadata export from the data dictionary
  8. reading data from cluster tables
  9. asynchronous mode to export huge amounts of records per query.

2.2. RFC function module to read ABAP source code

If you extract the changed ABAP sources e.g. daily, you are able to create an ABAP source repository with history. In this way you can check which source code was active at any timestamp you want to examine. This can be very useful if you have to analyze errors, or in cases of partly finished transport orders. With this data basis you can further detect code inconsistencies due to transport orders that arrived in the productive system in the wrong order.

The software only delivers source code that has changed since your last data retrieval run. In this way the source code extraction is not very time consuming and you can repeat it quite often.


2.3. RFC function module to read JOBLOGs, JOBSPOOLs, JOBINFOs

Job information can be listed using TCODE SM37. In productive systems you typically have a lot of jobs in this list every day. Often you have to check the job logs or job step spools for errors to guarantee that there are no disruptions or errors in your business processes. The standard function modules which can read job log and job step spool information

  • RSPO_RETURN_ABAP_SPOOLJOB
  • BP_JOBLOG_READ

are both not remote enabled. Our SAP adapter module for job logs delivers all job information since the last data retrieval at once, based on the information in the SAP tables

  • TBTCO
  • TBTC_SPOOLID

2.4. RFC function module to read BDOCs as XML

BDocs are the XML documents that are exchanged between ERP and CRM systems to keep their business content in sync. With the SAP adapter module for BDocs it is quite easy to continuously pipe the BDoc messages as XML documents into a hadoop sequencefile.

Because all the information about your business content is stored in one of the BDocs, you can use Hadoop to parse a huge number of BDocs in a short time. In this way you can do analyses to answer business questions that came up a long time after the data was exchanged between the systems. Imagine e.g. the question about deleted sales orders: without custom changes in the SAP system it is very hard to get this information out of the SAP system. You can also answer questions like in which system some changes have been made.

2.5. RFC function module to read IDOCs as XML

IDOCs are important because these documents are one very common way to exchange data between SAP systems or between a SAP system and external systems. The most important use cases are sales order import and warehouse transport order exchange from and to external warehouse systems.

The arguments for saving IDOC copies in hadoop are similar to those for BDocs, even if the business questions are different.

2.6. ADK archiving with a copy into Hadoop

With minimal changes you can reuse your archiving reports to write an additional copy of all data into hadoop.

All necessary code is available in the class Z_WP_ARCHIVING_HDP, which has methods that are equal in name and parameters to the ADK function modules. So your migration steps are:

  1. Create an instance of Z_WP_ARCHIVING_HDP at the beginning of your write report
  2. Replace all ADK function module calls by the corresponding method calls of the already created wrapper class instance.


The following ADK function modules need to be replaced by the corresponding methods:

  1. ARCHIVE_OPEN_FOR_WRITE
  2. ARCHIVE_NEW_OBJECT
  3. ARCHIVE_PUT_RECORD
  4. ARCHIVE_CLOSE_FILE
  5. ARCHIVE_SAVE_OBJECT


2.7. XI Java Stack Database Table Reader

The SAP Java stacks have their own database, which is not directly accessible from the ABAP stack. Most of the XI adapters are implemented in Java and are executed in the Java stack. The communication with other SAP systems takes place in the ABAP stack, so messages have to be exchanged between the two stacks internally. The exchanged messages are stored in the database, some of them in the Java database, others in the ABAP database. You can monitor the messages using the runtime workbench. In case of (communication) problems between the two stacks you have to check the messages in both stacks. If messages are missing or lost in one of the stacks, it is not easy to find them.

Because of this situation we developed an EAR for the Java stack application server which offers the possibility to access the Java database generically (similar to the ABAP read table function module) over http.

2.7.1. XI Java Stack Message Reader

As a special use case we can get complete XI messages (meta data and payload) directly from the Java stack database and write them as usual into sequencefiles in hadoop.

3. Woopi SAP Adapter on the Java / hadoop side

On the Java side the Woopi SAP Adapter has the necessary client classes to pull the data from the SAP systems. The current state is persisted locally, so that the software knows which data to fetch next time.

There are MapReduce jobs which transform the data retrieved from SAP tables, automatically generate HIVE or Phoenix tables in Hadoop (if necessary), and afterwards import the data into those hadoop databases.

If you write MapReduce jobs that go beyond the HelloWorld level, you quickly need to handle serialized data (Avro, Json, etc.) using special comparators. Creating grouping comparators that bring data from different input sources together, with corresponding tags in their keys, can be very time consuming and error prone. We have created some helper classes that make it possible to generate comparators by just defining the necessary fields in the (extended) avro schema.

4. Use Cases

4.1. Example use cases

  1. statistical monitoring
    warnings when the measured data deviates by more than the statistical standard deviation
  2. long term KPI calculations
  3. end to end (multi system) business process monitoring, e.g. sales orders external system -> sales orders ESB system -> sales orders CRM -> sales orders ERP -> transport orders (warehouse) -> delivery confirmation (UPS, DHL, Transoflex etc.)
  4. customer segmentation
  5. order recommendations
  6. warehouse: stock-keeping optimizations
  7. searches / analysis over CDHDR/CDPOS

4.2. Let us know about your use cases

Many other use cases are imaginable. If you have interesting use cases from your business, we would appreciate hearing about them. Just send us an e-mail.

publicdns.woopi.org

A few days ago I configured a public bind9 DNS server with an IPv4 and an IPv6 address.

You can get the IP addresses by

dig publicdns.woopi.org A
=> 5.45.100.40

dig publicdns.woopi.org AAAA
=> 2a03:4000:6:5::1

Feel free to use the server, but there are no warranties from our side (and you don’t need to pay for the usage).
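
For example:

  # query the server explicitly, no configuration changes needed
  dig @publicdns.woopi.org woopi.org A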
